How to get the feature names output by a ColumnTransformer?

nimrah · March 24, 2023, 4:49pm

In machine learning, feature transformation is a common step in the data preprocessing pipeline. ColumnTransformer in the scikit-learn library lets you apply different transformations to different columns of the input data. However, it is not obvious how to recover the output feature names after the transformation. Specifically, how can I obtain the column names of the transformed features when ColumnTransformer is used in a pipeline? Any suggestions or alternative approaches are appreciated.

nimrah · April 13, 2023, 5:11pm

nimrah · April 26, 2023, 5:32pm

mubashir_rizvi · May 3, 2023, 6:15pm

You can use the get_feature_names_out method to get the column names of the transformed features. Call it on your ColumnTransformer object after it has been fitted (with fit or fit_transform); the method is available in scikit-learn 1.0 and later. Here is an example that transforms a sample DataFrame and gets the transformed column names:

Each output feature name is prefixed with the name of the transformer that produced it, followed by a double underscore (for example, num__age). One-hot encoded names also append the category value to the input column name, giving names of the form cat__gender_<category>.
In this example, the input columns are age, income, gender, and education, and the transformers are named num and cat.
The transformed data is also printed, showing the standardized age and income columns followed by the one-hot encoded gender and education columns.