In machine learning, feature transformation is a common step in the data preprocessing pipeline. `ColumnTransformer` is a useful tool in the scikit-learn library that applies different transformations to different features (columns) of the input data. However, one problem users face with `ColumnTransformer` is extracting the output feature names after the transformation: specifically, how to obtain the column names of the transformed features when the `ColumnTransformer` is part of a pipeline. I am looking for a way to get these output feature names. Any suggestions or alternative approaches are appreciated.
You can use the `get_feature_names_out` method (available since scikit-learn 1.0) to get the column names of the transformed features. Call it on your `ColumnTransformer` object after it has been fitted. Here is an example that transforms a sample DataFrame and gets the transformed column names:
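- Each feature name is prefixed with the name of the transformer step that produced it, followed by a double underscore; the rest of the name is the input column name (for one-hot encoded columns, the input column name followed by the category).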
- In this example, the input columns are `education`, and the transformer steps are
- The transformed data is also printed, which shows the standardized `income` columns, followed by the one-hot encoded