In machine learning, feature transformation is a common step in the data preprocessing pipeline. ColumnTransformer is a useful tool in the scikit-learn library that applies different transformations to different features of the input data. However, one problem users face with ColumnTransformer is extracting the output feature names after the transformation, specifically obtaining the column names of the transformed features when ColumnTransformer is used in a pipeline. I am looking for a way to extract the output feature names after applying the transformations through the ColumnTransformer. Any suggestions or alternative approaches are appreciated.
You can use the `get_feature_names_out` method to get the column names of the transformed features by calling it on your `ColumnTransformer` object after the transformation. Here is an example that transforms a sample dataframe and gets the transformed column names:
- By default, each output feature name is prefixed with the name of its transformer step and a double underscore (for example `num__age`). One-hot encoded features also carry the input column name followed by the category value.
- In this example, the input columns are `age`, `income`, `gender`, and `education`, and the transformer steps are `num` and `cat`.
- The transformed data is also printed, which shows the standardized `age` and `income` columns, followed by the one-hot encoded `gender` and `education` columns.