A transformer in Scikit-learn is an object that learns whatever it needs from the input data through its `fit` method and returns transformed data through its `transform` method. By converting functions into transformers, users can easily integrate custom data preprocessing or feature engineering steps into their machine learning pipelines. In this thread, we will cover several approaches to turning functions into transformers using Scikit-learn.
1. Using "FunctionTransformer":
One simple way to convert a function into a transformer is to use `FunctionTransformer` from the `sklearn.preprocessing` module. It wraps a user-defined function so that calling `transform` passes the input array to that function; with an element-wise function such as `numpy.log`, the function is applied to every value in the dataset.
Here’s an example of how to use it to transform the iris dataset’s features using the `numpy.log` function:
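A minimal sketch of this approach, using the `iris_log` name from the walkthrough; the variable name `log_transformer` is an illustrative assumption:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import FunctionTransformer

iris = load_iris()

# Wrap np.log so it exposes the usual fit/transform interface.
log_transformer = FunctionTransformer(np.log)

# FunctionTransformer is stateless, so transform() can be called directly.
iris_log = log_transformer.transform(iris.data)

print(iris_log.shape)
```

All iris measurements are strictly positive, so the logarithm is defined for every value here; data containing zeros or negative values would need a different function such as `np.log1p`.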
In the above example, we import the `load_iris` function from Scikit-learn’s `datasets` module and the `FunctionTransformer` class from its `preprocessing` module.
A new instance of the `FunctionTransformer` class is created using the `np.log` function. Then, the `transform` method of the `FunctionTransformer` instance is used to apply the log transformation to the `iris.data` array, resulting in a new array called `iris_log`.
2. Using "Pipeline" with a custom function:
Another way to convert a function into a transformer is to create a custom function and use it within a `sklearn` pipeline. Here’s an example of how to use a custom function to standardize the iris dataset’s features:
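One way this could look is sketched below. The step name `"standardize"` and the variable name `pipeline` are illustrative assumptions, as is the choice to wrap the function in `FunctionTransformer` so that it can serve as a pipeline step:

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

iris = load_iris()


def standardize(X):
    # The scaler is fit on whatever array is passed in, so each call
    # rescales using that call's own per-feature mean and variance.
    return StandardScaler().fit_transform(X)


# Pipeline steps must be estimator objects, so the function is wrapped first.
pipeline = Pipeline([("standardize", FunctionTransformer(standardize))])

# Fitting learns nothing for this stateless step, but it keeps the sketch
# independent of how a given scikit-learn version treats an unfitted Pipeline.
iris_standardized = pipeline.fit(iris.data).transform(iris.data)

print(iris_standardized.mean(axis=0).round(6))
```

Because the function refits the scaler on every call, this pattern suits one-off preprocessing; when training statistics must be reused on new data, placing `StandardScaler` directly in the pipeline is the more usual choice.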
In the above example, we load the iris dataset with `load_iris` and define a function that standardizes the input data using Scikit-learn’s `StandardScaler`. Then, a pipeline is created with a single step that applies the `standardize` function to the input data. Pipeline steps must be estimator objects rather than bare functions, so the function has to be wrapped, for example in `FunctionTransformer`, before it is added as a step. The pipeline is applied to the input data using the `transform` method, which standardizes the data and stores the result in a new variable called `iris_standardized`.
3. Using "TransformerMixin":
If you want more control over the transformation process, you can create a custom transformer by inheriting from `sklearn`'s `TransformerMixin` class, usually together with `BaseEstimator`. Here’s an example of how to create a custom transformer that multiplies the iris dataset’s features by 2:
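A minimal sketch of such a class follows. Inheriting from `BaseEstimator` and defining a no-op `fit` are assumptions added here so the class also works inside pipelines; the variable name `multiply_transformer` is illustrative:

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris


class MultiplyTransformer(TransformerMixin, BaseEstimator):
    """Multiply every feature value by 2."""

    def fit(self, X, y=None):
        # Nothing to learn; returning self is the scikit-learn convention.
        return self

    def transform(self, X):
        return X * 2


iris = load_iris()

multiply_transformer = MultiplyTransformer()
iris_multiplied = multiply_transformer.transform(iris.data)

print(iris_multiplied[:2])
```

With `fit` defined, `TransformerMixin` also supplies `fit_transform`, so `MultiplyTransformer().fit_transform(iris.data)` gives the same result in a single call.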
In the above example, we create a custom transformer class called `MultiplyTransformer` that multiplies the input data by 2. The class inherits from the `TransformerMixin` class in Scikit-learn and implements the `transform` method to perform the multiplication. An instance of the `MultiplyTransformer` class is created and used to transform the `iris.data` array by calling the `transform` method. The transformed data is stored in a new variable called `iris_multiplied`.
4. Using "FunctionTransformer" with a "lambda function":
If your transformation function is short and simple, you can use a lambda function with `FunctionTransformer` to transform the iris dataset.
Here’s an example of how to use a lambda function to convert the iris dataset’s features to their absolute values:
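A short sketch of this variant, using the `abs_transformer` and `iris_abs` names from the walkthrough; writing the lambda around `np.abs` is an assumption about how the absolute value is computed:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import FunctionTransformer

iris = load_iris()

# The lambda receives the whole input array and returns its absolute values.
abs_transformer = FunctionTransformer(lambda X: np.abs(X))

iris_abs = abs_transformer.transform(iris.data)

print(iris_abs.shape)
```

One trade-off worth keeping in mind: a lambda cannot be serialized with the standard `pickle` module, so a named function (or `np.abs` passed directly) is the safer choice if the transformer needs to be saved to disk.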
In the above example, we use the `FunctionTransformer` class to create a transformer object called `abs_transformer` that applies the absolute value function to each element of the input data. The `transform` method of the `abs_transformer` object is then used to apply this transformation to the `iris.data` array, resulting in a new array called `iris_abs`.