Saving a model or pipeline

Saving a trained model or pipeline is a crucial step in Scikit-learn machine learning projects. It saves time and computational resources by avoiding retraining, it makes it easy to share the model with others and deploy it in production environments, and it helps in automating data processing, creating predictive models, and building intelligent systems. In this article, we will discuss different methods for saving machine learning models or pipelines.

Creating a "model" or "pipeline":

We import the necessary libraries, generate some sample data, and create a pipeline that performs scaling and linear regression on the data. The pipeline is then fitted to the sample data. Once the pipeline is fitted, it can be used to make predictions on new, unseen data using the predict() method.

Let’s see the code below to gain a better understanding:
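A minimal sketch of such a pipeline, assuming a small synthetic dataset (the values below are purely illustrative), might look like this:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Sample data (illustrative values only)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

# Pipeline that scales the features and then fits a linear regression
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('regressor', LinearRegression())
])

# Fit the pipeline to the sample data
pipeline.fit(X, y)

# Predict on new, unseen data
print(pipeline.predict([[6]]))
```

The fitted `pipeline` object is what we will save and reload in the examples that follow.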

Now, let’s discuss some methods of saving a model.

1. Using the "pickle" module:

Pickle is a Python module used for serializing and de-serializing Python objects.

  • It can be used to save trained machine learning pipelines for later use without having to retrain the model.

  • It saves time and resources by allowing you to reuse the trained model without retraining it, and it allows for easy sharing and deployment of trained models.

Let’s see the example below to learn how pickle helps us save the model.
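A minimal sketch, reusing the fitted `pipeline` from above (the new data point `[[6]]` is just an illustrative value):

```python
import pickle

# Save the fitted pipeline to a file
with open('model.pkl', 'wb') as f:
    pickle.dump(pipeline, f)

# Load the pipeline back from the file
with open('model.pkl', 'rb') as f:
    loaded_pipeline = pickle.load(f)

# Predict a target value for a new data point and print it
print(loaded_pipeline.predict([[6]]))
```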

In the above code, the pipeline object is saved to a file named 'model.pkl' using the `pickle.dump()` function. To load the saved pipeline object from the file, the `pickle.load()` function is used, and a new data point is used to predict a target value with the loaded pipeline. The predicted value is printed using the `print()` function.

2. Using the "joblib" module:

Joblib is a Python module that can save and load machine learning models and pipelines.

  • It is a separate library that Scikit-learn depends on and uses internally (older Scikit-learn versions bundled it as sklearn.externals.joblib).

  • Compared to Pickle, Joblib can efficiently handle large NumPy arrays and reduce disk space usage by compressing the saved file.

Let’s see the example below to learn how joblib helps us save the model.
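A minimal sketch, again reusing the fitted `pipeline` from above (the new data point `[[6]]` is illustrative):

```python
import joblib

# Save the fitted pipeline to a file
joblib.dump(pipeline, 'model.joblib')

# Load the pipeline back from the file
loaded_pipeline = joblib.load('model.joblib')

# Predict a target value for a new data point and print it
print(loaded_pipeline.predict([[6]]))
```

joblib.dump() also accepts a compress argument (for example, compress=3) to reduce the size of the saved file.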

In the above code, the pipeline object is saved to a file named 'model.joblib' using the `joblib.dump()` function. To load the saved pipeline object from the file, the `joblib.load()` function is used, and a new data point is used to predict a target value with the loaded pipeline. The predicted value is printed using the `print()` function.

3. Using the "yaml" module:

YAML is a human-readable data serialization format that can be used to save machine learning pipelines in a readable and portable format.

  • YAML files are platform-independent, making them ideal for sharing across different operating systems and programming languages.

Let’s see the example below to learn how YAML helps us save the model.
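A rough sketch, assuming PyYAML is installed (`pip install pyyaml`). Note that a fitted Scikit-learn pipeline is stored via PyYAML's Python-specific object tags, so the resulting file can only be loaded back from Python, must come from a trusted source, and the approach can be fragile across library versions:

```python
import yaml

# Save the fitted pipeline to a YAML file
with open('model.yaml', 'w') as f:
    yaml.dump(pipeline, f)

# Load the pipeline back; an unsafe loader is required because the file
# contains python/object tags describing arbitrary Python objects
with open('model.yaml', 'r') as f:
    loaded_pipeline = yaml.load(f, Loader=yaml.UnsafeLoader)

# Predict a target value for a new data point and print it
print(loaded_pipeline.predict([[6]]))
```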

In the above code, the pipeline object is saved to a file named 'model.yaml' using the `yaml.dump()` function. To load the saved pipeline object from the file, the `yaml.load()` function is used with an unsafe loader (such as yaml.UnsafeLoader), since the safe and base loaders cannot reconstruct arbitrary Python objects. A new data point is used to predict a target value with the loaded pipeline, and the predicted value is printed using the `print()` function.