How can I determine the optimality of my machine learning model?

I’ve developed a machine learning model and now I’m looking to assess its optimality. What are the main metrics I should focus on to evaluate its performance, and how should I interpret the results?

Below is the code snippet I’m currently working with:

import warnings
from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Suppress the convergence warning LogisticRegression can raise with its
# default max_iter (alternatively, raise max_iter instead of silencing it)
warnings.filterwarnings("ignore", category=ConvergenceWarning)

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=42)

lr_model = LogisticRegression()
lr_model.fit(X_train, y_train)

I would greatly appreciate any guidance on assessing the optimality of my machine learning model.

To evaluate the model’s ability to generalize to new data, it is common practice to use cross-validation. Rather than relying on a single train/test split, k-fold cross-validation partitions the data into k folds, fits the model on k−1 of them, and evaluates it on the held-out fold, repeating this so that each fold serves once as the test set. Averaging the k scores gives a more stable estimate of out-of-sample performance than a single split.
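A minimal sketch of that workflow, using `cross_val_score` from scikit-learn (with `max_iter` raised so the solver converges without needing to silence warnings):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

data = load_iris()

# Raising max_iter avoids the ConvergenceWarning instead of suppressing it
clf = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each fold serves once as the held-out test set
scores = cross_val_score(clf, data.data, data.target, cv=5)

print("Accuracy per fold:", scores)
print("Mean cross-validated accuracy:", scores.mean())
```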

In this example, we load the iris dataset and evaluate a logistic regression classifier with 5-fold cross-validation. The cross_val_score function from scikit-learn’s model_selection module takes the classifier, the data and labels, and the number of folds (cv=5), and returns an array of accuracy scores, one per fold. Averaging these gives the cross-validated accuracy score; a large gap between the fold scores (high variance) can also signal that the model’s performance is sensitive to the particular split.