Scikit-learn comes with a few basic standard datasets that do not require any additional file downloads.
They can be loaded using the following functions:
load_iris(*[, return_X_y, as_frame]): Load and return the iris dataset (classification).
load_diabetes(*[, return_X_y, as_frame, scaled]): Load and return the diabetes dataset (regression).
load_digits(*[, n_class, return_X_y, as_frame]): Load and return the digits dataset (classification).
load_linnerud(*[, return_X_y, as_frame]): Load and return the physical exercise Linnerud dataset.
load_wine(*[, return_X_y, as_frame]): Load and return the wine dataset (classification).
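The keyword-only parameters in these signatures change what each loader returns. The sketch below assumes scikit-learn and pandas are installed. It shows that return_X_y=True gives a (data, target) tuple instead of a Bunch object, that as_frame=True switches the arrays to pandas objects, and that n_class restricts the digits dataset to its first few classes.

```python
from sklearn.datasets import load_digits, load_iris, load_wine

# return_X_y=True returns a (data, target) tuple instead of a Bunch object
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)  # (150, 4) (150,)

# as_frame=True (requires pandas) returns a DataFrame of features and a Series of targets
X_wine, y_wine = load_wine(return_X_y=True, as_frame=True)
print(type(X_wine).__name__, type(y_wine).__name__)  # DataFrame Series

# n_class keeps only the digit classes 0 .. n_class - 1
X_digits, y_digits = load_digits(n_class=3, return_X_y=True)
print(sorted(set(y_digits.tolist())))  # [0, 1, 2]
```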
These datasets are useful for quickly illustrating the behavior of the various algorithms implemented in scikit-learn. However, they are often too small to be representative of real-world machine learning tasks.
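To make "too small" concrete, this minimal sketch (assuming only that scikit-learn is installed) loads each dataset listed above and prints the shape of its feature matrix. Even the largest of them, digits, has fewer than 2,000 samples.

```python
from sklearn.datasets import (
    load_diabetes,
    load_digits,
    load_iris,
    load_linnerud,
    load_wine,
)

# Map a readable name to each loader function
loaders = {
    "iris": load_iris,
    "diabetes": load_diabetes,
    "digits": load_digits,
    "linnerud": load_linnerud,
    "wine": load_wine,
}

# Record and print (n_samples, n_features) for every dataset
shapes = {}
for name, loader in loaders.items():
    bunch = loader()
    shapes[name] = bunch.data.shape
    print(f"{name:>9}: {bunch.data.shape}")
```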
Loading the Iris Plants Dataset
Here’s an example using the popular iris dataset, which contains sepal and petal length and width measurements for 150 iris flowers from three species:
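A minimal sketch of such an example, assuming scikit-learn and pandas are installed. The column name "target" for the class label is an illustrative choice, not something the loader requires.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the iris dataset as a Bunch (a dictionary-like container)
iris = load_iris()

# Build a DataFrame from the feature matrix, labelling columns with the feature names
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Append the integer class label (0, 1 or 2) as an extra column
df["target"] = iris.target

# Print the first five rows
print(df.head())
```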
This code loads the iris dataset, converts it to a pandas DataFrame, and prints the first few rows of the DataFrame.
Loading the Diabetes Dataset
Here’s how to load the Diabetes dataset using scikit-learn:
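A minimal sketch, assuming scikit-learn and pandas are installed. Only the ten feature columns go into the DataFrame; the regression target (a measure of disease progression one year after baseline) stays in diabetes.target.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Load the diabetes dataset; by default (scaled=True) each feature is
# mean-centered and rescaled
diabetes = load_diabetes()

# One row per patient, one column per baseline feature
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)

# Print the first five rows
print(df.head())
```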
This prints the first few rows of the diabetes dataset as a table, where each row represents a sample (a patient) and each column represents a feature.