This thread focuses on one step of data cleaning: checking the data for ambiguities, one of which is missing values. There are several ways to check for missing values in your dataset, and a few of the most common ones are discussed in this thread.
Before getting into the details of the methods, if you want to learn how to create a data frame from multiple series, have a look at the thread Merge many series to create a dataframe.
1. Using "isnull()" method:
The isnull() method checks whether each value in a Pandas DataFrame or Series is null (missing). It returns a boolean DataFrame or Series of the same shape as the input.
- Since we only want to know whether there are any missing values, we use the any() method to check if there is at least one True value in the results provided by the isnull() method. On a DataFrame, any() reports this per column.
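As a rough illustration of the check described above, here is a minimal sketch. The sample data and the column names (name, age, city) are made up for this example and are not from the original thread.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", None],
    "age": [28, np.nan, 35],
    "city": ["Oslo", "Lima", "Pune"],
})

# Boolean DataFrame with the same shape as df: True where a value is missing.
null_mask = df.isnull()

# any() on a DataFrame works column by column: one boolean per column.
has_missing_by_column = null_mask.any()

# Chaining a second any() collapses that to a single yes/no answer.
has_missing = bool(null_mask.any().any())

print(has_missing_by_column)
print(has_missing)
```

The per-column result is usually the more useful of the two, since it tells you where to look next.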
2. Using "isna()" method:
The isna() method is similar to the isnull() method and returns a boolean DataFrame or Series of the same shape as the input, where True values represent missing values and False values represent non-missing values.
- We use the any() method on the results of isna() to check if there is at least one True value.
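The same idea with isna() might look like the sketch below. The data and the column names (score, grade) are assumptions for illustration. In pandas, isnull() is an alias of isna(), so the two masks should be identical; the sketch checks that as well.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "score": [9.5, np.nan, 7.0],
    "grade": ["A", None, "B"],
})

# True marks a missing value, False a present one.
na_mask = df.isna()

# isnull() is an alias of isna(), so both produce the same mask.
same_as_isnull = bool(na_mask.equals(df.isnull()))

# At least one True anywhere in the mask means something is missing.
any_missing = bool(na_mask.any().any())

# Summing the mask counts the missing values in each column.
missing_per_column = na_mask.sum()

print(same_as_isnull)
print(any_missing)
print(missing_per_column)
```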
3. Using "notna()" method:
The notna() method returns a boolean DataFrame where True values represent non-missing values and False values represent missing values.
- We use the all() method to check if all values in the DataFrame are True. If the result is False, it means we have missing values in the data.
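The inverted check with notna() and all() can be sketched as follows. Again, the sample data and the column names (price, qty) are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "price": [10.0, np.nan, 12.5],
    "qty": [1, 2, 3],
})

# True marks a present value, False a missing one.
present_mask = df.notna()

# all() on a DataFrame works column by column: True only if no value is missing.
complete_by_column = present_mask.all()

# A second all() gives one answer for the whole DataFrame.
is_complete = bool(present_mask.all().all())
has_missing = not is_complete

print(complete_by_column)
print(has_missing)
```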
4. Using "info()" method:
The info() method prints a summary of the data frame, including the number of non-null values per column and the total number of entries (rows).
- If a column has missing values, the non-null count will be lower than the total entries.
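A small sketch of this comparison is shown below, using made-up data (the price and qty columns are assumptions). Since info() only prints its summary, the sketch also uses count() and len() to make the same non-null versus total comparison in code; that second part is an addition for illustration, not something the thread itself prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "price": [10.0, np.nan, 12.5],
    "qty": [1, 2, 3],
})

# Prints the summary: total entries plus a non-null count for each column.
df.info()

# The same comparison done programmatically:
# count() gives the non-null values per column, len(df) the total rows.
non_null_counts = df.count()
columns_with_missing = non_null_counts[non_null_counts < len(df)].index.tolist()

print(columns_with_missing)
```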