Checking a DataFrame for Missing Values

This thread covers one step of data cleaning: checking a dataset for missing values. pandas offers several ways to do this, and a few of the most common ones are discussed in this thread.

Before getting into the methods: if you want to learn how to create a DataFrame from multiple Series, have a look at the thread "Merge many series to create a dataframe".

1. Using the "isnull()" method:

  • The isnull() method detects missing values (such as None, NaN, or NaT) in a pandas DataFrame or Series. It returns a boolean DataFrame or Series of the same shape as the input, where True marks a missing value.
  • Since we only want to know whether any values are missing, we call any() on the result of isnull(). On a DataFrame, any() returns one boolean per column; chaining a second any() reduces this to a single True or False for the whole DataFrame.
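
As a minimal sketch of this check, the example below builds a small DataFrame and applies isnull() followed by any(). The sample data and the column names (name, age, city) are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", None],
    "age": [28.0, np.nan, 35.0],
    "city": ["Oslo", "Rome", "Lima"],
})

# Boolean DataFrame with the same shape as df; True marks a missing value.
mask = df.isnull()

# One boolean per column: True if that column has at least one missing value.
per_column = mask.any()

# A single boolean for the whole DataFrame.
has_missing = bool(per_column.any())

print(per_column)
print("Any missing values:", has_missing)
```

Here the name and age columns are flagged because each holds one missing entry, while city is not.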

2. Using the "isna()" method:

  • The isna() method behaves identically to isnull() (in pandas, isnull() is an alias of isna()). It returns a boolean DataFrame or Series of the same shape as the input, where True values represent missing values and False values represent non-missing values.
  • We call any() on the result of isna() to check whether there is at least one True value.
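
The following sketch applies isna() to both a whole DataFrame and a single column, and confirms that it gives the same result as isnull(). The data and the column names (product, price) are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "product": ["pen", "ink", "pad"],
    "price": [1.5, np.nan, 3.0],
})

# Per-column check on the whole DataFrame.
missing_by_column = df.isna().any()

# The same check on a single column (a Series) yields one boolean.
price_has_missing = bool(df["price"].isna().any())

# isna() and isnull() produce identical boolean masks.
same_as_isnull = df.isna().equals(df.isnull())

print(missing_by_column)
print("price has missing values:", price_has_missing)
print("isna() matches isnull():", same_as_isnull)
```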

3. Using the "notna()" method:

  • The notna() method is the inverse of isna(): it returns a boolean DataFrame where True represents a non-missing value and False represents a missing value.
  • We call all() on the result to check whether every value is True. On a DataFrame, all() returns one boolean per column; if any result is False, that column contains missing values.
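
A short sketch of the notna() approach, assuming a hypothetical two-column DataFrame. Note that the logic is inverted compared with isna(): a False result from all() signals missing data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "x": [1.0, 2.0, np.nan],
    "y": ["a", "b", "c"],
})

# One boolean per column: True only if every value in the column is present.
complete_by_column = df.notna().all()

# A single boolean: True only if the whole DataFrame has no missing values.
all_present = bool(complete_by_column.all())
has_missing = not all_present

print(complete_by_column)
print("Any missing values:", has_missing)
```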

4. Using the "info()" method:

  • The info() method prints a summary of the DataFrame, including the total number of entries (rows) and the non-null count for each column.
  • If a column has missing values, its non-null count will be lower than the total number of entries.
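
The sketch below calls info() on a hypothetical DataFrame (the id and score columns are assumptions for the example). Because info() prints its summary and returns None, the second call writes the report to an in-memory buffer through the buf parameter, which is one way to inspect the text in code.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "score": [9.5, np.nan, 7.0, np.nan],
})

# Prints the summary to standard output and returns None.
df.info()

# Write the same summary to a buffer to inspect it as a string.
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()
```

In the printed summary, the score column shows a non-null count of 2 against 4 total entries, which indicates two missing values in that column.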