This thread focuses on one step of data cleaning: checking the dataset for missing values. There are several ways to do this in pandas, and a few of the most common ones are discussed in this thread.
Before getting into the details of the methods, if you want to learn how to create a data frame from multiple series, have a look at the thread Merge many series to create a dataframe.
1. Using "isnull()" method:
- The `isnull()` method is used to check if a value in a Pandas DataFrame or Series is null or missing. It returns a boolean array of the same shape as the input.
- Since we only want to check if there are missing values or not, we use the `any()` method to check if there is at least one `True` value in the results provided by the `isnull()` method.
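As a minimal sketch of the `isnull()` approach, assuming a small made-up DataFrame (the column names and values are only for illustration): on a DataFrame, `any()` works column by column, so chaining a second `any()` collapses the result to a single yes/no answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "name": ["Ann", "Bob", None],
    "age": [28, np.nan, 35],
    "city": ["Oslo", "Rome", "Lima"],
})

# Boolean DataFrame with the same shape as df: True where a value is missing
null_mask = df.isnull()

# One boolean per column: does the column contain at least one missing value?
has_missing_per_column = null_mask.any()

# A single boolean for the whole DataFrame
has_any_missing = null_mask.any().any()

print(has_missing_per_column)
print(has_any_missing)
```

The per-column result is handy when you want to know where the gaps are, while the chained form answers only whether any gap exists.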
2. Using "isna()" method:
- The `isna()` method is similar to the `isnull()` method and returns a boolean array of the same shape as the input, where `True` values represent missing values and `False` values represent non-missing values.
- We use the `any()` method on the results of `isna()` to check if there is at least one `True` value.
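The same check with `isna()` could look like the sketch below. The sample data is again hypothetical. In pandas, `isnull()` is an alias of `isna()`, so both produce identical masks, which the example verifies with `equals()`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "product": ["pen", "ink", None],
    "price": [1.5, np.nan, 4.0],
})

# True marks a missing value, False a non-missing one
na_mask = df.isna()

# isnull() is an alias of isna(), so the two masks match exactly
same_as_isnull = na_mask.equals(df.isnull())

# Is there at least one True anywhere in the mask?
has_any_missing = na_mask.any().any()

print(na_mask)
print(same_as_isnull, has_any_missing)
```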
3. Using "notna()" method:
- The `notna()` method returns a boolean DataFrame where `True` values represent non-missing values and `False` values represent missing values.
- We use the `all()` method to check if all values in the DataFrame are `True`. If the result is `False`, it means we have missing values in the data.
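A short sketch of the `notna()` approach, assuming another made-up DataFrame: `all()` on a DataFrame reports one result per column, so a second `all()` is chained here to get a single answer for the whole dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "student": ["Kim", "Lee", "Max"],
    "score": [88.0, np.nan, 92.5],
})

# True where a value is present, False where it is missing
present_mask = df.notna()

# One boolean per column: are all values in the column present?
all_present_per_column = present_mask.all()

# A single boolean for the whole DataFrame
all_present = present_mask.all().all()

# If not everything is present, the data has missing values
has_missing = not all_present

print(all_present_per_column)
print(has_missing)
```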
4. Using "info()" method:
- The `info()` method prints the information of the data frame, including the number of non-null values per column and total entries per column.
- If a column has missing values, the non-null count will be lower than the total entries.
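A minimal sketch of reading the `info()` summary, assuming a made-up DataFrame. `info()` prints its report rather than returning it, so the example passes a text buffer through the `buf` parameter to capture the output. The `count()` comparison is an addition for illustration: it gives the same non-null counts as numbers, which makes the "lower than total entries" check explicit.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "item": ["desk", "lamp", "sofa"],
    "weight": [12.0, np.nan, 40.0],
})

# Capture the printed summary in a buffer instead of stdout
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# count() returns the non-null count per column as a Series
non_null_counts = df.count()
total_entries = len(df)

# Columns whose non-null count is below the total have missing values
columns_with_missing = non_null_counts[non_null_counts < total_entries].index.tolist()
print(columns_with_missing)
```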