Various techniques can be used to deal with the missing values. The simplest way is to ignore them. But in some situations, this may not lead to desirable outcomes. In that scenario, we may have to impute the missing values. There are different techniques out there which can be used to calculate missing values:
- Mean: The missing value can be replaced by the mean value of other non-missing values for the same feature.
- Mode: Mode can also be used to calculate the missing values. One of the advantages of using mode is that it is robust against outliers.
- Median: Median is also one of a method to impute the missing value. Unlike mean, it is not affected by outliers.
- Regression: Regression is a method of using a mathematical model to calculate the missing value based on the other values for a given row.
- One-hot encoding: One-hot encoding can be used to find the similar cases and using the corresponding values of that feature to calculate the missing values.