I have both categorical and numeric variables in my data, and both kinds of columns contain missing values. How should I handle these missing values? Do I need to handle them differently for each data set?
There is no single rule for treating missing values. It depends on your data.
For numeric variables, you can substitute the mean of the whole column for the missing values. The column's maximum or minimum can also be used to fill a missing value, but only if that makes sense for your data.
For categorical variables, you can find the most frequent value (the mode) in that column and use it.
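As a rough illustration of the two rules above, here is a minimal pandas sketch. The helper name `impute_simple` and the columns `age` and `city` are made up for this example, and it assumes every column has at least one observed value.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def impute_simple(df):
    """Fill numeric columns with the column mean, other columns with the mode."""
    out = df.copy()
    for col in out.columns:
        if is_numeric_dtype(out[col]):
            # Mean of the observed (non-missing) values in this column.
            out[col] = out[col].fillna(out[col].mean())
        else:
            # mode() ignores missing values; take the first if there are ties.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Hypothetical toy data with one gap in each column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 30.0],
    "city": ["Paris", "Rome", None, "Paris"],
})
filled = impute_simple(df)
print(filled)
```

Swapping `mean()` for `median()`, `max()` or `min()` gives the other numeric variants mentioned above.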
Although handling missing values by imputing (replacing them with a constant such as the mean or median) is a common suggestion, blindly using this approach could introduce serious bias into your model. Have a look at this talk where this idea is explored further.
One promising approach is to use Maximum Likelihood Estimation to replace the values, as discussed in this paper.
The technique entails introducing a random variable that follows the same distribution as the known values of the same series, and using draws from it to fill the missing values.
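To make the idea concrete, here is a small sketch of one way it could look. It is not the method from the paper: it assumes the observed values are roughly normally distributed, fits the mean and standard deviation by maximum likelihood, and draws the missing values from that fitted distribution. The function name `impute_by_sampling` and the `seed` parameter are made up for this example.

```python
import numpy as np
import pandas as pd


def impute_by_sampling(series, seed=0):
    """Fill gaps with draws from a normal fitted to the observed values."""
    rng = np.random.default_rng(seed)
    observed = series.dropna()
    # For a normal distribution, the maximum likelihood estimates are the
    # sample mean and the population standard deviation (ddof=0).
    mu = observed.mean()
    sigma = observed.std(ddof=0)
    missing = series.isna()
    out = series.copy()
    # One random draw per missing entry; observed entries are left unchanged.
    out.loc[missing] = rng.normal(mu, sigma, size=int(missing.sum()))
    return out


s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])
result = impute_by_sampling(s)
print(result)
```

Unlike filling with a constant, this keeps the spread of the column roughly intact. If the normal assumption does not fit your data, you would need to fit a different distribution.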