I have a dataset for regression analysis. The distribution of the dependent variable is left-skewed. How do I handle this? Will it affect my analysis if the distribution remains skewed? If so, what should be done about it?
We need to know about the distributions of the variables in order to make inferences later on, for example when constructing confidence intervals. However, the dependent (response) variable does not have to be normally distributed. We just need to ensure that the sample we are analysing is large enough to support the conclusions we draw from it.
The distribution of the variables is not a matter of grave concern. The balance of classes is.
Check that your classes are balanced.
@HHH Regression analysis here means predicting a continuous response (i.e. not classification), so the notion of class balance is not relevant.
As for the OP's question: I'm assuming you are talking about skewness compared to a normal distribution. The distribution of the dependent or independent variables does not, by itself, invalidate a regression model (the amount of data does matter, however). What you need to check is that the residuals, i.e. the differences between the observed and fitted values, are approximately normally distributed, since that is what the usual confidence intervals and tests rely on.
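To illustrate the point about residuals, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration and not the OP's data: the simulated predictor, the true coefficients (2.0 and 3.0), the noise level, and the helper names `skewness` and `fit_ols`. It generates a left-skewed response, fits a simple linear regression, and compares the skewness of the response with the skewness of the residuals.

```python
import random
import statistics


def skewness(values):
    """Sample skewness (third standardized moment); near 0 for symmetric data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def fit_ols(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000

# Hypothetical data: a left-skewed predictor plus symmetric Gaussian noise.
# The response inherits the left skew from the predictor.
x = [10 - rng.expovariate(1.0) for _ in range(n)]
y = [2.0 + 3.0 * xi + rng.gauss(0, 0.5) for xi in x]

intercept, slope = fit_ols(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"fitted slope:          {slope:.3f}")
print(f"skewness of y:         {skewness(y):.2f}")
print(f"skewness of residuals: {skewness(residuals):.2f}")
```

In this simulated setup the response is clearly left-skewed, yet the residuals come out roughly symmetric because the noise was drawn from a normal distribution. With real data you would run the same comparison on your own residuals, ideally alongside a histogram or Q-Q plot, before deciding whether a transformation is needed.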