I am performing regression analysis on the IMDB movies dataset to see whether the rating, budget and various other variables affect the popularity of a movie.

I am doing some exploratory analysis of the relationships between the variables, both dependent and independent.

Can anyone tell me whether it is okay to assume that **correlation** directly implies **causation**? Is there any difference between the two? I need to know this because the cause-and-effect reasoning is later used to make decisions about the model.

**Correlation** tells us how strongly a pair of variables are linearly related and change together. It does not tell us the why or how behind the relationship; it only says that the relationship exists.
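To make the "linearly related" part concrete, here is a minimal sketch of the Pearson correlation coefficient using only the standard library. The `budget` and `popularity` numbers are made up for illustration and are not taken from the IMDB dataset, and `pearson_r` is just a helper name I chose.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance-like numerator and the two sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values: popularity rises roughly in step with budget
budget = [10, 20, 30, 40, 50]
popularity = [12, 25, 31, 48, 52]
r_linear = pearson_r(budget, popularity)

# A perfect but non-linear (quadratic) dependence on a symmetric range
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]
r_quadratic = pearson_r(x, y)

print(f"roughly linear pair: r = {r_linear:.3f}")
print(f"quadratic pair:      r = {r_quadratic:.3f}")
```

The first pair gives a coefficient close to 1, while the quadratic pair gives 0 even though `y` is completely determined by `x`. That is why a low correlation only rules out a linear relationship, not a relationship in general.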

**Causation** says that a change in the value of one variable will **cause** a change in the value of another variable. It is also referred to as cause and effect.

**Example:** **Ice cream sales are correlated with homicides in New York**

As the sales of ice cream rise and fall, so does the number of homicides. Does eating ice cream cause people to be killed?

No. The fact that two things are correlated doesn’t mean one causes the other.

It is possible that there exists some third variable which is causing both the ice cream sales and homicides to change simultaneously.
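A small simulation can show how such a third variable produces a correlation without any causal link. This is only a sketch on synthetic data: I am assuming the hidden variable is temperature, and every coefficient below is invented. In the simulated world neither ice cream sales nor homicides influences the other; both depend only on temperature.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(y, x):
    """What is left of y after removing a least-squares straight-line fit on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return [(b - mean_y) - slope * (a - mean_x) for a, b in zip(x, y)]

random.seed(0)  # fixed seed so the run is repeatable
n = 5000

# Assumed confounder: temperature drives both series, plus independent noise
temperature = [random.gauss(20, 8) for _ in range(n)]
ice_cream = [2.0 * t + random.gauss(0, 5) for t in temperature]
homicides = [0.5 * t + random.gauss(0, 2) for t in temperature]

# Raw correlation between the two outcomes
raw_r = pearson_r(ice_cream, homicides)

# Correlation after the linear effect of temperature is removed from both
partial_r = pearson_r(residuals(ice_cream, temperature),
                      residuals(homicides, temperature))

print(f"raw correlation:             {raw_r:.3f}")
print(f"controlling for temperature: {partial_r:.3f}")
```

The raw correlation comes out strongly positive, but once temperature is controlled for it drops to roughly zero. With real data you usually do not know which confounders exist, so a correlation that survives the controls you happened to try is still not proof of causation.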

When two variables change in tandem, they are correlated.

When one variable changes because of another, their relationship is one of causation.

Rising gas prices and the number of cars could be correlated. That alone doesn’t establish causation, because other factors also affect gas prices.

The density of traffic, however, is caused by the number of cars owned.