I just finished reading a tutorial about the evaluation techniques for classification models. I understand precision and recall and the formula for F1-Score:
F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
But I don’t understand the reasoning behind using the harmonic mean. Can’t we simply use the average (Arithmetic mean):
F1-Score = (Precision + Recall) / 2
To see why the harmonic mean is used instead of the arithmetic mean, first consider this simple example.
There is a model which always categorizes everything as class 1 (out of two classes, 0 and 1). Now let's consider test data which has a very large (in the limit, infinite) number of class 0 elements and just one element of class 1.
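The precision of this model will be essentially 0 (only one of its many class 1 predictions is correct) and the recall will be 1 (it finds the single class 1 element). When we take the arithmetic mean, the score comes out to about 50%, despite the model being terrible, since it simply categorizes everything as class 1.
If we use the harmonic mean instead of the arithmetic mean, we get a score of about 0, which correctly tells us that the model is terrible. The harmonic mean is dominated by the smaller of the two values, so a model only scores well when both precision and recall are high. That is why it gives a more accurate picture here.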
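
As a quick sanity check, here is a minimal Python sketch of the example above. The function names are my own, and using 1,000,000 class 0 elements as a stand-in for "infinite" is just an assumption for illustration.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from raw counts (0.0 if undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def arithmetic_mean(p, r):
    return (p + r) / 2


def harmonic_mean(p, r):
    # This is the usual F1 formula; defined as 0.0 when both inputs are 0.
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Assumed stand-in for "infinitely many" class 0 elements.
n_negatives = 1_000_000

# The model predicts class 1 for everything:
# the single class 1 element is a true positive,
# every class 0 element is a false positive,
# and there are no false negatives.
tp, fp, fn = 1, n_negatives, 0

precision, recall = precision_recall(tp, fp, fn)
arithmetic = arithmetic_mean(precision, recall)
harmonic = harmonic_mean(precision, recall)

print(f"precision       = {precision:.8f}")
print(f"recall          = {recall:.8f}")
print(f"arithmetic mean = {arithmetic:.8f}")  # close to 0.5
print(f"harmonic mean   = {harmonic:.8f}")    # close to 0
```

The arithmetic mean stays near 0.5 no matter how large n_negatives gets, while the harmonic mean shrinks toward 0 along with the precision.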