I am developing a spam filter using scikit-learn. Here are the steps I follow:

`Xdata = ["This is spam", "This is Ham", "This is another spam"]`

1. `Matrix = CountVectorizer().fit_transform(Xdata)`. `Matrix` contains the count of each word in every document, so `Matrix[i][j]` gives the count of word `j` in document `i`.

2. `Matrix_idfX = TfidfTransformer().fit_transform(Matrix)`. This rescales the raw counts to TF-IDF scores. (I use `TfidfTransformer` rather than `TfidfVectorizer` because the input is already a count matrix, not raw text.)

3. `Matrix_idfX_Select = SelectKBest(chi2, k=300).fit_transform(Matrix_idfX, y)`. This reduces the matrix to the 300 best-scoring columns.

4. `MultinomialNB().fit(Matrix_idfX_Select, y)`
Now my question is: do I need to perform **normalization or standardization** in any of the above four steps? If yes, after which step, and why?