TF-IDF - Text Analytics with R | Learn Data Science

system · February 27, 2019, 9:56pm

Originally published at: https://tutorials.datasciencedojo.com/text-analytics-with-r-tf-idf/

The TF-IDF tutorial includes specific coverage of:

• Discussion of how the document-term frequency matrix representation can be improved:
– How to deal with documents of unequal lengths.
– What to do about terms that are very common across documents.
• Introduction of the mighty term frequency-inverse document frequency to implement these improvements:
– TF for dealing with documents of unequal lengths.
– IDF for dealing with terms that appear frequently across documents.
• Implementation of TF-IDF using R functions and applying them to document-term frequency matrices.
• Data cleaning of matrices post weighting/transformation.
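
As a rough illustration of the points above, here is a minimal sketch in base R. It is not the code from the series: the toy matrix `dtm`, the helper names `term_frequency` and `inverse_doc_freq`, and the choice of `log10` for IDF are all assumptions made for this example.

```r
# Toy document-term frequency matrix: rows are documents, columns are terms.
# The counts are made up purely for illustration.
dtm <- matrix(
  c(2, 1, 0,
    4, 0, 2,
    0, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("free", "call", "prize"))
)

# TF (hypothetical helper): divide each document's counts by its total count,
# so long and short documents become comparable.
term_frequency <- function(row) row / sum(row)

# IDF (hypothetical helper, log10 is an assumption): terms that appear in
# many documents get a weight closer to zero.
inverse_doc_freq <- function(col) log10(length(col) / sum(col > 0))

# apply() over rows returns one column per document, so transpose back
# to the documents-by-terms layout.
tf  <- t(apply(dtm, 1, term_frequency))
idf <- apply(dtm, 2, inverse_doc_freq)

# Multiply every column of tf by the IDF weight of its term.
tfidf <- sweep(tf, 2, idf, `*`)

# Cleaning after weighting: an empty document (doc3) gives 0 / 0 = NaN,
# so replace any non-finite value with 0.
tfidf[!is.finite(tfidf)] <- 0

print(round(tfidf, 3))
```

In this toy data, "free" appears 2 times in doc1 and 4 times in doc2, yet both documents end up with the same weight for it, because TF divides by document length. The empty doc3 shows why a cleaning step is needed once the weighting has been applied.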

The Kaggle dataset can be found here

The data and R code used in this series are available here
