Error: Iterable over raw text documents expected, string object received tfidf vectorizer

toobamukhtar · March 23, 2019, 4:05pm

I have trained a sentiment analysis model using TFIDF vectorizer features with Logistic Regression as the classifier. At test time I pass a string of text to the TFIDF vectorizer after preprocessing and normalizing the content. However, the following error keeps appearing when I use the vectorizer to transform the features:
ValueError: Iterable over raw text documents expected, string object received.

The code works fine when I test with more than one sample.
If anyone can tell me what the problem is, that would be great!

toobamukhtar · March 23, 2019, 4:12pm

I am going to explain this using an example.
Let’s assume that you have one sample to test, for instance the content of an article. After preprocessing you have the following output:

x = "['doctor', 'mbbs', 'student', 'found', 'shot', 'dead', 'hostel']"

The above output is a string, not a list. Look at the quotes around the [] brackets.
The TFIDF vectorizer's transform method needs a list (or another iterable) of documents, so a single sample must be passed as a list containing one element (the string itself).
The error can be removed by adding the following line:

x = [x]

After this, x has the following form:

["['doctor', 'mbbs', 'student', 'found', 'shot', 'dead', 'hostel']"]

Now you can see that x is an iterable containing a single string, which is what the vectorizer expects.
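
To make the difference concrete, here is a minimal sketch that reproduces the error and then applies the fix. It assumes scikit-learn's TfidfVectorizer; the training sentences and the variable names (train_docs, error_message, features) are made up for illustration and are not from the original model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical training corpus, only here so the vectorizer has a vocabulary.
train_docs = [
    "doctor found dead in hostel",
    "student wins national award",
    "mbbs student shot near hostel",
]

vectorizer = TfidfVectorizer()
vectorizer.fit(train_docs)

# A single preprocessed sample, held as one plain string.
x = "doctor mbbs student found shot dead hostel"

# Passing the bare string raises the ValueError from the question.
try:
    vectorizer.transform(x)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Wrapping the string in a list gives an iterable holding one document.
features = vectorizer.transform([x])
n_rows = features.shape[0]  # one row per document in the list
```

The transformed matrix has one row per document, so a single wrapped string yields a single row that can be passed on to the classifier.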

muralidhar_A · May 22, 2019, 5:21am

Is it possible to share the code?

toobamukhtar · May 22, 2019, 3:50pm

The only code needed to resolve this error is to wrap the variable x in a list using square brackets []. You can do the same with whatever variable holds your content. This satisfies the vectorizer's requirement on the input format.
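
For anyone who wants something more complete, below is a rough sketch of how the wrapping could sit in a prediction step. It is not my actual project code: the toy training data, the labels, and the helper name as_documents are all assumptions made up for this example, and it assumes scikit-learn's TfidfVectorizer and LogisticRegression.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def as_documents(text_or_docs):
    """Hypothetical helper: wrap a lone string in a list, pass iterables through."""
    if isinstance(text_or_docs, str):
        return [text_or_docs]
    return list(text_or_docs)


# Toy data, invented for illustration (1 = positive, 0 = negative).
train_docs = [
    "great wonderful film",
    "terrible boring film",
    "wonderful acting great story",
    "boring plot terrible acting",
]
train_labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()
clf = LogisticRegression()
clf.fit(vectorizer.fit_transform(train_docs), train_labels)

# A single sample and a batch both go through the same helper.
single_pred = clf.predict(vectorizer.transform(as_documents("great story")))
batch_pred = clf.predict(
    vectorizer.transform(as_documents(["great story", "boring film"]))
)
```

With a helper like this the same prediction code handles one sample or many, which is why the error only showed up when testing with a single string.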