My data mostly consists of short tweets or comments (350-400 characters long). I used a bag-of-words model with Naive Bayes classification, and I’m getting a lot of misclassified cases like the two below:
He sucked on a lemon early morning to get rid of hangover.
That movie sucked big time.
The problem is that during sentiment classification both are classified as negative just because of the word “sucked”.
Similarly, during document classification both are classified into movies due to the presence of the word “sucked”. I have a huge number of misclassified instances and have no idea how to improve the accuracy.
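For reference, here is a minimal sketch that reproduces the behaviour. It assumes a unigram bag-of-words with multinomial Naive Bayes and Laplace smoothing, written in plain Python. The training sentences are made up for illustration and the names (`TRAIN`, `tokenize`, `train`, `predict`) are hypothetical; this is not my real dataset or pipeline.

```python
import math
from collections import Counter

# Hypothetical toy training set, for illustration only.
TRAIN = [
    ("that film sucked", "neg"),
    ("the ending sucked badly", "neg"),
    ("worst movie ever", "neg"),
    ("great movie loved it", "pos"),
    ("what wonderful acting", "pos"),
    ("really enjoyed the film", "pos"),
]


def tokenize(text):
    # Bag-of-words: lowercase, drop full stops, split on whitespace.
    # Word order and context are discarded at this point.
    return text.lower().replace(".", "").split()


def train(samples):
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in samples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return word_counts, doc_counts, vocab


def predict(text, model):
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_tokens = sum(word_counts[label].values())
        # Log prior for the class.
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # words never seen in training are ignored
            # Laplace-smoothed log likelihood of the token given the class.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_tokens + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict("He sucked on a lemon early morning to get rid of hangover.", model))
print(predict("That movie sucked big time.", model))
```

With this toy data both sentences come out as `neg`. In the lemon sentence, “sucked” is the only token the model has seen in training, so that single word decides the label regardless of the words around it.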