The models used ranged from simple machine learning techniques to deep learning methods for sentiment classification. The best result came from the Feed Forward neural network with BOW as the input feature. CNN and Logistic Regression came quite close to it in terms of average accuracy. This was a useful test: a complex method like CNN can extract complex features, but for sentiment classification the presence of certain words in a sentence was already indicative of the sentiment of the review. Hence, CNN did not add much value to the model. If the dataset had included a lot of complex phrases expressing sentiment, CNN would have been useful for extracting them. Spending more resources on a complex method like CNN might not be the best solution in this case. It is good to test it on a smaller dataset first and see how the results look; you can then also test it on the complete dataset. If it is better than the simpler approaches, I choose it for production; otherwise I usually prefer a less resource-heavy method that gives good enough results.
In the following chart, I have listed all the average accuracies obtained for the different methods implemented in the previous posts for sentiment classification.
Word2Vec embeddings, which capture semantics, were not that useful compared to simple BOW or TF-IDF in terms of accuracy improvement. However, they can tremendously reduce the number of features used by the machine learning models: in the example used, the BOW/TF-IDF feature size was far larger than the Word2Vec one. Simple averaged Word2Vec vectors were used to classify the reviews, and this gave better results than Doc2Vec. This might not always be the case, since Doc2Vec is the more sophisticated algorithm for generalizing over a document as a whole. Doc2Vec generates efficient, high-quality distributed vectors that capture precise syntactic and semantic word relationships (ref). Hence, it is always a good decision to test the two methods and compare the results.
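To make the "averaged Word2Vec vectors" idea concrete, here is a minimal sketch. It assumes a tiny hand-made dictionary of 3-dimensional vectors (`toy_vectors`) in place of a trained Word2Vec model, and the helper name `average_vector` is my own for illustration; it is not the code from the earlier posts.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings standing in for a trained Word2Vec model.
toy_vectors = {
    "good": np.array([1.0, 0.0, 0.0]),
    "bad": np.array([-1.0, 0.0, 0.0]),
    "food": np.array([0.0, 1.0, 0.0]),
}


def average_vector(tokens, vectors, dim):
    """Return the mean vector of the known tokens, or zeros if none are known."""
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)


# "unseen" has no vector, so only "good" and "food" contribute to the mean.
review_vector = average_vector(["good", "food", "unseen"], toy_vectors, dim=3)
```

Whatever the length of the review, the resulting feature vector has the embedding dimension, which is why the feature size stays so much smaller than with BOW or TF-IDF.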
Between BOW and TF-IDF, BOW performed better, although not significantly; clearly, using TF-IDF did not improve the classification model. TF-IDF involves more computation than BOW, so it is good to test both before settling on one. TF-IDF reduces the weights of words that occur frequently across the whole corpus and are not unique to particular documents. This does not seem to work well for sentiment classification of reviews, where the positive and negative words appear in most of the documents. If you have already tested and compared BOW, TF-IDF, Word2Vec and Doc2Vec using simple classification models, you can save training time when you use them as input to the Feed Forward Neural Network or Logistic Regression. If you want to experiment further, try using these as inputs to the models in PyTorch.
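The down-weighting effect described above can be seen on a toy corpus. The sketch below is pure Python and uses made-up three-review data; the function names and the particular smoothed IDF form, `ln((1 + n) / (1 + df)) + 1`, are assumptions chosen for illustration and may differ from what the earlier posts used.

```python
import math
from collections import Counter

# Toy corpus (made up): the sentiment word "good" occurs in every review.
docs = [
    ["good", "food", "good", "service"],
    ["bad", "food", "not", "good"],
    ["good", "price", "slow", "delivery"],
]


def bow(doc):
    """Bag of words: raw term counts for one document."""
    return Counter(doc)


def smoothed_idf(corpus):
    """One common smoothed IDF variant: ln((1 + n) / (1 + df)) + 1."""
    n = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights: raw count multiplied by the term's IDF."""
    return {term: count * idf[term] for term, count in bow(doc).items()}


idf = smoothed_idf(docs)
counts = bow(docs[0])
weights = tfidf(docs[0], idf)

# Compare how strongly "good" stands out against "service" in each scheme.
bow_ratio = counts["good"] / counts["service"]
tfidf_ratio = weights["good"] / weights["service"]
```

Because "good" appears in every review, it gets the lowest possible IDF, so it stands out less in the TF-IDF vector than in the raw counts. That is consistent with the explanation above for why TF-IDF did not help here.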
The Feed Forward Neural Network performed well on the given data. For neural networks, the different activation, optimization and loss functions mentioned in the previous posts can be tested to see whether this result improves. You could try different learning rates, numbers of epochs, other non-linear activation functions and other optimization algorithms. There is a lot of room for experimentation and for obtaining better results than those achieved in the previous posts.
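One way to organize such experiments is to keep the network and training loop fixed and pass the activation and optimizer in as parameters. The following is only a sketch in PyTorch: the data is random stand-in data, the helper names `build_ffnn` and `train` are hypothetical, and the two configurations (`nn.Tanh` with SGD, `nn.LeakyReLU` with Adam) are illustrative picks of mine, not a recommendation from the earlier posts.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def build_ffnn(input_dim, hidden_dim, num_classes, activation):
    """Small feed-forward classifier; `activation` is an nn.Module class."""
    return nn.Sequential(
        nn.Linear(input_dim, hidden_dim),
        activation(),
        nn.Linear(hidden_dim, num_classes),
    )


def train(model, optimizer, features, labels, epochs):
    """Run full-batch epochs; return the loss computed in the last epoch."""
    loss_fn = nn.CrossEntropyLoss()
    loss = None
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(features), labels)
        loss.backward()
        optimizer.step()
    return loss.item()


torch.manual_seed(0)
# Random stand-in data: 32 "reviews" with 20 features each, 2 classes.
features = torch.randn(32, 20)
labels = torch.randint(0, 2, (32,))

# Illustrative configurations only; swap in whatever you want to compare.
configs = [
    ("tanh_sgd", nn.Tanh, lambda params: optim.SGD(params, lr=0.1)),
    ("leaky_relu_adam", nn.LeakyReLU, lambda params: optim.Adam(params, lr=0.01)),
]

results = {}
for name, activation, make_optimizer in configs:
    model = build_ffnn(20, 16, 2, activation)
    optimizer = make_optimizer(model.parameters())
    results[name] = train(model, optimizer, features, labels, epochs=20)
```

In a real comparison you would replace the random tensors with your BOW or Word2Vec features and score each configuration on held-out data rather than on the training loss.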
In the experiments, I have shown how to quickly prototype your models for sentiment classification problems. In this post, I have shown how I compared the various methods and planned the order in which to use them to get the best result possible.
This is the end of the Sentiment Classification Series. This was my first series, and I will aim to group relevant topics into series in the future. I will continue publishing as I explore new topics. Watch this space!
As always, happy experimenting and exploring!