SENTIMENT ANALYSIS USING MACHINE LEARNING: A COMPREHENSIVE REVIEW
Keywords:
Sentiment Analysis, Machine Learning, NLP, BERT, Supervised Learning

Abstract
Sentiment analysis has emerged as a crucial area of natural language processing (NLP), leveraging machine learning techniques to interpret and classify emotions within textual data. This article presents a comprehensive review of the latest machine learning approaches employed in sentiment analysis, focusing on their methodologies, performance, and real-world applications. We discuss the effectiveness of various supervised learning algorithms, such as support vector machines (SVMs), random forests, and neural networks, in sentiment classification tasks. Additionally, we highlight the impact of deep learning models, particularly recurrent neural networks (RNNs) and transformers such as BERT, which have demonstrated remarkable accuracy in capturing the nuances of human language. The article also emphasizes the significance of data preprocessing techniques and the use of benchmark datasets for model training and evaluation. Furthermore, we explore the practical implications of sentiment analysis in business decision-making, customer feedback analysis, and targeted marketing strategies. This review aims to serve as a valuable resource for researchers and practitioners seeking to understand and apply state-of-the-art machine learning techniques in sentiment analysis.
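To make the supervised-learning setting concrete, the sketch below implements one of the classic baselines from the early sentiment-classification literature (cf. Pang et al., 2002): a bag-of-words Naive Bayes classifier with Laplace smoothing, written in plain Python. The toy corpus and labels are purely illustrative assumptions; real systems train on benchmark datasets such as the Stanford Sentiment Treebank or the IMDB reviews corpus, and would use richer preprocessing than whitespace tokenization.

```python
# Minimal supervised sentiment classification: bag-of-words Naive Bayes
# with add-one (Laplace) smoothing. Illustrative sketch only.
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Basic preprocessing: lowercase and split on whitespace.
    return text.lower().split()

def train(docs):
    # docs: list of (text, label) pairs.
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab

def predict(model, text):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior plus log likelihood with Laplace smoothing,
        # so unseen words do not zero out the probability.
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training corpus (hypothetical examples, not from any benchmark).
corpus = [
    ("a wonderful and moving film", "pos"),
    ("great acting loved it", "pos"),
    ("terrible plot and awful pacing", "neg"),
    ("boring i hated it", "neg"),
]
model = train(corpus)
print(predict(model, "a wonderful film"))
```

The same pipeline structure (tokenize, featurize, fit, predict) carries over directly to the stronger models surveyed in the article; swapping the Naive Bayes scoring for an SVM or a fine-tuned BERT encoder changes the classifier while keeping the supervised framing intact.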
References
A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R. Passonneau, "Sentiment analysis of Twitter data," in Proceedings of the Workshop on Languages in Social Media, 2011, pp. 30-38.
B. Liu, "Sentiment analysis and opinion mining," Synthesis Lectures on Human Language Technologies, vol. 5, no. 1, pp. 1-167, 2012.
E. Cambria, "Affective computing and sentiment analysis," IEEE Intelligent Systems, vol. 31, no. 2, pp. 102-107, 2016.
B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment classification using machine learning techniques," in Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2002, pp. 79-86, doi: 10.3115/1118693.1118704.
T. Mullen and N. Collier, "Sentiment analysis using support vector machines with diverse information sources," in Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2004, pp. 412-418.
C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273-297, Sep. 1995, doi: 10.1007/BF00994018.
R. Moraes, J. F. Valiati, and W. P. Gavião Neto, "Document-level sentiment classification: An empirical comparison between SVM and ANN," Expert Systems with Applications, vol. 40, no. 2, pp. 621-633, Feb. 2013, doi: 10.1016/j.eswa.2012.07.059.
A. Liaw and M. Wiener, "Classification and regression by randomForest," R News, vol. 2, no. 3, pp. 18-22, 2002.
L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, Oct. 2001, doi: 10.1023/A:1010933404324.
M. Ahmad, S. Aftab, and I. Ali, "Sentiment analysis of tweets using SVM," International Journal of Computer Applications, vol. 177, no. 5, pp. 25-29, Nov. 2017, doi: 10.5120/ijca2017915758.
Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1746-1751, doi: 10.3115/v1/D14-1181.
X. Zhang, J. Zhao, and Y. LeCun, "Character-level convolutional networks for text classification," in Advances in Neural Information Processing Systems, 2015, pp. 649-657.
S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, Nov. 1997, doi: 10.1162/neco.1997.9.8.1735.
K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, "LSTM: A search space odyssey," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222-2232, Oct. 2017, doi: 10.1109/TNNLS.2016.2582924.
A. Graves, "Generating sequences with recurrent neural networks," arXiv preprint arXiv:1308.0850, 2013.
A. Pak and P. Paroubek, "Twitter as a corpus for sentiment analysis and opinion mining," in Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), 2010, pp. 1320-1326.
V. Kharde and P. Sonawane, "Sentiment analysis of Twitter data: A survey of techniques," International Journal of Computer Applications, vol. 139, no. 11, pp. 5-15, Apr. 2016, doi: 10.5120/ijca2016908625.
Z. Jianqiang and G. Xiaolin, "Comparison research on text pre-processing methods on Twitter sentiment analysis," IEEE Access, vol. 5, pp. 2870-2879, 2017, doi: 10.1109/ACCESS.2017.2672677.
G. Forman, "An extensive empirical study of feature selection metrics for text classification," Journal of Machine Learning Research, vol. 3, pp. 1289-1305, Mar. 2003.
M. Hu and B. Liu, "Mining and summarizing customer reviews," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004, pp. 168-177, doi: 10.1145/1014052.1014073.
J. Ramos, "Using TF-IDF to determine word relevance in document queries," in Proceedings of the First Instructional Conference on Machine Learning, 2003, pp. 133-142.
A. Tripathy, A. Agrawal, and S. K. Rath, "Classification of sentiment reviews using n-gram machine learning approach," Expert Systems with Applications, vol. 57, pp. 117-126, Sep. 2016, doi: 10.1016/j.eswa.2016.03.028.
R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Ng, and C. Potts, "Recursive deep models for semantic compositionality over a sentiment treebank," in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013, pp. 1631-1642.
J. McAuley, C. Targett, Q. Shi, and A. van den Hengel, "Image-based recommendations on styles and substitutes," in Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015, pp. 43-52, doi: 10.1145/2766462.2767755.
Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, Aug. 2013, doi: 10.1109/TPAMI.2013.50.
X. Glorot, A. Bordes, and Y. Bengio, "Domain adaptation for large-scale sentiment classification: A deep learning approach," in Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 513-520.
J. Howard and S. Ruder, "Universal language model fine-tuning for text classification," in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018, pp. 328-339, doi: 10.18653/v1/P18-1031.
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4171-4186, doi: 10.18653/v1/N19-1423.
C. Sun, X. Qiu, Y. Xu, and X. Huang, "How to fine-tune BERT for text classification?," in China National Conference on Chinese Computational Linguistics, 2019, pp. 194-206, doi: 10.1007/978-3-030-32381-3_16.
Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "RoBERTa: A robustly optimized BERT pretraining approach," arXiv preprint arXiv:1907.11692, 2019.
Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q. V. Le, "XLNet: Generalized autoregressive pretraining for language understanding," in Advances in Neural Information Processing Systems, 2019, pp. 5754-5764.
E. Strubell, A. Ganesh, and A. McCallum, "Energy and policy considerations for deep learning in NLP," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 3645-3650, doi: 10.18653/v1/P19-1355.
V. Sanh, L. Debut, J. Chaumond, and T. Wolf, "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter," arXiv preprint arXiv:1910.01108, 2019.
X. Jiao, Y. Yin, L. Shang, X. Jiang, X. Chen, L. Li, F. Wang, and Q. Liu, "TinyBERT: Distilling BERT for natural language understanding," in Findings of the Association for Computational Linguistics: EMNLP 2020, 2020, pp. 4163-4174, doi: 10.18653/v1/2020.findings-emnlp.372.
D. Hovy, J. Ledell, and A. Riestenberg, "Domain adaptation for sentiment classification," in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, pp. 7716-7725, doi: 10.18653/v1/2020.emnlp-main.622.
W. J. Murdoch, P. J. Liu, and B. Yu, "Beyond word importance: Contextual decomposition to extract interactions from LSTMs," in Proceedings of the 6th International Conference on Learning Representations (ICLR), 2018.
License
Copyright (c) 2024 Amit Taneja (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.