COMPARATIVE ANALYSIS OF TRANSFORMER MODELS FOR SENTIMENT CLASSIFICATION IN CODE-MIXED INDIC LANGUAGES
DOI:
https://doi.org/10.63458/ijerst.v3i1.101
Keywords:
Sentiment analysis, transformer-based models, code-mixed tasks, IndicBERT
Abstract
Code-mixed text, in which multiple languages are used within a single message, has increased dramatically with the growth of social media engagement. This complicates Natural Language Processing (NLP) tasks such as sentiment analysis and cyberbullying identification. Addressing these issues requires models that can handle linguistic variability while retaining high accuracy. We investigate transformer-based architectures that improve classification performance through knowledge transfer. Our method uses RoBERTa, GPT-2, XLM-RoBERTa, and IndicBERT, and improves classification accuracy by transferring shared-private information between code-mixed and monolingual tasks. Experimental results show that our multi-task framework surpasses single-task models on all datasets: IndicBERT achieved 96.86% accuracy for Hinglish, XLM-RoBERTa achieved 96.95% for Punglish, and IndicBERT achieved 97.55% for Tanglish. This work highlights the multi-task learning capabilities of transformers for improving performance on low-resource and code-mixed languages, advancing reliable NLP applications in multilingual environments.
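The shared-private idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the hashed bag-of-words features and single dense layers are toy stand-ins for transformer encoders such as IndicBERT, the weights are random and untrained, and every name here (`SharedPrivateModel`, `featurize`, the task labels, the three sentiment classes) is a hypothetical choice made for illustration. It only shows how one encoder shared by all tasks can be combined with a task-specific encoder before a task-specific classification head.

```python
import math
import random
import zlib

DIM = 16      # size of the toy hashed bag-of-words input (assumption)
HIDDEN = 8    # output size of each toy encoder (assumption)
LABELS = ["negative", "neutral", "positive"]  # assumed label set


def featurize(text):
    """Hash tokens into a fixed-size count vector (stand-in for a tokenizer)."""
    vec = [0.0] * DIM
    for tok in text.lower().split():
        vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    return vec


class Linear:
    """Dense layer with randomly initialised, untrained weights."""

    def __init__(self, n_in, n_out, rng):
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                  for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) + b
                for row, b in zip(self.w, self.b)]


def tanh_vec(v):
    return [math.tanh(a) for a in v]


def softmax(v):
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


class SharedPrivateModel:
    """One shared encoder for all tasks plus a private encoder and head per task."""

    def __init__(self, tasks, seed=0):
        rng = random.Random(seed)
        self.shared = Linear(DIM, HIDDEN, rng)
        self.private = {t: Linear(DIM, HIDDEN, rng) for t in tasks}
        self.heads = {t: Linear(2 * HIDDEN, len(LABELS), rng) for t in tasks}

    def predict_proba(self, text, task):
        x = featurize(text)
        shared_feat = tanh_vec(self.shared(x))          # common to every task
        private_feat = tanh_vec(self.private[task](x))  # specific to this task
        # Concatenate both views and classify with the task's own head.
        return softmax(self.heads[task](shared_feat + private_feat))


model = SharedPrivateModel(["code_mixed", "monolingual"])
probs = model.predict_proba("yeh movie bahut achhi thi", "code_mixed")
print(dict(zip(LABELS, probs)))
```

In a trained system, the shared encoder would receive gradients from both the code-mixed and the monolingual task, which is where the knowledge transfer happens, while each private encoder is updated only by its own task.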