COMPARATIVE ANALYSIS OF TRANSFORMER MODELS FOR SENTIMENT CLASSIFICATION IN CODE-MIXED INDIC LANGUAGES

Authors

  • Mohana Priya K.T, Department of CSE, Kongu Engineering College, Erode, India
  • Shrinithi G, Department of CSE, Kongu Engineering College, Erode, India
  • Nithish P, Department of CSE, Kongu Engineering College, Erode, India
  • Pranesh A C, Department of CSE, Kongu Engineering College, Erode, India

DOI:

https://doi.org/10.63458/ijerst.v3i1.101

Keywords:

Sentiment analysis, transformer-based models, code-mixed tasks, IndicBERT

Abstract

The growth of social media engagement has led to a sharp increase in code-mixed text, in which a single message uses more than one language. Such text poses challenges for Natural Language Processing (NLP) tasks such as sentiment analysis and cyberbullying identification, which require models that can handle linguistic variability while retaining high accuracy. We investigate transformer-based architectures that improve classification performance through knowledge transfer. Our method uses RoBERTa, GPT-2, XLM-RoBERTa, and IndicBERT, and it improves classification accuracy by transferring shared and private information across code-mixed and monolingual tasks. Experimental results show that our multi-task framework outperforms single-task models on all datasets: IndicBERT achieved 96.86% accuracy on Hinglish, XLM-RoBERTa achieved 96.95% on Punglish, and IndicBERT achieved 97.55% on Tanglish. By demonstrating the benefits of multi-task learning with transformers for low-resource and code-mixed languages, this work aims to advance reliable NLP applications in multilingual environments.
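The abstract does not detail how shared and private information is combined. As a rough illustration only, the following sketch shows a shared-private forward pass in the style of Liu, Qiu, and Huang (2017), "Adversarial multi-task learning for text classification": each input goes through one encoder shared by all tasks and one private encoder for its own task, and a task-specific head classifies the concatenated features. The class and function names, the small tanh layers, the random untrained weights, the choice of three classes, and the use of plain vectors as stand-ins for pooled transformer embeddings (for example from IndicBERT or XLM-RoBERTa) are all assumptions, not the authors' implementation. Training and the adversarial loss are omitted.

```python
import math
import random

# Hypothetical sketch of a shared-private multi-task classifier (forward pass only).
# Input vectors stand in for pooled transformer embeddings; weights are random and untrained.

def init_layer(out_dim, in_dim, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim

def dense(layer, x):
    """Compute W x + b for a layer given as (weights, bias)."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

class SharedPrivateClassifier:
    def __init__(self, tasks, in_dim, hidden, n_classes, seed=0):
        rng = random.Random(seed)
        # One encoder shared by every task (e.g. Hinglish, Punglish, Tanglish).
        self.shared = init_layer(hidden, in_dim, rng)
        # One private encoder and one classification head per task.
        self.private = {t: init_layer(hidden, in_dim, rng) for t in tasks}
        self.heads = {t: init_layer(n_classes, 2 * hidden, rng) for t in tasks}

    def predict_proba(self, task, x):
        """Class probabilities for input vector x under the given task."""
        shared_feat = [math.tanh(v) for v in dense(self.shared, x)]
        private_feat = [math.tanh(v) for v in dense(self.private[task], x)]
        # Concatenate shared and private features before the task-specific head.
        return softmax(dense(self.heads[task], shared_feat + private_feat))

if __name__ == "__main__":
    model = SharedPrivateClassifier(["hinglish", "punglish", "tanglish"],
                                    in_dim=4, hidden=3, n_classes=3)
    probs = model.predict_proba("tanglish", [0.5, -1.0, 0.25, 2.0])
    print([round(p, 3) for p in probs])
```

Sharing one encoder lets the code-mixed tasks pool what they have in common, while each private encoder can keep cues specific to its language pair; in a real system both would be trained jointly on labelled data, which this sketch does not do.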

Author Biographies

Mohana Priya K.T, Assistant Professor, Department of CSE, Kongu Engineering College, Erode, India

Shrinithi G, UG Scholar, Department of CSE, Kongu Engineering College, Erode, India

Nithish P, UG Scholar, Department of CSE, Kongu Engineering College, Erode, India

Pranesh A C, UG Scholar, Department of CSE, Kongu Engineering College, Erode, India


Published

2025-03-25

How to Cite

Mohana Priya K.T, Shrinithi G, Nithish P, & Pranesh A C. (2025). COMPARATIVE ANALYSIS OF TRANSFORMER MODELS FOR SENTIMENT CLASSIFICATION IN CODE-MIXED INDIC LANGUAGES. International Journal of Engineering Research and Sustainable Technologies (IJERST), 3(1), 1–9. https://doi.org/10.63458/ijerst.v3i1.101
