MITIGATING CLASS IMBALANCE IN OFFENSIVE LANGUAGE DETECTION IN MALAYALAM THROUGH NLPAUG

Authors

  • Munawwar K V, Research Scholar, Central University of Tamil Nadu, India
  • Nandhini K, Central University of Tamil Nadu, Thiruvarur, Tamil Nadu

Keywords:

Offensive language, mBERT, NlpAug, stratified K-fold, Data Augmentation

Abstract

The growth of technology, together with the prevalence of social media and the promotion of free speech, has increased the presence of harmful content in the public sphere. Several studies show that identifying offensive language plays a crucial role in protecting vulnerable groups. We focus on detecting offensive language in Malayalam, a language for which existing research in this area is scarce. mBERT is effective across Indian languages. To address class imbalance within the datasets, we employed NlpAug for word-level augmentation and achieved a significant improvement in macro F1 score of 0.31.
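As a brief illustration of why macro F1 is the metric reported for an imbalanced task, the following minimal sketch computes it in plain Python. The label names and the toy predictions are hypothetical and are not drawn from the paper's data. The sketch does not reproduce the authors' evaluation code. It only shows that a classifier which always predicts the majority class can reach high accuracy while its macro F1 stays low.

```python
def f1_per_class(y_true, y_pred):
    """Return a dict mapping each label to its F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical imbalanced toy data: 9 majority examples, 1 minority example.
y_true = ["not_offensive"] * 9 + ["offensive"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["not_offensive"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)

print(f"accuracy = {accuracy:.2f}")
print(f"macro F1 = {score:.4f}")
```

Because the minority class contributes an F1 of zero here, the macro average is pulled well below the accuracy, which is why balancing the classes through augmentation can be expected to show up in this metric.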

Author Biography

Nandhini K, Central University of Tamil Nadu, Thiruvarur, Tamil Nadu

Assistant Professor, Central University of Tamil Nadu, Thiruvarur, Tamil Nadu

Published

2024-03-25

How to Cite

K V, M., & Nandhini K. (2024). MITIGATING CLASS IMBALANCE IN OFFENSIVE LANGUAGE DETECTION IN MALAYALAM THROUGH NLPAUG. International Journal of Engineering Research and Sustainable Technologies (IJERST), 2(1), 30–36. Retrieved from https://ijerst.drmgrjournals.org/index.php/ijerst/article/view/73