MITIGATING CLASS IMBALANCE IN OFFENSIVE LANGUAGE DETECTION IN MALAYALAM THROUGH NLPAUG
Keywords:
Offensive language, mBERT, NlpAug, stratified K-fold, Data Augmentation
Abstract
The rise of technology, the prevalence of social media, and the promotion of free speech have increased the presence of harmful content in the public sphere. Several studies show that identifying offensive language plays a crucial role in protecting vulnerable groups. We focus on detecting offensive language in Malayalam, a language for which existing research in this area is scarce. mBERT has proven effective across Indian languages. To address class imbalance in the dataset, we employed NlpAug for word-level augmentation and achieved a significant improvement in macro F1 score of 0.31.
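As a rough illustration of the idea in the abstract, the sketch below oversamples minority classes with a simple word-level augmentation and computes macro F1. It is not the pipeline used in this work: the adjacent-word swap is a standard-library stand-in for NlpAug's word-level augmenters, and the names `swap_words`, `balance_by_augmentation`, and `macro_f1`, as well as the toy labels, are hypothetical and chosen only for this example.

```python
import random
from collections import Counter


def swap_words(text, rng, n_swaps=1):
    """Return `text` with `n_swaps` random adjacent word pairs swapped.

    Assumption: a stand-in for a word-level augmenter, not NlpAug itself.
    """
    words = text.split()
    if len(words) < 2:
        return text  # nothing to swap in a one-word text
    for _ in range(n_swaps):
        i = rng.randrange(len(words) - 1)
        words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)


def balance_by_augmentation(texts, labels, seed=0):
    """Add augmented copies until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        # Original (non-augmented) examples of this class
        pool = [t for t, y in zip(texts, labels) if y == label]
        for _ in range(target - count):
            out_texts.append(swap_words(rng.choice(pool), rng))
            out_labels.append(label)
    return out_texts, out_labels


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so each class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy, made-up data: four majority examples, one minority example.
    texts = ["good morning all", "nice video bro", "well said sir",
             "great work team", "you are an idiot"]
    labels = ["not_offensive"] * 4 + ["offensive"]
    _, new_labels = balance_by_augmentation(texts, labels)
    print(Counter(new_labels))
```

Macro F1 weights every class equally, so a classifier that ignores the minority class is penalized heavily, which is why it is a natural metric for imbalanced data. When this kind of augmentation is combined with stratified K-fold, it would normally be applied to the training folds only, so that augmented copies of a text do not leak into the validation fold.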