ROBERTA FOR MULTI-LABEL THAI TEXT CLASSIFICATION

Suwika Plubin, Bandhita Plubin, Walaithip Bunyatisai, Manad Khamkong, Thanasak Mouktonglang

Abstract


This study applies the RoBERTa model to multi-label classification of Thai-language customer reviews in the banking sector, using a dataset of 24,500 reviews labeled with multiple categories. The objective is to assess RoBERTa's ability to handle complex linguistic structures and imbalanced data while assigning each review all relevant labels. RoBERTa's transformer-based architecture, with its self-attention mechanism, is highly effective at capturing the contextual meaning of Thai text, a language that poses unique challenges such as the lack of spaces between words and tonal variations. The model demonstrated strong performance, achieving a macro-averaged precision of 0.83, an F1-score of 0.71, and a Hamming loss of 0.083. SMOTE was employed to improve recall on underrepresented categories, yielding a better overall performance balance. The results highlight RoBERTa's effectiveness for Thai-language multi-label text classification, showcasing its capability to manage imbalanced data and deliver accurate, context-aware predictions across multiple categories.
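The metrics reported above (macro-averaged precision, F1-score, and Hamming loss) are standard for multi-label evaluation, where each review's labels form a binary indicator vector. As a minimal sketch of how such metrics are computed, the snippet below evaluates hypothetical prediction matrices with scikit-learn; the toy label matrices and category names are illustrative assumptions, not the paper's data.

```python
import numpy as np
from sklearn.metrics import hamming_loss, precision_score, f1_score

# Hypothetical ground truth and predictions for 4 reviews across
# 3 categories (e.g. service, fees, mobile app); 1 = label applies.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])

# Hamming loss: fraction of individual label assignments that are wrong
# (here 2 incorrect cells out of 12).
hl = hamming_loss(y_true, y_pred)

# Macro averaging computes each metric per category, then takes the
# unweighted mean, so rare categories count as much as frequent ones.
macro_p = precision_score(y_true, y_pred, average="macro", zero_division=0)
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(f"Hamming loss: {hl:.3f}  macro precision: {macro_p:.3f}  macro F1: {macro_f1:.3f}")
```

Macro averaging is what makes class imbalance visible: a model that ignores minority categories is penalized equally on each label, which is why pairing it with an oversampling technique such as SMOTE can lift the macro scores.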

