MOTEC: THE MALAY OFFENSIVE TEXT CLASSIFICATION USING EXTRA TREE AND DIALECTAL STANDARDIZATION


Fairuz Amalina
Faiz Zaki
Hamza H. M. Altarturi
https://orcid.org/0000-0002-5486-9882
Hazim Hanif
Nor Badrul Anuar

Abstract

Cyberbullying has increased globally, and offensive text is a significant contributor. Detecting offensive text in the Malay language is challenging because of non-standard Malay text, the distinctive writing styles of social media, the lack of standardization, and limited language resources. This study proposes the Malay Offensive Text Classification (MOTEC) framework to address these challenges. The MOTEC framework incorporates a Malay standardization preprocessing task that uses three specialized dictionaries: (a) abbreviations, (b) noisy text, and (c) Malaysian dialects. This approach improves data quality by converting non-standard text into standardized Malay sentences before classification. For feature extraction, the framework employs Term Frequency-Inverse Document Frequency (TF-IDF), and an Extra Tree classifier performs the classification. When evaluated on a private dataset collected from Twitter, the MOTEC framework achieved a classification accuracy of 94%, outperforming the 84% accuracy reported by other studies. The MOTEC framework substantially improves the classification of offensive Malay text by increasing accuracy, reducing execution time, and improving data quality through effective language standardization.
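The preprocessing and feature-extraction stages described above can be sketched in a few lines. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation: the dictionary entries are hypothetical examples (the MOTEC dictionaries are not reproduced here), the order in which the three dictionaries are applied is assumed, and the TF-IDF variant (raw counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2 normalization) is one common choice that mirrors scikit-learn's `TfidfVectorizer` defaults, since the abstract does not specify one.

```python
import math
import re
from collections import Counter

# Hypothetical, illustrative entries only; NOT the MOTEC dictionaries.
ABBREVIATIONS = {"yg": "yang", "x": "tidak", "dgn": "dengan", "sgt": "sangat", "ni": "ini"}
NOISY_TEXT = {"bodo": "bodoh", "bdoh": "bodoh", "ko": "kau"}
DIALECTS = {"ambo": "saya", "demo": "kamu", "hang": "kamu"}  # Kelantanese / northern forms

# Assumption: dictionaries are applied in the order (a), (b), (c).
DICTIONARIES = (ABBREVIATIONS, NOISY_TEXT, DIALECTS)


def standardize(text):
    """Lowercase, split into ASCII alphanumeric tokens, and map each token
    through the abbreviation, noisy-text, and dialect dictionaries in turn."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    standardized = []
    for token in tokens:
        for table in DICTIONARIES:
            token = table.get(token, token)  # unknown tokens pass through unchanged
        standardized.append(token)
    return standardized


def tfidf(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for tokenized docs.

    Term weight = raw count * (ln((1 + n) / (1 + df)) + 1).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = [counts[term] * idf[term] for term in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # guard empty docs
        vectors.append([v / norm for v in vec])
    return vocab, vectors


if __name__ == "__main__":
    tweets = ["Hang ni bodo sgt!", "Ambo x suka dgn demo"]
    docs = [standardize(t) for t in tweets]
    print(docs[0])  # ['kamu', 'ini', 'bodoh', 'sangat']
    vocab, vectors = tfidf(docs)
    print(vocab)
```

In a full pipeline, the resulting vectors and their offensive/non-offensive labels would then be passed to an Extra Trees classifier (for example, scikit-learn's `sklearn.ensemble.ExtraTreesClassifier`); that step and its hyperparameters are omitted here because the abstract does not specify them.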


Article Details

How to Cite
Narudin, F. A., Zaki, F., Altarturi, H. H. M., Hanif, H., & Anuar, N. B. (2025). MOTEC: The Malay offensive text classification using extra tree and dialectal standardization. Malaysian Journal of Computer Science, 38(1), 82–99. Retrieved from https://mjcs.um.edu.my/index.php/MJCS/article/view/56105
Author Biographies

Fairuz Amalina, Department of Computer System and Technology, Faculty of Computer Science and Information Technology, Universiti Malaya, 50603 Kuala Lumpur, Malaysia

Fairuz Amalina received the B.Tech. degree (Hons.) in Networking Systems from the University of Kuala Lumpur (UniKL), Malaysia, and the Master's degree in Computer Science from the University of Malaya, Kuala Lumpur. She is currently pursuing the Ph.D. degree in Cybersecurity with the Faculty of Computer Science and Information Technology, University of Malaya. She is also a part-time lecturer at the University Malaysia of Computer Science and Engineering (UNIMY). Her research interests include big data analytics, artificial intelligence, computer security, and malware detection systems.

Faiz Zaki, Department of Computer System and Technology, Faculty of Computer Science and Information Technology, Universiti Malaya, 50603 Kuala Lumpur, Malaysia

Faiz Zaki obtained his Master of Science (Web Science and Big Data Analytics) from University College London in 2017 and a PhD in Network Analytics from Universiti Malaya in 2022. He currently serves as the Director of the Data and Information Management Center and as a Senior Lecturer in the Department of Computer Systems and Technology, Faculty of Computer Science and Information Technology, Universiti Malaya. He is also a core member of the Centre of Research for Cyber Security and Network (CSNET). His research interests lie at the intersection of big data analytics and computer networking, so most of his work revolves around network analytics, such as network traffic classification. His current research is moving towards real-time network analytics using technologies such as edge computing and federated learning. He also holds several professional certifications in computer networking, such as CCNA and HCIA, and is an active member of the IEEE Computer Society and IEEE Young Professionals.

Hamza H. M. Altarturi, Department of Computer System and Technology, Faculty of Computer Science and Information Technology, Universiti Malaya, 50603 Kuala Lumpur, Malaysia

Hamza H. M. Altarturi is currently a doctoral researcher at the Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. He received his Master of Computer Science (M.Sc.) from Universiti Putra Malaysia, Selangor, Malaysia, in 2017, and his Bachelor of Science (B.Sc.) from Hebron University, Hebron, West Bank, Palestine, in 2015. His research interests include software engineering, data mining, and artificial intelligence.

Hazim Hanif, Department of Software Engineering, Faculty of Computer Science and Information Technology, Universiti Malaya, 50603 Kuala Lumpur, Malaysia

Hazim Hanif is a Senior Lecturer in the Department of Software Engineering, Faculty of Computer Science and Information Technology, Universiti Malaya. He earned his Master of Computer Science (Research) in Information Security from Universiti Malaya in 2018 and his PhD in Computer Security and Artificial Intelligence from Imperial College London in 2023. His research focuses on analyzing knowledge representations of source code and how they can help vulnerability detection models identify and localize security vulnerabilities in source code. His published works and presentations have appeared in top-tier conferences and journals. He is also a core member of the Centre of Research for Cyber Security and Network (CSNET), Universiti Malaya.

Nor Badrul Anuar, Department of Computer System and Technology, Faculty of Computer Science and Information Technology, Universiti Malaya, 50603 Kuala Lumpur, Malaysia

Nor Badrul Anuar is a Professor and Associate Vice Chancellor for Infrastructure and Information Services at Universiti Malaya, where he also serves as Chief Information Officer and leads the Centre of Research for Cyber Security & Network (CSNET). He earned his Bachelor's and Master's degrees in Computer Science from Universiti Malaya and completed his Ph.D. at the University of Plymouth, UK. With expertise recognized globally, he has supervised numerous research students, published extensively, and currently serves on editorial boards, including that of the Journal of Network and Computer Applications. His research spans intrusion detection, network security, and federated learning.