A HYBRID CONTEXTUAL EMBEDDING BASED CLUSTERING AND CLASSIFICATION TECHNIQUE FOR UNSUPERVISED IMPLICIT ASPECT CATEGORIZATION IN INDONESIAN REVIEWS
Abstract
Aspect categorization groups reviews by aspect categories appropriate to the review domain. A problem arises when only sentiment features are available as clues for predicting implicit aspects. Implicit aspects nevertheless play an important role in generating a summary: without them, important words needed for analyzing user reviews are likely to be lost. Existing techniques have difficulty exploiting implicit aspects because of limited resources and high computational cost. Hence, we propose an implicit aspect categorization model based on a hybrid contextual embedding-based clustering and classification technique. We developed the model using an unsupervised learning approach, which requires no labelled data for training. A contextual embedding-based clustering technique generates training data from explicit sentences, and this data is then used to train a classifier that categorizes implicit aspects. The proposed model has four steps: data preprocessing, sentence feature selection, generation of training data by clustering, and finally categorization of implicit aspects using a classification technique. We experiment with several classification techniques (Logistic Regression, Support Vector Machine, Naïve Bayes, Decision Tree, and Random Forest) to find the best combination for the proposed technique. In these experiments, the combination of contextual embedding-based clustering and the Random Forest algorithm produces higher accuracy than the other classification techniques, with an accuracy of up to 72.04% and an F1 score of 0.6788.
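The cluster-then-classify idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes toy 2-D vectors in place of real contextual sentence embeddings, a plain k-means in place of the paper's clustering step, and a 1-nearest-neighbour classifier standing in for Random Forest. All names (`explicit_vecs`, `implicit_vecs`, `kmeans`, `predict_1nn`) are hypothetical.

```python
import math

def farthest_first_init(points, k):
    """Pick k start centroids: the first point, then the point farthest from those chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return [list(c) for c in centroids]

def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means; returns one cluster id per point (used as a pseudo-label)."""
    centroids = farthest_first_init(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def predict_1nn(train_x, train_y, query):
    """Stand-in classifier (assumption): copy the label of the nearest training vector."""
    nearest = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[nearest]

# Hypothetical embeddings of explicit-aspect sentences (two aspect categories).
explicit_vecs = [(0.9, 0.1), (0.1, 0.9), (1.0, 0.2), (0.2, 1.0), (0.8, 0.0), (0.0, 0.8)]
# Hypothetical embeddings of implicit-aspect sentences to be categorized.
implicit_vecs = [(0.85, 0.15), (0.1, 0.95)]

# Step 1: clustering the explicit sentences yields pseudo-labels, so no manual labels are needed.
pseudo_labels = kmeans(explicit_vecs, k=2)

# Step 2: a classifier trained on the pseudo-labelled data categorizes the implicit sentences.
predictions = [predict_1nn(explicit_vecs, pseudo_labels, v) for v in implicit_vecs]
print(pseudo_labels, predictions)
```

In a realistic setting, the toy vectors would be replaced by sentence embeddings from a contextual language model, and the stand-in classifier by one of the algorithms compared in the abstract.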

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.