OPTIMIZING BERTSNN TO ENHANCE SOURCE-TARGET DOMAIN SIMILARITY SCORING FOR CROSS-DOMAIN SENTIMENT CLASSIFICATION OF PRODUCT REVIEWS
Abstract
Cross-domain sentiment analysis (CDSA) predicts sentiment polarity in a target domain using knowledge from source domains, but existing CDSA methods lack effective source domain selection strategies. This study investigates BertSNN, which combines pre-trained BERT embeddings, a Siamese neural network, and various distance metrics to measure domain similarity and optimize source domain selection for CDSA. First, we experiment with document-level (DocBERT) and sentence-level (SentenceBERT) embeddings paired with BiLSTM and BiLSTM + CNN network configurations to identify the best combination for BertSNN. Second, we explore two distance metrics (Euclidean and Manhattan) alongside shifted cosine similarity to determine the most effective choice for domain similarity scoring. Using product reviews, we test on 25 target domains and examine whether training on several of the most similar source domains improves cross-domain sentiment classification compared with training on the single most similar source domain. Results indicate that document-level embeddings, BiLSTM, and shifted cosine similarity yield the best-performing BertSNN, which selects high-quality similar source domains for training a cross-domain sentiment classifier for a target domain and outperforms two traditional baselines (bag-of-words and TF-IDF representations). Our findings also show that training on the five most similar source domains (k = 5) generally improves cross-domain sentiment classification performance over training on the single most similar source domain (k = 1). This study contributes to CDSA by advancing the understanding of embedding choices and distance metrics within a Siamese neural network for source-target domain similarity scoring and by providing actionable insights on domain selection strategies to improve sentiment analysis models.
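The three similarity choices named in the abstract, and the top-k source domain selection step, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define "shifted cosine similarity", so the sketch assumes it means rescaling cosine similarity from [-1, 1] to [0, 1]. The function names, the toy two-dimensional vectors, and the domain names are hypothetical, and in BertSNN the vectors would come from the Siamese network rather than being written by hand.

```python
import math

def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def shifted_cosine(u, v):
    """Cosine similarity rescaled to [0, 1].

    Assumption: "shifted" is taken to mean (1 + cos) / 2; the abstract
    does not give the exact formula.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 0.5 * (1.0 + dot / (norm_u * norm_v))

def top_k_sources(target_vec, source_vecs, k=5, score=shifted_cosine):
    """Return the names of the k source domains most similar to the target.

    `score` must return larger values for more similar pairs, so a distance
    has to be negated (or otherwise inverted) before being passed in.
    """
    ranked = sorted(
        source_vecs,
        key=lambda name: score(target_vec, source_vecs[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical domain representations, for illustration only.
target = [1.0, 0.0]
sources = {
    "books": [1.0, 0.1],
    "dvd": [0.0, 1.0],
    "kitchen": [-1.0, 0.0],
}

print(top_k_sources(target, sources, k=1))
print(top_k_sources(target, sources, k=2))
# Ranking by a distance instead: negate it so that smaller distance ranks first.
print(top_k_sources(target, sources, k=2, score=lambda u, v: -euclidean(u, v)))
```

Setting k = 1 corresponds to training on the single most similar source domain, while larger k pools several source domains, which is the comparison the abstract reports.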

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.