Comparison of Different Modeling Techniques for Flemish Twitter Sentiment Analysis

Authors: Manon Reusens
Michael Reusens
Marc Callens
Seppe vanden Broucke
Bart Baesens
Document type: Text
Publication date: 2022
Publisher: Multidisciplinary Digital Publishing Institute
Keywords: sentiment analysis / big data / preprocessing / word embeddings / bidirectional LSTM / BERT
Language: English
Permalink: https://search.fid-benelux.de/Record/base-27089631
Data source: BASE; original catalogue
Powered By: BASE
Link(s): https://doi.org/10.3390/analytics1020009

Microblogging websites such as Twitter have caused sentiment analysis research to grow in popularity over the last several decades. However, most studies focus on the English language, which leaves other languages underrepresented. Therefore, in this paper, we compare several modeling techniques for sentiment analysis using a new dataset containing Flemish tweets. The key contribution of our paper lies in its innovative experimental design: we compared different preprocessing techniques and vector representations to find the best-performing combination for a Flemish dataset. We compared models belonging to four different categories: lexicon-based methods, traditional machine-learning models, neural networks, and attention-based models. We found that more preprocessing leads to better results, but that the best-performing vector representation depends on the model applied. Moreover, an immense gap was observed between the performance of the lexicon-based approaches and that of the other models. The traditional machine-learning approaches and the neural networks produced similar results, but the attention-based model was the best-performing technique. Nevertheless, a tradeoff should be made between computational expense and performance gain.
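To make the lexicon-based category from the abstract concrete, the sketch below shows a minimal tweet-preprocessing and lexicon-scoring pipeline. The polarity lexicon and the example tweet are invented for illustration; they are not taken from the paper or its dataset, and a real system would use a full Dutch/Flemish sentiment lexicon.

```python
import re

# Toy polarity lexicon (Dutch: good, nice, bad, boring) -- invented for this
# example, not the lexicon used in the paper.
POLARITY = {"goed": 1, "leuk": 1, "slecht": -1, "saai": -1}

def preprocess(tweet: str) -> list[str]:
    """Lowercase, strip URLs/mentions and punctuation, then tokenise."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    tweet = re.sub(r"[^\w\s]", " ", tweet)
    return tweet.split()

def lexicon_score(tweet: str) -> int:
    """Sum polarity values over tokens; >0 positive, <0 negative, 0 neutral."""
    return sum(POLARITY.get(token, 0) for token in preprocess(tweet))

# Hypothetical tweet: "This film is really good and nice!"
print(lexicon_score("Deze film is echt goed en leuk! @vriend https://t.co/x"))  # → 2
```

This illustrates why the paper observes an "immense gap": a fixed lexicon ignores word order, negation, and out-of-vocabulary tokens, all of which the learned models (word embeddings, bidirectional LSTMs, BERT) can capture.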