Detection of Sentiment in Luxembourgish User Comments

Author: Gierschek, Daniela
Document type: doctoral thesis
Publication date: 2022
Publisher/Editor: Unilu - University of Luxembourg
Keywords: Computational Linguistics / Luxembourgish / Linguistics / Sentiment / Engineering / computing & technology / Computer science / Arts & humanities / Languages & linguistics
Language: English
Permalink: https://search.fid-benelux.de/Record/base-27522257
Data source: BASE; original catalog
Link(s) : https://orbilu.uni.lu/handle/10993/50533

Sentiment is all around us in everyday life. It can be found in blog posts, social media comments, text messages and many other places where people express themselves. Sentiment analysis is the task of automatically detecting such sentiments, attitudes or opinions in written text. This research presents the first sentiment analysis solution for Luxembourgish, a low-resource language, developed using a large corpus of user comments published on the RTL Luxembourg website www.rtl.lu. Various resources were created for this purpose to lay the foundation for further sentiment research in Luxembourgish. A Luxembourgish sentiment lexicon and an annotation tool were built as external resources that can be used to collect and enlarge training data for sentiment analysis tasks. Additionally, a corpus consisting mainly of sentences from user comments was annotated with negative, neutral and positive labels. This corpus was then automatically translated into English and German. Afterwards, text representations such as word2vec, tf-idf and one-hot encoding were applied to the three versions of the labeled sentence corpus to train different machine learning models. In addition, one part of the experimental setup used linguistic features in the classification process to study their impact on sentiment expressions. By following such a broad strategy, this thesis not only lays the basis for sentiment analysis of Luxembourgish texts but also aims to give recommendations for conducting sentiment detection research for other low-resource languages. It is demonstrated that creating new resources for a low-resource language is a demanding task that should be carefully planned if it is to outperform working with translations into a high-resource target language such as English or German.
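
To make the tf-idf step mentioned in the abstract more concrete, the following is a minimal sketch of a three-class (negative, neutral, positive) sentence classifier built on tf-idf vectors. It is an illustration only and not the thesis's implementation: the toy English sentences, the function names (`fit_idf`, `tfidf_vector`, `train`, `predict`) and the nearest-centroid classifier are all assumptions chosen to keep the example self-contained. The abstract does not state which machine learning models were trained.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data for illustration; NOT taken from the thesis corpus.
TRAIN = [
    ("the service was great and friendly", "positive"),
    ("i love this wonderful idea", "positive"),
    ("this decision is terrible and unfair", "negative"),
    ("i hate the awful traffic", "negative"),
    ("the meeting takes place on monday", "neutral"),
    ("the bus stops at the station", "neutral"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def fit_idf(docs):
    """Compute smoothed inverse document frequencies from tokenized docs."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    """Return an L2-normalized sparse tf-idf vector; unknown terms are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def train(examples):
    """Fit idf weights and average the tf-idf vectors of each class."""
    docs = [tokenize(text) for text, _ in examples]
    idf = fit_idf(docs)
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for tokens, (_, label) in zip(docs, examples):
        for t, v in tfidf_vector(tokens, idf).items():
            sums[label][t] += v
        counts[label] += 1
    centroids = {
        label: {t: v / counts[label] for t, v in vec.items()}
        for label, vec in sums.items()
    }
    return idf, centroids


def predict(text, idf, centroids):
    """Pick the class whose centroid has the largest dot product with the text.

    Returns None when no token of the text was seen during training.
    """
    vec = tfidf_vector(tokenize(text), idf)
    if not vec:
        return None

    def score(label):
        centroid = centroids[label]
        return sum(v * centroid.get(t, 0.0) for t, v in vec.items())

    return max(sorted(centroids), key=score)


idf, centroids = train(TRAIN)
print(predict("what a wonderful and friendly service", idf, centroids))
```

In this sketch, swapping the representation (for example, replacing tf-idf weights with binary one-hot indicators) only requires changing `tfidf_vector`, which mirrors the abstract's idea of comparing several text representations on the same labeled sentences.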