Pooled LSTM for Dutch cross-genre gender classification

Authors: Martinc, Matej
Pollak, Senja
Document type: conferencePaper
Publication date: 2019
Keywords: cross-genre gender classification
Language: English
Permalink: https://search.fid-benelux.de/Record/base-27466199
Data source: BASE; original catalog
Link(s): https://zenodo.org/record/3559041

We present the results of cross-genre and in-genre gender classification on the data sets of Dutch tweets, YouTube comments and news prepared for the CLIN 2019 shared task. We propose a recurrent neural network architecture for gender classification in which the input word and part-of-speech sequences are fed to an LSTM layer, which is followed by average and max pooling layers. The best cross-genre accuracy, 55.2%, was achieved by the model trained on YouTube comments and tweets and tested on the balanced news corpus, while the best in-genre accuracy, 61.33%, was achieved on YouTube comments. Overall, the proposed approach ranked 2nd in the global cross-genre ranking and 6th in the global in-genre ranking of the CLIN 2019 shared task.
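
The pooling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the LSTM has already produced one hidden-state vector per token, and that the average-pooled and max-pooled vectors are concatenated before classification; the abstract does not state how the two pooled outputs are combined, so the concatenation, the function name `avg_max_pool` and the toy values are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def avg_max_pool(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the per-dimension mean and max over all time steps.

    hidden_states: one vector per token, as an LSTM layer would emit
    (hypothetical input; the LSTM itself is not implemented here).
    """
    if not hidden_states:
        raise ValueError("need at least one time step")
    steps = len(hidden_states)
    # Transpose so that each tuple holds one hidden dimension across time.
    columns = list(zip(*hidden_states))
    avg = [sum(col) / steps for col in columns]
    mx = [max(col) for col in columns]
    # Assumption: the two pooled vectors are joined into one feature vector.
    return avg + mx


# Toy LSTM outputs: 3 tokens, hidden size 2 (made-up numbers).
states = [[0.0, 1.0], [0.5, -1.0], [1.0, 0.0]]
pooled = avg_max_pool(states)
```

The result has twice the hidden size regardless of sequence length, which is what lets a fixed-size classifier sit on top of variable-length tweets, comments or news texts.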