Data for: Psycholinguistic dataset on language use in 1145 novels published in English and Dutch


Authors: Severi Luoto (4439785)
Andreas van Cranenburgh (10706611)
Document type: Dataset
Publication date: 2021
Keywords: Arts and Humanities / Computational Linguistics
Language: unknown
Permalink: https://search.fid-benelux.de/Record/base-26662182
Data source: BASE; original catalog
Link(s) : https://doi.org/10.17632/x3m2gjkhx5.1

LIWC and n-gram counts of English and Dutch novels
==================================================

This dataset consists of CSV files with word counts in several corpora:

- 694 English language novels from different genders and orientations
- 401 bestselling Dutch language novels
- 50 novels nominated for Dutch literary prizes

Each corpus comes with:

- LIWC counts; this file also includes the available metadata for each novel.
  The English data was created with LIWC 2015.
  The Dutch data was created with the validated translation of LIWC 2001.
- Word counts (unigrams) and bigram counts per novel.
  All text has been converted to lowercase.
  Contractions are tokenized into separate tokens, e.g., can't => ca n't
  Two restrictions are applied:
    - only unigrams or bigrams that occur in at least 10 texts are retained
    - only the 100k most frequent are retained
- Overall word counts and bigram counts; i.e., the sum across all novels.

All files are encoded in UTF-8.
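
The two restrictions on the per-novel n-gram counts can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `filter_ngrams`, its parameters, and the toy input are hypothetical. It also assumes that the document-frequency restriction is applied before the frequency cutoff and that the overall counts are summed over the retained n-grams only; the description above does not specify either point.

```python
from collections import Counter


def filter_ngrams(doc_counts, min_docs=10, max_types=100_000):
    """Apply both restrictions to a list of per-novel Counters.

    Hypothetical helper; the order of the two steps is an assumption.
    """
    doc_freq = Counter()  # number of novels each n-gram occurs in
    total = Counter()     # summed frequency across all novels
    for counts in doc_counts:
        doc_freq.update(counts.keys())
        total.update(counts)

    # Restriction 1: keep n-grams that occur in at least `min_docs` novels.
    eligible = {ng: n for ng, n in total.items() if doc_freq[ng] >= min_docs}

    # Restriction 2: keep the `max_types` most frequent of those
    # (ties are broken alphabetically so the result is deterministic).
    kept = sorted(eligible, key=lambda ng: (-eligible[ng], ng))[:max_types]
    kept_set = set(kept)

    filtered = [
        Counter({ng: n for ng, n in counts.items() if ng in kept_set})
        for counts in doc_counts
    ]
    overall = Counter({ng: total[ng] for ng in kept})
    return filtered, overall


# Toy example with three tiny "novels" and scaled-down thresholds.
docs = [
    Counter("the cat sat".split()),
    Counter("the dog sat".split()),
    Counter("the cat ran".split()),
]
per_novel, overall = filter_ngrams(docs, min_docs=2, max_types=2)
```

With the thresholds used for the dataset itself, the call would be `filter_ngrams(docs, min_docs=10, max_types=100_000)`, which are also the defaults in this sketch.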