Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 3.0

Authors:	Erjavec, Tomaž; Kopp, Matyáš; Ogrodniczuk, Maciej; Osenova, Petya; Fišer, Darja; Pirker, Hannes; Wissik, Tanja; Schopper, Daniel; Kirnbauer, Martin; Mochtak, Michal; Ljubešić, Nikola; Rupnik, Peter; Pol, Henk van der; Depoorter, Griet; de Does, Jesse; Simov, Kiril; Grigorova, Vladislava; Grigorov, Ilko; Jongejan, Bart; Haltrup Hansen, Dorte; Navarretta, Costanza; Mölder, Martin; Kahusk, Neeme; Vider, Kadri; Bel, Nuria; Antiba-Cartazo, Iván; Pisani, Marilina; Zevallos, Rodolfo; Regueira, Xosé Luís; Vladu, Adina Ioana; Magariños, Carmen; Bardanca, Daniel; Barcala, Mario; Garcia, Marcos; Pérez Lago, María; García Louzao, Pedro; Vivel Couso, Ainhoa; Vázquez Abuín, Marta; García Díaz, Noelia; Vidal Miguéns, Adrián; Fernández Rei, Elisa; Diwersy, Sascha; Luxardo, Giancarlo; Coole, Matthew; Rayson, Paul; Nwadukwe, Amanda; Gkoumas, Dimitris; Papavassiliou, Vassilis; Prokopidis, Prokopis; Gavriilidou, Maria; Piperidis, Stelios; Ligeti-Nagy, Noémi; Jelencsik-Mátyus, Kinga; Varga, Zsófia; Dodé, Réka; Barkarson, Starkaður; Agnoloni, Tommaso; Bartolini, Roberto; Frontini, Francesca; Montemagni, Simonetta; Quochi, Valeria; Venturi, Giulia; Ruisi, Manuela; Marchetti, Carlo; Battistoni, Roberto; Darģis, Roberts; van Heusden, Ruben; Marx, Maarten; Depuydt, Katrien; Tungland, Lars Magne; Rudolf, Michał; Nitoń, Bartłomiej; Aires, José; Mendes, Amália; Cardoso, Aida; Pereira, Rui; Yrjänäinen, Väinö; Norén, Fredrik Mohammadi; Magnusson, Måns; Jarlbrink, Johan; Meden, Katja; Pančur, Andrej; Ojsteršek, Mihael; Çöltekin, Çağrı; Kryvenko, Anna
Document type:	corpus
Publication date:	2023
Publisher/Editor:	CLARIN ERIC
Keywords:	Parla-CLARIN / parliamentary debates / COVID-19 / TEI / Bulgarian Parliament / Croatian Parliament / Polish Parliament / Slovenian Parliament / Czech Parliament / Icelandic Parliament / Belgian Parliament / Danish Parliament / Dutch Parliament / Turkish Parliament / Italian Parliament / Hungarian Parliament / Latvian Parliament / French Parliament / Bosnian Parliament / Catalonian Parliament / Galician Parliament / Greek Parliament / Norwegian Parliament / Portuguese Parliament / Serbian Parliament / Swedish Parliament / Ukrainian Parliament / British Parliament
Language:	Bosnian / Bulgarian / Catalan / Croatian / Czech / Danish / Dutch / English / Estonian / French / Galician / German / Hungarian / Icelandic / Italian / Latvian / Greek / Norwegian / Polish / Portuguese / Russian / Serbian / Slovenian / Spanish / Swedish / Turkish / Ukrainian
Permalink:	https://search.fid-benelux.de/Record/base-26922127
Data source:	BASE; original catalog
Powered By:	BASE
Link(s) :	http://hdl.handle.net/11356/1488

ParlaMint 3.0 is a multilingual set of 26 comparable corpora of parliamentary debates, mostly starting in 2015 and extending to mid-2022, with individual corpora ranging from 9 to 125 million words in size. The corpora carry extensive metadata on the parliament and on the speakers (name, gender, MP status, party affiliation, party coalition/opposition status). They are structured into time-stamped terms, sessions and meetings, and each speech is marked with its speaker and the speaker's role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. Some corpora have further information, e.g. the speakers' year of birth, links to their Wikipedia articles, their membership in various committees, etc. The corpora are also marked for the subcorpus they belong to ("reference", until 2020-01-30; "covid", from 2020-01-31; and "war", from 2022-02-24). This entry contains the linguistically marked-up version of the corpora, while the text version is available at http://hdl.handle.net/11356/1486. The ParlaMint.ana linguistic annotation includes tokenization, sentence segmentation, lemmatisation, Universal Dependencies part-of-speech tags, morphological features and syntactic dependencies, and the 4-class CoNLL-2003 named entities. Some corpora also have further linguistic annotations, such as PoS tagging or named entities according to language-specific schemes; their corpus TEI headers give further details on the annotation vocabularies and tools. The compressed files include the ParlaMint.ana XML TEI-encoded linguistically annotated corpora; the derived corpora in CoNLL-U with TSV speech metadata; and the vertical files (with registry file), suitable for use with CQP-based concordancers such as CWB, noSketch Engine or KonText. Also included is the 3.0 release of the data and scripts available at the GitHub repository of the ParlaMint project.
As opposed to the previous version 2.1, this version ...
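
The subcorpus boundaries given in the description can be illustrated with a minimal Python sketch. The function name `subcorpora` and the constant names are hypothetical and not part of the ParlaMint release. The description does not say whether the "covid" range ends when the "war" range begins; this sketch assumes both ranges are open-ended, so dates from 2022-02-24 onward receive both labels.

```python
from datetime import date

# Boundary dates as stated in the corpus description.
COVID_START = date(2020, 1, 31)   # "covid", from 2020-01-31
WAR_START = date(2022, 2, 24)     # "war", from 2022-02-24


def subcorpora(day: date) -> list[str]:
    """Return the subcorpus labels whose stated date range contains `day`.

    Assumption (not confirmed by the description): "covid" does not end
    when "war" begins, so later dates carry both labels.
    """
    if day < COVID_START:
        # "reference" runs until 2020-01-30 inclusive.
        return ["reference"]
    labels = ["covid"]
    if day >= WAR_START:
        labels.append("war")
    return labels


if __name__ == "__main__":
    for d in (date(2019, 6, 1), date(2021, 3, 15), date(2022, 3, 1)):
        print(d.isoformat(), subcorpora(d))
```

If the release instead treats the ranges as mutually exclusive, the last branch would return only `["war"]`; the corpus TEI headers are the place to check.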