A 38 Million Words Dutch Text Corpus and its Users


Author(s): Kruyt, J.G.; Dutilh, M.W.F.
Document type: Article
Publication date: 2012
Publisher/Editor: Bureau of the WAT
Keywords: large electronic Dutch text corpus / corpus design / text classification / topic / publication medium / linguistic annotation / on-line access via Internet / corpus users
Language: English
Permalink: https://search.fid-benelux.de/Record/base-26755005
Data source: BASE; original catalogue
Link(s): https://lexikos.journals.ac.za/pub/article/view/982

The use of text corpora has increased considerably in the past few years, not only in lexicography but also in computational linguistics and language technology. Consequently, corpus data and expertise developed by lexicographical institutions have gained a broader scope of application. In the European context, this has led to a revised view of corpus design. In line with these developments, the Institute for Dutch Lexicology (INL) has, since 1994, been providing external access to steadily improving corpora via the Internet. In August 1996, the 38 Million Words Corpus became available for consultation by the international research community. The present paper reports on the characteristics of this corpus (design, text classification, linguistic annotation) and on its use, both in dictionary projects and in linguistic research. In spite of limitations with respect to corpus design, the INL corpora accessible via the Internet have proved to meet external needs. By providing these facilities, the INL has gained much broader experience in corpus building than before, which is essential for new internal dictionary projects. Giving external access to corpus data that was developed primarily for internal purposes may be profitable for all parties involved.

Dutch summary: "Een tekstcorpus Nederlands (38 miljoen woorden) en de gebruikers ervan" (A Dutch text corpus (38 million words) and its users), which parallels the English abstract above.