Data-Driven Syllabification for Middle Dutch
Author: | |
---|---|
Document type: | Article |
Publication date: | 2019 |
Series/Journal: | Digital Medievalist, Vol 12, Iss 1 (2019) |
Publisher: | Open Library of Humanities |
Keywords: | automatic syllabification / data-driven methods / recurrent neural network / middle dutch / orthographic variation / Medieval history / D111-203 |
Language: | English |
Permalink: | https://search.fid-benelux.de/Record/base-28987147 |
Data source: | BASE; original catalog |
Powered By: | BASE |
Link(s): | https://doi.org/10.16995/dm.83 |
Automatically separating Middle Dutch words into syllables is a challenging task. A first method was presented by Bouma and Hermans (2012), who combined a rule-based finite-state component with data-driven error correction. With an average word accuracy of 96.5%, their system is satisfactory, although it leaves room for improvement. Generally speaking, rule-based methods are less attractive for a medieval language like Middle Dutch, where each dialect has its own spelling preferences and where there is also much idiosyncratic variation among scribes. This paper presents a different method for automatically syllabifying Middle Dutch words, one that does not rely on pre-defined linguistic information. Using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells, we obtain a system that outperforms the rule-based method in terms of both robustness and the effort required.
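A data-driven syllabifier of this kind is commonly trained as a character-level sequence labeller, in which each character receives a label saying whether a syllable boundary precedes it. The abstract does not specify the encoding the authors used, so the following is only a minimal sketch of one plausible framing. The function names `encode` and `decode`, the hyphen separator, and the example words are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple


def encode(syllabified: str, sep: str = "-") -> Tuple[str, List[int]]:
    """Convert a hyphenated word into (plain word, per-character labels).

    Label 1 means "a syllable boundary precedes this character",
    label 0 means "no boundary". This labelling scheme is an assumption.
    """
    chars: List[str] = []
    labels: List[int] = []
    boundary_pending = False
    for ch in syllabified:
        if ch == sep:
            boundary_pending = True
            continue
        chars.append(ch)
        labels.append(1 if boundary_pending else 0)
        boundary_pending = False
    return "".join(chars), labels


def decode(word: str, labels: List[int], sep: str = "-") -> str:
    """Rebuild the hyphenated form from a word and its predicted labels."""
    if len(word) != len(labels):
        raise ValueError("word and labels must have the same length")
    pieces: List[str] = []
    for ch, label in zip(word, labels):
        # Insert a separator before a boundary character, except at word start.
        if label == 1 and pieces:
            pieces.append(sep)
        pieces.append(ch)
    return "".join(pieces)


if __name__ == "__main__":
    # Illustrative examples only; the segmentations are assumed, not quoted.
    word, labels = encode("va-der")
    print(word, labels)
    print(decode(word, labels))
```

Under this framing, a recurrent model such as an LSTM would read the plain character sequence and predict the label sequence, and `decode` would turn its predictions back into a syllabified word. No hand-written syllabification rules are involved, so spelling variation is handled only to the extent that it appears in the training data.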