Gado2: multilingual newspapers from the Netherlands Indies ...
This dataset of Handwritten Text Recognition (HTR) XML page files contains the ground truth used for the gado-gado named entity processing application (see https://github.com/KBNLresearch/gado-gado). Optical Character Recognition (OCR) produced high Character Error Rates (CER) because many of the scans are of poor quality. In contrast, HTR achieved CERs below 0.5 percent, which increased the efficiency of the Named Entity Recognition (NER) engine. All uploaded files are free of errors and fully tagged. This initial release will be extended much further in the coming weeks. ...
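For readers unfamiliar with the metric, the sketch below shows how a Character Error Rate is commonly computed: the edit distance between the recognized text and the ground truth, divided by the length of the ground truth. It illustrates the standard definition only and is not taken from the gado-gado code base. The function names `levenshtein` and `cer` and the sample strings are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate of hypothesis measured against reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical sample: one wrong character in a nine-character word.
ground_truth = "Soerabaja"
recognized = "Soerabaia"
print(f"CER: {cer(ground_truth, recognized):.3f}")
```

Under this definition, a CER below 0.5 percent corresponds to fewer than one character error per 200 characters of ground truth.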
Author: | |
---|---|
Document type: | dataset |
Publication date: | 2021 |
Publisher: | Zenodo |
Keywords: | Indonesia / Javanese / Newspapers / Sundanese / Dutch / Netherlands Indies / Chinese / Japanese / Arabic / Historical sources / Named Entity Recognition / Named Entity Linking / Handwritten Text Recognition |
Language: | Dutch |
Permalink: | https://search.fid-benelux.de/Record/base-29164545 |
Data source: | BASE; original catalog |
Link(s): | https://dx.doi.org/10.5281/zenodo.4882963 |