Improving the recognition of Dutch Gothic machine print, at four levels in the processing pipeline, in four days

Libraries and archives are struggling with optical character recognition (OCR) of old machine-print fonts such as Gothic or 'fraktur'. This font was used in many important historical printed collections, such as administrative texts and the then newly invented (17th century) 'newspapers', which carry interesting and detailed reports on important developments and events. When applying current state-of-the-art OCR tools or sending the scanned images to large, well-known companies that provide OCR services, the returned results are still quite disappointing. Problems are observed at all levels in the proce...

Authors: Schomaker, Lambert
Ameryan, Mahya
Cuper, Mirjam
Dercksen, Koen
Guo, Jerry
Koert, Rutger van
Mendrik, Adriënne
Todorov, Konstantin
Wang, Xue
Document type: report
Publication date: 2020
Keywords: optical character recognition / historical printed collections / blackletter / ICT with industry
Language: English
Permalink: https://search.fid-benelux.de/Record/base-26689263
Data source: BASE; original catalog
Link(s): https://zenodo.org/record/4003740