Nautilus: An End-To-End METS/ALTO OCR Enhancement Pipeline

When a digital collection has been processed by OCR, patrons and researchers have high expectations of its usability. Patrons expect full-text search to return every instance of a term in historical collections correctly; researchers are more familiar with the impact of OCR errors but would still like to apply big data analysis or machine-learning methods. All of these use cases depend on high-quality textual transcriptions of the scans. This is why the National Library of Luxembourg (BnL) has developed a pipeline to improve OCR for existing digitised documents. Enhancing OCR in a digita...

Authors: Schneider, Pit
Maurer, Yves
Marschall, Ralph
Document type: Article
Publication date: 2023
Publisher: LIBER, the Association of European Research Libraries
Keywords: OCR quality / OCR correction / Luxembourg historical newspapers / ground truth / METS/ALTO
Language: English
Permalink: https://search.fid-benelux.de/Record/base-29104822
Data source: BASE; original catalogue
Link(s): https://liberquarterly.eu/article/view/13330