Recognising and Linking Entities in Old Dutch Text: A Case Study on VOC Notary Records


Authors: Hendriks, Barry
Groth, Paul
van Erp, Marieke
Document type: contributionToPeriodical
Publication date: 2021
Publisher/Editor: CEUR Workshop Proceedings
Keywords: Domain adaptation / Maritime history / Named entity recognition
Language: English
Permalink: https://search.fid-benelux.de/Record/base-28587121
Data source: BASE; original catalogue
Link(s): https://pure.knaw.nl/portal/en/publications/ec65c294-464f-4558-9e11-0d4f06ca6a07

The increased availability of digitised historical archives allows researchers to discover detailed information about people and companies from the past. However, the unconnected nature of these datasets presents a non-trivial challenge. In this paper, we present an approach and experiments to recognise person names in digitised notary records and link them to their job registration in the Dutch East India Company’s records. Our experiments show that standard state-of-the-art language models have difficulty dealing with 18th-century texts. However, a small amount of domain adaptation can improve the connection of information on sailors from different archives.