The Value of Preexisting Structures for Digital Access: Modelling the Resolutions of the Dutch States General

Authors: Koolen, Marijn
Hoekstra, Rik
Oddens, Joris
Sluijter, Ronald
Van Koert, Rutger
Brouwer, Gijsjan
Brugman, Hennie
Document type: Article
Publication date: 2023
Series/Journal: Koolen, M, Hoekstra, R, Oddens, J, Sluijter, R, Van Koert, R, Brouwer, G & Brugman, H 2023, 'The Value of Preexisting Structures for Digital Access: Modelling the Resolutions of the Dutch States General', Journal of Computing and Cultural Heritage, vol. 16, no. 1, 1, pp. 1-24. https://doi.org/10.1145/3575864
Keywords: text recognition / data modelling / information extraction / digital history
Language: English
Permalink: https://search.fid-benelux.de/Record/base-28586948
Data source: BASE; original catalogue
Link(s): https://pure.knaw.nl/portal/en/publications/9d47d1f3-7c82-4e65-a1ec-06210db44fc2

The Resolutions of the Dutch States General (1576–1796) is an archive covering over two centuries of decision-making and consists of a heterogeneous series of handwritten and printed documents. The archive, which has recently been digitised, is a rich source for historical research. However, owing to the archive’s heterogeneity and dispersion of information, historians and other researchers find it hard to use the archive for their research. In this article, we describe how we deal with the challenges of structuring and connecting the information in this archive. We focus on identifying the existing structural elements, to turn the archive from a set of pages into a set of meeting dates and individual resolutions, with rich metadata for each resolution. To deal with the challenges of historical language change, spelling variation, and text recognition mistakes, we exploit the repetitive nature of the language of the resolutions and use fuzzy string searching to identify structural elements by the formulaic expressions that signal their boundaries. We also analyse the value of extracting different types of entities from the text and argue that the choice of which entity types to focus on should be based on how well they support relevant research questions and methods. In the resolutions, we choose to prioritise person qualifications, such as profession, legal status, or title, over person names. Qualifications allow users to select certain groups of people and to combine these selections meaningfully with other layers of metadata, whereas person names lack the contextual information needed to disambiguate them, so that it is unclear which, and how many, persons a selected name refers to. We show how our methodology results in a computational platform that allows users to explore and analyse the archive through many connected layers of metadata.
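
The fuzzy string searching mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses Python's standard `difflib` as a stand-in similarity measure, and the function name `fuzzy_find`, the 0.8 threshold, the phrase "PRAESIDE", and the sample line are all assumptions chosen for illustration. The idea is to slide a window over recognised text and accept the position most similar to a known formulaic expression, so that a text recognition mistake does not hide a structural boundary.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple


def fuzzy_find(phrase: str, text: str,
               threshold: float = 0.8) -> Optional[Tuple[int, int, float]]:
    """Return (start, end, score) of the window in `text` most similar to `phrase`.

    Returns None when no window reaches `threshold` (an assumed cut-off).
    """
    phrase_l, text_l = phrase.lower(), text.lower()
    width = len(phrase_l)
    best = None
    # Slide a window of the phrase's length over the text and score each position.
    for start in range(max(1, len(text_l) - width + 1)):
        window = text_l[start:start + width]
        score = SequenceMatcher(None, phrase_l, window).ratio()
        # Keep the first window with the highest similarity score.
        if best is None or score > best[2]:
            best = (start, start + len(window), score)
    if best is not None and best[2] >= threshold:
        return best
    return None


# Hypothetical recognised text in which "PRAESIDE" was misread as "PRAFSIDE".
sample = "Veneris den 5 Januarii. PRAFSIDE, Den Heere van Lynden."
match = fuzzy_find("PRAESIDE", sample)
if match is not None:
    start, end, score = match
    print(sample[start:end], round(score, 3))
```

A fixed-width window keeps the sketch short but only tolerates substitutions well. Handling inserted or dropped characters, and choosing a threshold that avoids false matches on ordinary words, would need a proper edit-distance search and tuning against real pages.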