DEDUCE: A pattern matching method for automatic de-identification of Dutch medical text

Authors: Menger, V.
Scheepers, F.E.
van Wijk, L.M.
Spruit, M.
Document type: Article
Publication date: 2018
Keywords: De-identification / Dutch medical text / Pattern matching / Protected Health Information / Patient privacy / Taverne
Language: English
Permalink: https://search.fid-benelux.de/Record/base-29039058
Data source: BASE; original catalog
Link(s): https://dspace.library.uu.nl/handle/1874/369735

To use medical text for research purposes, the text must first be de-identified for legal and privacy reasons. We report on a pattern matching method that automatically de-identifies medical text written in Dutch and requires little effort to tailor by hand. First, a selection of Protected Health Information (PHI) categories is determined in cooperation with medical staff. We then devise a method, relying on lookup tables, decision rules and fuzzy string matching, for de-identifying all information that falls into one of these PHI categories. Our de-identification method, DEDUCE, is validated on a test corpus of 200 nursing notes and 200 treatment plans obtained from the University Medical Center Utrecht (UMCU) in the Netherlands, achieving a total micro-averaged precision of 0.814, a recall of 0.916 and an F1-score of 0.862. For person names, a recall of 0.964 was achieved, and no patient names were missed.
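
As a reading aid for the reported scores, the following minimal Python sketch shows how an F1-score follows from precision and recall, and how micro-averaging pools counts across PHI categories before computing the scores. The function names and the per-category counts are hypothetical and chosen only for illustration; they are not taken from the article or from the DEDUCE code. Only the precision and recall values in the last line come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_average(counts):
    """Pool (true positive, false positive, false negative) counts over
    all categories, then compute precision, recall and F1 on the totals."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical counts for two categories, invented for illustration only:
# (true positives, false positives, false negatives)
example_counts = [(8, 2, 1), (2, 0, 1)]
p, r, f = micro_average(example_counts)

# Precision and recall as reported in the abstract
reported_f1 = round(f1_score(0.814, 0.916), 3)
print(reported_f1)  # 0.862
```

The reported precision and recall are thus consistent with the reported F1-score. Because micro-averaging pools counts over all categories, categories with many annotations weigh more heavily in the totals than rare ones.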