Semiautomatic translation of medicine usage data (in Dutch, free-text) from Lifelines COVID-19 questionnaires to ATC codes

Authors: Kellmann, Alexander
Lanting, Pauline
Franke, L H
van Enckevort, Esther
Swertz, M A
Document type: Article
Publication date: 2023
Series/Periodical: Kellmann, A, Lanting, P, Franke, L H, van Enckevort, E & Swertz, M A 2023, 'Semiautomatic translation of medicine usage data (in Dutch, free-text) from Lifelines COVID-19 questionnaires to ATC codes', Database: The Journal of Biological Databases and Curation, vol. 2023. https://doi.org/10.1093/database/baad019
Keywords: SORTA / Free-text answers / ATC codes / Dutch / Data Integration / Lifelines / COVID-19 questionnaire
Language: English
Permalink: https://search.fid-benelux.de/Record/base-27446246
Data source: BASE; original catalogue
Link: https://hdl.handle.net/11370/6984cacd-fe47-4fd7-84ae-ba4eb81f978d

The mapping of human-entered data to codified data formats that can be analysed is a common problem across medical research and healthcare. To identify risk and protective factors for SARS-CoV-2 susceptibility and COVID-19 severity, frequent questionnaires were sent out to participants of the Lifelines Cohort Study starting March 30, 2020. Because specific drugs were suspected COVID-19 risk factors, the questionnaires contained multiple-choice questions about commonly used drugs and open-ended questions to capture all other drugs used. To classify and evaluate the effects of those drugs and to group participants taking similar drugs, the free-text answers needed to be translated into standard Anatomical Therapeutic Chemical (ATC) codes. This translation has to handle misspelt drug names, brand names, comments and multiple drugs listed on one line, any of which would prevent a computer from finding the terms in a simple lookup table. In the past, translating free-text responses to ATC codes was time-intensive manual labour for experts. To reduce the amount of manual curation required, we developed a method for the semi-automated recoding of the free-text questionnaire responses into ATC codes suitable for further analysis. For this purpose, we built an ontology containing the Dutch drug names linked to their respective ATC code(s). In addition, we designed a semi-automated process that builds upon the MOLGENIS method SORTA to map the responses to ATC codes. This method can support the encoding of free-text responses, facilitating their evaluation, categorisation and filtering. Our semi-automated approach to coding drugs using SORTA turned out to be more than twice as fast as current manual approaches to this task.
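
The general idea of semi-automated recoding can be illustrated with a minimal Python sketch. It is not the SORTA implementation and not the authors' ontology: it uses the standard-library `difflib.SequenceMatcher` as a stand-in similarity measure, a five-entry illustrative lexicon, and a hypothetical acceptance threshold `AUTO_ACCEPT`. Each response is split into individual mentions, every mention is scored against the lexicon, and low-scoring matches are flagged for manual review instead of being coded automatically.

```python
import re
from difflib import SequenceMatcher

# Illustrative mini-lexicon (assumption, not the ontology from the article):
# Dutch drug or brand name -> ATC code.
LEXICON = {
    "paracetamol": "N02BE01",
    "ibuprofen": "M01AE01",
    "nurofen": "M01AE01",      # brand name mapped to the same ATC code
    "metformine": "A10BA02",
    "omeprazol": "A02BC01",
}

# Hypothetical threshold: matches scoring below it go to a human curator.
AUTO_ACCEPT = 0.85


def split_response(text):
    """Split one free-text answer into candidate drug mentions.

    Splits on common separators and on the Dutch word "en" ("and").
    """
    parts = re.split(r"[,;/+\n]|\ben\b", text.lower())
    return [p.strip() for p in parts if p.strip()]


def best_match(mention):
    """Return (lexicon name, ATC code, similarity score) of the closest entry."""
    scored = [(SequenceMatcher(None, mention, name).ratio(), name)
              for name in LEXICON]
    score, name = max(scored)
    return name, LEXICON[name], score


def code_response(text):
    """Propose an ATC code per mention and mark it 'auto' or 'review'."""
    results = []
    for mention in split_response(text):
        name, atc, score = best_match(mention)
        status = "auto" if score >= AUTO_ACCEPT else "review"
        results.append({"mention": mention, "candidate": name,
                        "atc": atc, "score": round(score, 2),
                        "status": status})
    return results


if __name__ == "__main__":
    for row in code_response("paracetmol, Nurofen en iets tegen hoofdpijn"):
        print(row)
```

In this sketch the misspelt "paracetmol" and the brand name "Nurofen" are coded automatically, while the comment "iets tegen hoofdpijn" ("something for headache") falls below the threshold and is left for a curator. The threshold trades curation effort against the risk of wrong automatic codes and would need tuning on real data.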