Mapping chronic disease prevalence based on medication use and socio-demographic variables: an application of LASSO on administrative data sources in healthcare in the Netherlands

Authors:	Koen Füssenich, Hendriek C. Boshuizen, Markus M. J. Nielen, Erik Buskens, Talitha L. Feenstra
Document type:	Article
Publication date:	2021
Journal:	BMC Public Health, Vol 21, Iss 1, Pp 1-8 (2021)
Publisher:	BMC
Keywords:	Disease prevalence / Small area estimates / Machine learning / Public aspects of medicine / RA1-1270
Language:	English
Permalink:	https://search.fid-benelux.de/Record/base-27192994
Link(s):	https://doi.org/10.1186/s12889-021-10754-4

Abstract

Background: Policymakers generally lack sufficiently detailed health information to develop localized health policy plans. Mapping chronic disease prevalence is difficult because accurate direct data sources are often lacking. Estimates can be improved by adding information, such as medication use and demographic characteristics, that helps identify disease. The aim of this study was, first, to obtain small-area prevalence estimates for four common chronic diseases by modelling based on medication use and socio-economic variables and, second, to investigate regional patterns of disease.

Methods: Administrative hospital records and general practitioner (GP) registry data were linked to medication use and socio-economic characteristics. In the training set (n = 707,021), GP diagnosis and/or hospital admission diagnosis served as the reference standard for disease prevalence. For the entire Dutch population (n = 16,777,888), all information except GP diagnosis and hospital admission was available. LASSO regression models for binary outcomes were used to select variables strongly associated with disease. Non-standardized and standardized prevalence estimates per Dutch municipality for stroke, coronary heart disease (CHD), chronic obstructive pulmonary disease (COPD) and diabetes were then computed as the average of the predicted probabilities of all individual inhabitants.

Results: Adding medication use data as a predictor substantially improved model performance. At the municipality level, estimates performed best for diabetes, with a weighted percentage error (WPE) of 6.8%, and worst for COPD (WPE 14.5%). Disease prevalence showed clear regional patterns, also after standardization for age.

Conclusion: Adding medication use as an indicator of disease prevalence, in addition to socio-economic variables, substantially improved estimates at the municipality level. The resulting individual disease probabilities can be aggregated to any desired regional level, providing a useful tool to identify regional patterns and inform local policy.
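The two-step approach in the Methods (select predictors with a LASSO model for a binary outcome, then average each inhabitant's predicted probability within a municipality) can be illustrated with a minimal sketch. This is not the authors' code. It assumes a plain L1-penalised logistic regression fitted by proximal gradient descent (ISTA) with an unpenalised intercept, a hand-picked penalty `lam` rather than a tuned one, and toy data with two hypothetical binary predictors (`uses_glucose_lowering_drug`, `noise_flag`) and two hypothetical municipalities "A" and "B". All function and variable names are illustrative.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t*|v|: shrinks v towards zero and sets |v| <= t exactly to 0.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def linear_predictor(b0, w, x):
    return b0 + sum(wj * xj for wj, xj in zip(w, x))


def fit_lasso_logistic(X, y, lam, lr=0.5, n_iter=2000):
    """L1-penalised logistic regression fitted by proximal gradient descent (ISTA).

    Minimises mean log-loss + lam * sum(|w_j|); the intercept b0 is not penalised.
    """
    n, p = len(X), len(X[0])
    b0, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Prediction minus outcome: the per-row factor in the log-loss gradient.
            r = sigmoid(linear_predictor(b0, w, xi)) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= lr * g0 / n  # plain gradient step for the intercept
        # Gradient step followed by soft-thresholding for the penalised weights.
        w = [soft_threshold(wj - lr * gj / n, lr * lam) for wj, gj in zip(w, g)]
    return b0, w


def predict_proba(b0, w, X):
    return [sigmoid(linear_predictor(b0, w, x)) for x in X]


def area_prevalence(probs, areas):
    # Non-standardized area prevalence: mean predicted probability over the area's inhabitants.
    total, count = defaultdict(float), defaultdict(int)
    for prob, area in zip(probs, areas):
        total[area] += prob
        count[area] += 1
    return {area: total[area] / count[area] for area in total}


# Hypothetical training set: x = [uses_glucose_lowering_drug, noise_flag],
# y = 1 if a (toy) GP diagnosis is recorded. noise_flag carries no information about y.
X_train, y_train = [], []
for uses_drug, n_pos, n_neg in [(1, 8, 2), (0, 1, 9)]:
    for noise_flag in (0, 1):
        X_train += [[uses_drug, noise_flag] for _ in range(n_pos + n_neg)]
        y_train += [1] * n_pos + [0] * n_neg

b0, w = fit_lasso_logistic(X_train, y_train, lam=0.05)
print("intercept:", round(b0, 3), "weights:", [round(wj, 3) for wj in w])

# Hypothetical population without diagnoses: predict everyone, then average per municipality.
population = [([1, 0], "A"), ([1, 1], "A"), ([1, 0], "A"), ([0, 1], "A"),
              ([1, 1], "B"), ([0, 0], "B"), ([0, 1], "B"), ([0, 0], "B")]
X_pop = [x for x, _ in population]
areas = [area for _, area in population]
prevalence = area_prevalence(predict_proba(b0, w, X_pop), areas)
print({area: round(p, 3) for area, p in prevalence.items()})
```

With `lam=0.05`, the optimality conditions of this toy problem drive the uninformative `noise_flag` weight exactly to zero (the variable-selection effect of the L1 penalty) and put the predicted probabilities at about 0.7 for drug users and 0.2 for non-users, so municipality A averages to about 0.575 and B to about 0.325. Because the intercept is unpenalised, the mean predicted probability on the training set equals the observed training prevalence (18/40 = 0.45). Age standardization and the WPE metric reported in the abstract are not shown, since their exact definitions are not given here.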