Appendix A: data description
The data used for this document are a cross-sectional data set covering the entire population of Urghuland, collected in 2019. The data set contains the following variables:
- demographic: age, gender and county of residence
- socioeconomic status (SES): income, education, marital status
- risk factors: BMI, smoking status
- chronic conditions: self-reported binary variables denoting the presence or absence of each of heart disease, hypertension, stroke, cancer and diabetes
- health expenditures: total outpatient cost and the outpatient cost funded by the government
- codes: for each individual, we include the sequence of all outpatient codes for the year. For simplicity, we report only the order of the codes and not the time intervals between encounters. The sequence is coded as x_y_y_z_…_z_x, where x, …, z are code items for specific services (say, a GP visit, a lab test, an imaging study, …).
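
As an illustration of the sequence encoding described above, the following Python sketch splits one individual's code string into an ordered list of code items and counts how often each item occurs. It is a minimal sketch under stated assumptions: the function names parse_codes and code_counts are hypothetical, the example string is made up, and the code assumes that individual code items never contain an underscore themselves.

```python
from collections import Counter


def parse_codes(sequence: str, sep: str = "_") -> list[str]:
    """Split a delimited code sequence into its ordered code items.

    Assumption: code items never contain the separator character.
    Empty fragments (e.g. from a trailing separator) are dropped.
    """
    return [item.strip() for item in sequence.split(sep) if item.strip()]


def code_counts(sequence: str) -> Counter:
    """Count how many times each code item appears in a sequence."""
    return Counter(parse_codes(sequence))


# Hypothetical sequence for one individual (not taken from the data set).
example = "x_y_y_z_x"
items = parse_codes(example)    # order of encounters is preserved
counts = code_counts(example)   # order is discarded, frequencies are kept
```

Keeping the parsed list preserves the order of services, which is the only temporal information the data set reports; the counts are a convenient order-free summary.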