4.1. Health expenditure estimation methodology

This section describes the methodology used to estimate the treatment costs of the modelled diseases and of unrelated conditions. The costs of implementing interventions (which may include medical components, nonmedical components, or both, depending on the intervention) were estimated using a different methodology, described in the Intervention costs section.

The methodology explored in this section is designed to predict total healthcare costs for each patient, conditional on age, gender and disease status. The general cost formula is as follows:

\[{Cost}_{i,total} = {Cost}_{i,residual} + {Cost}_{i,extra\text{-}main} + {Cost}_{i,extra\text{-}comorb} + {Cost}_{i,extra\text{-}death} \tag{4.1}\]

For a person, \(i\), the total cost, \({Cost}_{i,total}\), comprises both explicit and non-explicit costs. Explicit costs are the costs of diseases that are explicitly included in the model and are caused by the risk factors in the model. Non-explicit costs consist of the residual cost, \({Cost}_{i,residual}\), which captures costs unrelated to the modelled risk factors, for example, the costs of treating migraines or common colds. If an individual has only one microsimulation-defined disease, the total cost is equal to the sum of the predicted residual cost and the predicted extra cost of having that disease. If an individual has two diseases, the comorbidity cost component is also added to this sum.

Individuals may also die over the observation period, in which case their total predicted costs also include death-related healthcare expenditures (over and above other costs). In cases where a person in the model develops more than two diseases, the costs of additional diseases will be taken into account as well.
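The additive structure of (4.1) can be illustrated with a short Python sketch. The function name, its arguments and the cost figures are hypothetical; each component is assumed to have been predicted already, and a person with more than two diseases is assumed to contribute one comorbidity term per earlier disease.

```python
from typing import Sequence


def total_cost(residual: float,
               extra_main: float = 0.0,
               extra_comorb: Sequence[float] = (),
               extra_death: float = 0.0) -> float:
    """Sum the cost components of equation (4.1) for one individual.

    residual:     predicted residual cost (always present)
    extra_main:   extra cost of the main (most recently diagnosed) disease, 0 if none
    extra_comorb: extra cost of each earlier-diagnosed disease (hypothetical: one term each)
    extra_death:  extra death-related cost, 0 if the person survives the period
    """
    return residual + extra_main + sum(extra_comorb) + extra_death


# Hypothetical figures: no modelled disease; one disease; two diseases and death.
print(total_cost(500.0))                                        # 500.0
print(total_cost(500.0, extra_main=1200.0))                     # 1700.0
print(total_cost(500.0, 1200.0, [300.0], extra_death=4000.0))   # 6000.0
```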

For modelling purposes, it is assumed that the “main disease” is the most recently diagnosed disease, while a “comorbidity” is any disease diagnosed earlier. For example, if an individual has had diabetes for several years and was diagnosed with cancer this year, the model first estimates the extra cost of the “main disease”, cancer (in the sample with comorbidities), and then the extra cost of the comorbidity, that is, of having diabetes in the presence of cancer.
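The assignment rule can be sketched as follows, assuming (hypothetically) that each person's diagnosed modelled diseases are stored as a mapping from disease name to year of diagnosis. The text does not say how ties are handled; here, diseases diagnosed in the same year keep their input order, which is purely an illustrative choice.

```python
from typing import Dict, List, Optional, Tuple


def split_main_and_comorbidities(
        diagnosis_year: Dict[str, int]) -> Tuple[Optional[str], List[str]]:
    """Return (main disease, comorbidities) for one individual.

    The main disease is the most recently diagnosed one; every earlier
    diagnosis is treated as a comorbidity (ordered from most to least recent).
    Python's sort is stable, so same-year diagnoses keep their input order.
    """
    if not diagnosis_year:
        return None, []
    ordered = sorted(diagnosis_year, key=diagnosis_year.get, reverse=True)
    return ordered[0], ordered[1:]


# Diabetes diagnosed several years ago, cancer diagnosed this year.
main, comorb = split_main_and_comorbidities({"diabetes": 2015, "cancer": 2020})
print(main, comorb)  # cancer ['diabetes']
```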

In principle, it is possible to predict total healthcare costs, \({Cost}_{i,total}\), including out-of-pocket expenditure, for each person, \(i\), by estimating the parameters of the two-way interaction model (separately for each gender) described in [Cortaredona and Ventelou, 2017 [13]]:

\[\ln({Cost}_{i}) = \alpha + \beta \cdot {age}_{i} + {\gamma}_{k} \cdot {D}_{i,k} + {\gamma}_{j} \cdot {D}_{i,j} + {\gamma}_{k,j} \cdot {D}_{i,k} \cdot {D}_{i,j} + {\epsilon}_{i} \tag{4.2}\]

where:
  • \({Cost}_{i}\) = total healthcare cost of individual \(i\), defined as follows: “In a bottom-up design, units of health care are used on a patient level and are multiplied with a price for this unit. All individual health expenditures are then summed up to calculate total cost of the disease” [Cortaredona and Ventelou, 2017 [13]]

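  • \({age}_{i}\) = age category of individual \(i\), entered as a set of dummy variables with 18-39 as the reference category, so that \(\beta\) contains one coefficient per age category (for example, \({\beta}_{50-55}\))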

  • \({D}_{i,k}\) = 1 if individual \(i\) suffers from illness \(k\), and 0 otherwise (likewise for \({D}_{i,j}\) and illness \(j\)). It therefore follows that \({D}_{i,k} \cdot {D}_{i,j} = 1\) only if \({D}_{i,k} = {D}_{i,j} = 1\), and 0 otherwise

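  • \(\alpha\) = intercept; \(exp(\alpha)\) is the predicted healthcare cost for a person aged 18-39 without any diagnosed modelled disease

  • \({\gamma}_{k}\), \({\gamma}_{j}\) = coefficients for illnesses \(k\) and \(j\); \({\gamma}_{k,j}\) = coefficient on the interaction term, capturing how the (log) cost of having both illnesses differs from the sum of their separate effects

  • \({\epsilon}_{i}\) = error term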
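To make the coding of (4.2) concrete, the sketch below builds the regressor vector for one person: an intercept, one dummy per non-reference age band, the two disease indicators and their product. Only the 18-39 reference band and the 50-55 band come from the text; the other age bands and the function name are hypothetical.

```python
from typing import List

# Hypothetical age bands: only "18-39" (reference) and "50-55" appear in the text.
AGE_BANDS = [("18-39", 18, 39), ("40-49", 40, 49), ("50-55", 50, 55), ("56+", 56, 120)]


def design_row(age: int, has_k: bool, has_j: bool) -> List[float]:
    """Regressors of equation (4.2) for one person:
    intercept, one dummy per non-reference age band, D_k, D_j and D_k * D_j."""
    row = [1.0]  # intercept (alpha)
    for _, lower, upper in AGE_BANDS[1:]:  # skip the reference band 18-39
        row.append(1.0 if lower <= age <= upper else 0.0)
    d_k, d_j = float(has_k), float(has_j)
    row += [d_k, d_j, d_k * d_j]  # interaction equals 1 only if both illnesses are present
    return row


print(design_row(55, has_k=True, has_j=True))
# [1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0]
```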

The practical problem with estimating (4.2) is that, for several conditions, the sample size in the French Échantillon généraliste des bénéficiaires (EGB) dataset is too small for two-part model estimation with interactions. Predicting costs using (4.2) becomes even more problematic if the information on time since diagnosis contained in the EGB dataset is also to be taken into account. Therefore, total healthcare costs in France (and, for consistency, in all other countries) are predicted by estimating each component of (4.1) separately, as described in the sections that follow.

(4.2) can be estimated on the sample of people with positive costs using a multivariate gamma regression with a log link (see [Thiébaut, Barnay and Ventelou, 2013 [60]] on the choice of an appropriate econometric specification for France).
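The text does not state which software was used. As an illustration only, the sketch below fits a gamma GLM with a log link by iteratively reweighted least squares (Fisher scoring) using NumPy, on simulated positive costs with hypothetical coefficients. With a log link and the gamma variance function \(V(\mu) = \mu^2\), all IRLS weights equal one, so each iteration is an ordinary least-squares fit of the working response \(\eta + (y - \mu)/\mu\) on the regressors. Statistical packages such as statsmodels offer the same model through their GLM class with a Gamma family and log link.

```python
import numpy as np


def fit_gamma_log(X: np.ndarray, y: np.ndarray,
                  tol: float = 1e-10, max_iter: int = 100) -> np.ndarray:
    """Fit E(y | X) = exp(X @ b) as a gamma GLM with log link, using IRLS.

    For this family/link pair the IRLS weights are identically 1, so each
    step regresses the working response z = eta + (y - mu) / mu on X.
    """
    # Starting values: OLS on log(y), defined because all y > 0.
    b = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
    for _ in range(max_iter):
        eta = X @ b
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response
        b_new = np.linalg.lstsq(X, z, rcond=None)[0]
        if np.max(np.abs(b_new - b)) < tol:
            return b_new
        b = b_new
    return b


# Simulated positive costs. Columns: intercept, age 50-55 dummy, diabetes,
# cancer, diabetes*cancer. All values are hypothetical.
rng = np.random.default_rng(0)
n = 20_000
age_50_55 = rng.integers(0, 2, n)
diabetes = rng.integers(0, 2, n)
cancer = rng.integers(0, 2, n)
X = np.column_stack([np.ones(n), age_50_55, diabetes, cancer,
                     diabetes * cancer]).astype(float)
true_b = np.array([6.0, 0.4, 0.7, 1.2, -0.3])
shape = 2.0
# Gamma draws with mean exp(X @ true_b): mean = shape * scale.
y = rng.gamma(shape, np.exp(X @ true_b) / shape)

b_hat = fit_gamma_log(X, y)
print(np.round(b_hat, 2))  # should be close to true_b
```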

For example, the total predicted healthcare cost for a person aged 55 with no modelled diseases and with positive costs would be equal to:

\[E(C|C>0) = exp(\hat\alpha + {\hat\beta}_{50-55})\]

For a person with diabetes of the same age, the total predicted cost in this sample would be equal to:

\[E(C| C>0, diabetes=1) = exp(\hat\alpha + {\hat\beta}_{50-55} + {\hat\gamma}_{diabetes})\]

For a person with both diabetes and cancer, the total cost in the sample of people with positive costs can be predicted as:

\[E(C|C>0, diabetes=1, cancer=1) = exp(\hat\alpha + {\hat\beta}_{50-55} + {\hat\gamma}_{diabetes} + {\hat\gamma}_{cancer} + {\hat\gamma}_{diabetes*cancer})\]
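The three predictions above can be reproduced from a set of fitted log-scale coefficients. The coefficient values and the function name below are hypothetical and are not estimates from the EGB data.

```python
import math

# Hypothetical fitted coefficients on the log scale (illustrative only).
coef = {
    "alpha": 6.0,
    "beta_50_55": 0.4,
    "gamma_diabetes": 0.7,
    "gamma_cancer": 1.2,
    "gamma_diabetes_cancer": -0.3,
}


def predicted_positive_cost(age_50_55: bool, diabetes: bool, cancer: bool) -> float:
    """E(C | C > 0, covariates) = exp(linear predictor), with one interaction term."""
    eta = coef["alpha"]
    eta += coef["beta_50_55"] * age_50_55
    eta += coef["gamma_diabetes"] * diabetes
    eta += coef["gamma_cancer"] * cancer
    eta += coef["gamma_diabetes_cancer"] * (diabetes and cancer)
    return math.exp(eta)


print(round(predicted_positive_cost(True, False, False)))  # 602  = exp(6.4)
print(round(predicted_positive_cost(True, True, False)))   # 1212 = exp(7.1)
print(round(predicted_positive_cost(True, True, True)))    # 2981 = exp(8.0)
```

Because the model is on the log scale, the coefficients act multiplicatively on cost; the interaction coefficient scales the joint prediction up or down relative to multiplying the two separate disease effects.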

To make sure that these predicted costs are representative not just of the people with positive healthcare costs, but of all people with the diagnosed conditions, an adjustment is made by multiplying these costs by the probability of having positive healthcare costs. For example, the total predicted cost for a person with diabetes is:

\[E(C|diabetes=1) = P(C>0) \cdot E(C|C>0, diabetes=1)\]
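Because people with zero costs contribute nothing to the mean, the mean cost over everyone equals the share of people with positive costs times the mean among those with positive costs. The sketch below checks this identity on simulated data; the 80% share and the gamma cost distribution are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
# Hypothetical data: about 80% of people incur any cost; positive costs are gamma-distributed.
positive = rng.random(n) < 0.8
costs = np.where(positive, rng.gamma(2.0, 600.0, n), 0.0)

p_hat = positive.mean()            # first part: share with C > 0
mean_pos = costs[positive].mean()  # second part: mean of C among those with C > 0
two_part = p_hat * mean_pos        # P(C > 0) * E(C | C > 0)

print(two_part, costs.mean())      # equal up to floating-point rounding
```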

The first part of this two-part estimator, the probability of incurring positive healthcare costs, can be estimated using a logit regression:

\[P(C>0) = \Phi(a+b \cdot {AGE}_{cat})\]

where:
  • \(\Phi\) is the standard logistic cumulative distribution function, \(\Phi(x) = 1/(1+e^{-x})\), and \({AGE}_{cat}\) denotes the age category.
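The logit part can be sketched with a Newton-Raphson maximum-likelihood fit in NumPy on simulated data. The three age categories, their probabilities and the function names are hypothetical. With age entered only as category dummies, the fitted \(\Phi(a + b \cdot {AGE}_{cat})\) reproduces the observed share of people with positive costs in each category, which the final line compares.

```python
import numpy as np


def logistic(x):
    """Standard logistic CDF, Phi(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Logit regression by Newton-Raphson on the log-likelihood
    sum(y * log(p) + (1 - y) * log(1 - p)), with p = logistic(X @ b)."""
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = logistic(X @ b)
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])     # information matrix
        step = np.linalg.solve(hess, grad)
        b += step
        if np.max(np.abs(step)) < tol:
            break
    return b


# Simulated data: three hypothetical age categories, the first is the reference.
rng = np.random.default_rng(2)
n = 30_000
age_cat = rng.integers(0, 3, n)
X = np.column_stack([np.ones(n), age_cat == 1, age_cat == 2]).astype(float)
true_p = np.array([0.70, 0.85, 0.95])[age_cat]
y = (rng.random(n) < true_p).astype(float)   # 1 if the person has positive costs

b = fit_logit(X, y)
p_by_cat = logistic(np.array([b[0], b[0] + b[1], b[0] + b[2]]))
observed = [y[age_cat == c].mean() for c in range(3)]
print(np.round(p_by_cat, 3), np.round(observed, 3))
```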