Health expenditure estimation methodology 
=========================================

This section describes the methodology used to estimate treatment costs for the modelled diseases, as well as for unrelated conditions. The costs of
implementing interventions (which may include medical components, nonmedical components, or both, depending on the intervention) were estimated using a different
methodology, described in :ref:`intervention_costs`.

The methodology explored in this section is designed to predict total healthcare costs 
for each patient, conditional on age, gender and disease status. The general cost formula is as follows:


.. math:: {Cost}_{i,total} = {Cost}_{i,residual} + {Cost}_{i,extra-main} + {Cost}_{i,extra-comorb} + {Cost}_{i,extra-death}
   :label: hccost_main

For a person, :math:`i`, the total cost, :math:`{Cost}_{i,total}`, comprises both explicit and non-explicit costs. Explicit costs
are costs caused by diseases that are explicitly included in the model and caused by risk factors in the model. Non-explicit costs consist of the residual cost, :math:`{Cost}_{i,residual}`,
which captures costs unrelated to risk factors, for example, the costs of treating migraines or common colds. If an individual has only one microsimulation-defined disease,
the total cost is equal to the sum of the predicted residual costs and the predicted extra costs of having the disease. If an individual has two diseases, the comorbidity
cost component is also added to this sum.

Individuals may also die over the observation period, in which case their total predicted costs also include death-related 
healthcare expenditures (over and above other costs). In cases where a person in the model develops more 
than two diseases, the costs of additional diseases will be taken into account as well.  

For modelling purposes, it is assumed that the “main disease” is the most recently diagnosed disease,
while “comorbidity” refers to a previously diagnosed disease. For example, if an individual has had diabetes
for several years and was diagnosed with cancer this year, then the model would first estimate the extra cost of having the
“main disease”, cancer (in the sample with comorbidities), and then estimate the extra cost of having a comorbidity, in this case diabetes
in the presence of cancer.
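
As an illustration only, the way the components of :eq:`hccost_main` combine, and the rule that the most recently diagnosed disease is the “main disease”, can be sketched in Python. The class name, the function name and all cost values below are hypothetical and are not taken from the model code or from any estimates. Diseases diagnosed in the same year are not treated specially in this sketch.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Dict, List, Optional, Tuple


   @dataclass
   class CostComponents:
       """Cost components for one person, mirroring the terms of the general formula."""
       residual: float
       extra_main: float = 0.0
       extra_comorb: float = 0.0
       extra_death: float = 0.0

       def total(self) -> float:
           # Total cost is the plain sum of the four components.
           return self.residual + self.extra_main + self.extra_comorb + self.extra_death


   def split_main_and_comorbidities(
       diagnosis_year: Dict[str, int],
   ) -> Tuple[Optional[str], List[str]]:
       """Return the most recently diagnosed disease and the list of earlier ones."""
       if not diagnosis_year:
           return None, []
       main = max(diagnosis_year, key=lambda d: diagnosis_year[d])
       comorbidities = sorted(d for d in diagnosis_year if d != main)
       return main, comorbidities


   # Hypothetical person: diabetes for several years, cancer diagnosed recently.
   main, comorbidities = split_main_and_comorbidities({"diabetes": 2015, "cancer": 2023})

   # Made-up cost values, for illustration only.
   costs = CostComponents(residual=1000.0, extra_main=5000.0, extra_comorb=800.0)
   total_cost = costs.total()

Here cancer is treated as the main disease and diabetes as the comorbidity. The death-related component stays at zero because the hypothetical person survives the period.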

In principle, it is possible to predict total healthcare costs, :math:`{Cost}_{i,total}`,
which also include out-of-pocket expenditure, for each person, :math:`i`, by estimating the parameters of the two-way interaction model
(for each gender separately) described in :cite:p:`Cortaredona2017`:

.. math:: 
   :label: hccost_total

   \ln({Cost}_{i}) = \alpha + \beta \cdot {age}_{i} + {\gamma}_{k} \cdot {D}_{i,k} + {\gamma}_{j} \cdot {D}_{i,j} + {\gamma}_{k,j} \cdot {D}_{i,k} \cdot {D}_{i,j}  + {\epsilon}_{i}

Where: 


 * :math:`{Cost}_{i}` is the total healthcare cost, defined as follows: “In a bottom-up design, units of health care are used on a patient level and are multiplied with a price for this unit. All individual health expenditures are then summed up to calculate total cost of the disease” :cite:p:`Cortaredona2017`
 * :math:`{age}_{i}` = age of individual :math:`i` (entered as an age category, with 18-39 as the reference group)
 * :math:`{D}_{i,k}` = 1 if individual :math:`i` suffers from illness :math:`k`, and 0 otherwise. It therefore follows that :math:`{D}_{i,k} \cdot {D}_{i,j}` = 1 only if :math:`{D}_{i,k} = {D}_{i,j} = 1`
 * :math:`\alpha` = intercept; :math:`\exp(\alpha)` is the predicted healthcare cost for a person aged 18-39 without any diagnosed modelled disease


The practical problem with estimating :eq:`hccost_total` is that the sample size for several conditions in the
Échantillon généraliste des bénéficiaires (EGB) dataset in France is too small to estimate a two-part model
with interactions. Predicting costs using :eq:`hccost_total` is even more problematic if the information
on the length of time since diagnosis contained in the EGB dataset is to be taken into account.
Therefore, it was decided that total healthcare costs in France (and, for consistency, in all other
countries) would be predicted by separately estimating each component listed in :eq:`hccost_main`, as
described in the sections that follow.

:eq:`hccost_total` can be estimated on the sample of people with positive costs using a multivariate gamma regression
with a log link (see :cite:p:`Thiebaut2013` for the choice of an appropriate econometric
specification for France).

For example, the total predicted healthcare cost for a person aged 55 with no modelled diseases and 
with positive costs would be equal to:

.. math:: 

   E(C|C>0) = \exp(\hat\alpha + {\hat\beta}_{50-55})
    

For a person with diabetes of the same age, the total predicted cost in this sample would be equal to:

.. math:: 

   E(C|C>0, diabetes=1) = \exp(\hat\alpha + {\hat\beta}_{50-55} + {\hat\gamma}_{diabetes})
 

For a person with both diabetes and cancer, the total cost in the sample of people with positive costs can be predicted as:

.. math:: 

   E(C|C>0, diabetes=1, cancer=1) = \exp(\hat\alpha + {\hat\beta}_{50-55} + {\hat\gamma}_{diabetes} + {\hat\gamma}_{cancer} + {\hat\gamma}_{diabetes,cancer})
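
The three predictions above can be reproduced with a short Python sketch. All coefficient values are hypothetical placeholders chosen for illustration; they are not estimates from the EGB data. The function name and the dictionary layout are also assumptions made for this example.

.. code-block:: python

   import math
   from itertools import combinations

   # Hypothetical coefficients, for illustration only (not estimated values).
   COEF = {
       "alpha": 6.0,
       "beta": {"50-55": 0.4},
       "gamma": {"diabetes": 0.7, "cancer": 1.1},
       # Interaction terms, keyed by alphabetically sorted disease pairs.
       "gamma_pair": {("cancer", "diabetes"): -0.2},
   }


   def expected_cost_given_positive(age_band, diseases, coef=COEF):
       """Predicted cost in the positive-cost sample for a log-link model."""
       eta = coef["alpha"] + coef["beta"][age_band]
       ordered = sorted(diseases)
       # Main effect of each disease.
       for disease in ordered:
           eta += coef["gamma"][disease]
       # Two-way interaction for each pair of diseases (0 if not specified).
       for pair in combinations(ordered, 2):
           eta += coef["gamma_pair"].get(pair, 0.0)
       # The log link means the prediction is the exponential of the linear predictor.
       return math.exp(eta)


   cost_no_disease = expected_cost_given_positive("50-55", set())
   cost_diabetes = expected_cost_given_positive("50-55", {"diabetes"})
   cost_both = expected_cost_given_positive("50-55", {"diabetes", "cancer"})

Because of the log link, each coefficient acts multiplicatively: in this sketch the ratio ``cost_diabetes / cost_no_disease`` equals ``exp(0.7)``, the exponential of the hypothetical diabetes coefficient.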

To make sure that these predicted costs are representative not just of the people with positive 
healthcare costs, but of all people with the diagnosed conditions, an adjustment is made 
by multiplying these costs by the probability of having positive healthcare costs. 
For example, the total predicted cost for a person with diabetes is:

.. math:: 
   E(C|diabetes=1) = P(C>0) \cdot E(C|C>0, diabetes=1)

The first part of this two-part model, the probability of having positive healthcare costs, can be estimated using a logit regression:

.. math:: 
   P(C>0) = \Phi(a+b \cdot {AGE}_{cat})

Where: 

 * :math:`\Phi` is the cumulative standard logistic distribution function.
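
A minimal Python sketch of the two-part calculation follows: the probability of positive costs from the logistic function is multiplied by the expected cost among people with positive costs. The function names and all numerical inputs are hypothetical and serve only to illustrate the arithmetic.

.. code-block:: python

   import math


   def logistic_cdf(x):
       """Cumulative distribution function of the standard logistic distribution."""
       # Two algebraically equivalent branches avoid overflow for large |x|.
       if x >= 0:
           return 1.0 / (1.0 + math.exp(-x))
       z = math.exp(x)
       return z / (1.0 + z)


   def two_part_expected_cost(a, b_age, conditional_mean):
       """Combine both parts: P(C > 0) multiplied by E(C | C > 0)."""
       p_positive = logistic_cdf(a + b_age)
       return p_positive * conditional_mean


   # Hypothetical inputs: logit intercept, age-category coefficient and the
   # predicted cost among people with positive costs.
   expected_cost = two_part_expected_cost(a=1.5, b_age=0.5, conditional_mean=2000.0)

Since the probability of positive costs lies between 0 and 1, the adjusted prediction can never exceed the prediction for the positive-cost sample.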