4.5. Residual cost and related issues
We estimate age- and gender-specific average residual costs by restricting the sample to people who had none of the model-defined diagnosed diseases. These people could have other diagnosed conditions, including chronic ones, or be otherwise healthy, and they may or may not have had zero health expenditures.
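The restriction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`age_group`, `gender`, `diagnoses`, `annual_cost`), the function name `average_residual_costs`, the disease labels and the cost figures are all hypothetical and are not taken from the administrative datasets or from the model code.

```python
from collections import defaultdict


def average_residual_costs(records, modelled_diseases):
    """Mean cost per (age group, gender) among people with no modelled disease.

    `records` is an iterable of dicts with hypothetical keys:
    age_group, gender, diagnoses (list of labels), annual_cost.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of costs, head count]
    for person in records:
        # Skip anyone diagnosed with at least one model-defined disease.
        if modelled_diseases & set(person["diagnoses"]):
            continue
        key = (person["age_group"], person["gender"])
        totals[key][0] += person["annual_cost"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical data: labels and amounts are made up for illustration.
records = [
    {"age_group": "50-59", "gender": "F", "diagnoses": ["other_chronic"], "annual_cost": 400.0},
    {"age_group": "50-59", "gender": "F", "diagnoses": [], "annual_cost": 0.0},
    {"age_group": "50-59", "gender": "F", "diagnoses": ["diabetes"], "annual_cost": 3000.0},
    {"age_group": "60-69", "gender": "M", "diagnoses": [], "annual_cost": 250.0},
]
modelled = {"diabetes"}  # assumed set of model-defined diseases

residual = average_residual_costs(records, modelled)
print(residual)
```

Note that the person with zero expenditure and the person with a non-modelled chronic condition both stay in the residual sample, while the person with a modelled disease is excluded.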
A potential complication is that in microsimulations, people are assigned disease status based mostly on IHME epidemiological prevalence (with some additional calibrations as appropriate), which may well be different from the administrative dataset-based prevalence. On the other hand, in the administrative datasets that we have analysed, people with undiagnosed health conditions may be misclassified as not having the conditions that we are modelling, and therefore their costs will be inappropriately part of the residual costs. Such costs will also not be captured when estimating the extra cost of disease.
One potential way to deal with this is to assume that IHME-based prevalence reflects the “correct” epidemiological prevalence (i.e. including both diagnosed and undiagnosed cases). Under this assumption, we could in theory adjust the predicted extra cost of disease by multiplying it by a factor based on the ratio of “diagnosed” to “real” prevalence. If we find, for example, that for women aged 50-59 the prevalence of diabetes based on administrative data is 10%, while IHME-based prevalence is 12%, then we could multiply the estimated extra cost of disease in this group by 10/12 ≈ 0.83, so that the cost is representative of all women with the disease in this group, whether diagnosed or undiagnosed. Alternatively, we could assume that the extra disease cost equals zero for the proportion of people who are undiagnosed according to IHME data. Likewise, we could re-categorise our residual costs accordingly, which is likely to increase residual costs because the number of cases with zero expenditures would be reduced. Since the adjustment lowers disease costs while raising residual costs, the net effect on total costs is ambiguous.
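The arithmetic of this hypothetical adjustment can be written out as a minimal Python sketch. The function names and the extra cost of 1000 are illustrative assumptions. Capping the factor at 1 when administrative prevalence exceeds IHME prevalence is also an assumption of this sketch, not something specified in the text.

```python
def diagnosis_adjustment_factor(admin_prevalence, ihme_prevalence):
    """Ratio of diagnosed (administrative) to total (IHME) prevalence.

    Assumption of this sketch: the factor is capped at 1, so costs are
    never scaled up when administrative prevalence is the higher one.
    """
    if ihme_prevalence <= 0:
        raise ValueError("IHME prevalence must be positive")
    return min(admin_prevalence / ihme_prevalence, 1.0)


def adjusted_extra_cost(extra_cost, admin_prevalence, ihme_prevalence):
    """Extra disease cost averaged over diagnosed and undiagnosed cases."""
    return extra_cost * diagnosis_adjustment_factor(admin_prevalence, ihme_prevalence)


# Example from the text: women aged 50-59, diabetes, 10% vs 12% prevalence.
factor = diagnosis_adjustment_factor(0.10, 0.12)
# Hypothetical extra cost of 1000 per diagnosed case (illustrative only).
example_cost = adjusted_extra_cost(1000.0, 0.10, 0.12)
print(round(factor, 2), round(example_cost, 2))
```

Scaling the cost by this factor gives the same group average as assigning the full extra cost to the diagnosed share and zero to the undiagnosed share, which is why the two options in the paragraph above lead to the same expected disease cost.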
Nevertheless, it is not certain that IHME-estimated disease prevalence is superior to the administratively derived one, as it relies on data of varying quality and methodological basis (e.g. it can be based on multiple sources of survey data, with additional assumptions to correct for self-reporting bias). Some analysis shows that, for example, in France the age- and gender-specific prevalence of diabetes and of several cancers is higher in the administrative dataset than in the IHME dataset, which suggests that this divergence may not be due to the inclusion of undiagnosed cases in IHME data. In some other cases the prevalence was considerably higher in the IHME dataset, but this was mostly true at the oldest and the youngest ages, where the IHME estimation methodology might rely on too little data and on too many assumptions. In addition, at the oldest ages (generally older than 60-70), where the prevalence rates diverge the most, the absolute number of affected people falls with each year of life, so the total impact on costs is reduced. We therefore prefer not to further adjust the extra disease costs or the residual costs. Besides, since we are mostly interested in the “delta effect” of different interventions and scenario comparisons, the potential overestimation that stems from assigning the estimated costs to undiagnosed cases is probably of minor significance.