We consider two separate Markov processes which describe the mortality and disease lifetables used in PMSLT modeling [6, 7].
The mortality lifetable
The mortality lifetable of a population \(P\) measures metrics such as the number of deaths, life years and life expectancy, capturing snapshots at discrete timesteps (e.g. years). Let \(A_{t}\) and \(D_{t}\) denote the number of people alive and dead at the end of the \(t\)-th timestep. We assume \(A_{0} = N\) and \(D_{0} = 0\), where \(N\) is the total number of people that are observed initially, and for each \(t\) we have a mortality rate \(m_{t}\). Then:
$$\begin{aligned} A_{t} & = A_{t - 1} e^{{{ } - m_{t} }} , \\ D_{t} & = D_{t - 1} + A_{t - 1} \left( {1 - e^{{{ } - m_{t} }} } \right) \\ \Rightarrow D_{t} & = N - A_{t} \\ \end{aligned}$$
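As a minimal illustration of this recursion (a sketch only, not the prototype code of [13]), the lifetable can be stepped forward in Python. The function name `mortality_lifetable` and the example rates are hypothetical and chosen purely for illustration.

```python
import math

def mortality_lifetable(n_initial, mortality_rates):
    """Return (alive, dead) lists with A_0 = N, D_0 = 0 and one entry per timestep."""
    alive = [float(n_initial)]
    dead = [0.0]
    for m_t in mortality_rates:
        survivors = alive[-1] * math.exp(-m_t)          # A_t = A_{t-1} e^{-m_t}
        dead.append(dead[-1] + alive[-1] - survivors)   # D_t = D_{t-1} + A_{t-1}(1 - e^{-m_t})
        alive.append(survivors)
    return alive, dead

# Hypothetical example: 1000 people followed over three timesteps.
alive, dead = mortality_lifetable(1000, [0.01, 0.02, 0.03])
```

At every timestep the two lists sum to \(N\), which is the identity \(D_{t} = N - A_{t}\) stated above.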
In the mortality lifetable disaggregation problem, \(P\) consists of \(n\) underlying sub-populations \(P_{1} , \ldots , P_{n}\), where we assume that members of \(P\) remain in their respective sub-populations for life. The initial population of each of \(P_{1} , \ldots , P_{n}\) is known, and the initial mortality rates are known (either given, or solvable using mortality rate ratios between strata). However, whilst \(m_{t}\), the total population mortality rate at each future timestep, is given, the stratum-specific mortality rates over time are not. This situation is not uncommon: for example, we may know the starting disaggregation of a population by socioeconomic strata, and the rate ratios for mortality or disease incidence comparing strata, but not the exact rates by strata over time. The goal is to use the aggregate population's mortality rates, the rate ratios comparing strata, and the starting distribution of each sub-population (\(P_{k}\)) to solve the mortality lifetables for each of \(P_{1} , \ldots , P_{n}\). The sub-population lifetables must be consistent with that of the aggregate population: at each timestep \(t\), the sum of the people in the alive (dead) compartment across the sub-populations must equal the number of people in the alive (dead) compartment for the aggregate population, i.e.,
$$\begin{aligned} A_{t} & = \mathop \sum \limits_{k = 1}^{n} A_{k,t} \\ D_{t} & = \mathop \sum \limits_{k = 1}^{n} D_{k,t} \\ \end{aligned}$$
where \(A_{k,t}\) and \(D_{k,t}\) denote the number of people in the alive and dead compartments respectively for sub-population \(k\) at time \(t\). Thus, the disaggregation problem reduces to finding mortality rates \(m_{k,t}\) for each sub-population \(k\) at each timestep \(t > 0\) such that:
$$A_{t - 1} e^{{{ } - m_{t} }} = \mathop \sum \limits_{k = 1}^{n} A_{k,t - 1} e^{{{ } - m_{k,t} }}$$
If we let \(P_{1}\) be the reference sub-population, then the mortality rate ratios for timestep \(t\) are given as scalars \(r_{1,t}^{{{\text{mort}}}} , \ldots , r_{n,t}^{{{\text{mort}}}}\), where \(r_{1,t}^{{{\text{mort}}}} = 1\), such that \(m_{k,t} = r_{k,t}^{{{\text{mort}}}} m_{1,t}\) for each sub-population \(k\). By substituting each \(r_{k,t}^{{{\text{mort}}}} m_{1,t}\) into the consistency equations, we obtain a set of equations with a unique solution. By solving these, we obtain a unique set of sub-population mortality rates for the mortality lifetable disaggregation problem. A method for solving for these rates, and the proof of the uniqueness of the solution, are given in “Appendix 2: Disaggregation details”.
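One way to see how these equations can be solved numerically is sketched below. After substitution, each timestep leaves a single unknown, \(m_{1,t}\), and the right-hand side of the consistency equation is strictly decreasing in it when all rate ratios are positive, so a bracketing root finder suffices. The sketch uses plain bisection, which is our own choice for illustration and not necessarily the method of Appendix 2 or of the prototype code [13]. The function names are hypothetical, and the rate ratios are assumed to be strictly positive and, for brevity, constant over time.

```python
import math

def solve_reference_rate(alive_prev, ratios, m_total, tol=1e-12):
    """Find x = m_{1,t} with sum_k A_{k,t-1} e^{-r_k x} = A_{t-1} e^{-m_t}."""
    target = sum(alive_prev) * math.exp(-m_total)

    def excess(x):
        # Sub-population survivors minus aggregate survivors; decreasing in x.
        return sum(a * math.exp(-r * x) for a, r in zip(alive_prev, ratios)) - target

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:        # widen the bracket until the root lies in [lo, hi]
        hi *= 2.0
    while hi - lo > tol:         # bisection on the bracketing interval
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def disaggregate_mortality(alive0, ratios, total_rates):
    """Return per-timestep sub-population rates m_{k,t} and alive counts A_{k,t}."""
    alive = [[float(a) for a in alive0]]
    rates = []
    for m_t in total_rates:
        m_ref = solve_reference_rate(alive[-1], ratios, m_t)
        m_k = [r * m_ref for r in ratios]              # m_{k,t} = r_k m_{1,t}
        rates.append(m_k)
        alive.append([a * math.exp(-m) for a, m in zip(alive[-1], m_k)])
    return rates, alive

# Hypothetical example: two strata, the second with 1.5 times the reference rate.
rates, alive_k = disaggregate_mortality([600, 400], [1.0, 1.5], [0.010, 0.012])
```

In this example the sub-population alive counts sum to the aggregate alive count at each timestep, up to the bisection tolerance.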
The mortality/morbidity lifetable
We now extend the mortality lifetable to a mortality/morbidity lifetable that also includes HALYs which incorporate the effects of morbidity. Let \(L_{t}\) denote population life years at \(t\), where
$$L_{t} = \frac{{A_{t} + A_{t - 1} }}{2}$$
Our HALY unit is a rescaling of the life-year using disability rates. Let \(w_{t}\) denote the prevalent years of life with disability (i.e. ‘YLDs’) from a burden of disease study at time \(t\), divided by the population in that stratum. Then the formula for HALYs at time \(t\), denoted \(L_{t}^{*}\), is:
$$L_{t}^{*} = L_{t} \left( {1 - w_{t} } \right)$$
For the mortality/morbidity lifetable disaggregation problem, an extension of the disaggregation problem in the previous section, we are given a mortality/morbidity lifetable for a population \(P\) which includes all parameters from the mortality lifetable, along with morbidity weights \(w_{t}\) and HALYs \(L_{t}^{*}\). As before, \(P\) consists of \(n\) sub-populations \(P_{1} , \ldots , P_{n}\), with their individual population counts, mortality rates, and YLDs given for the first timestep. The goal is to use the aggregate lifetable to determine the mortality rates and morbidity rates (\(w_{t}\)) for each sub-population \(P_{k}\), and hence obtain the mortality/morbidity lifetables for each of \(P_{1} , \ldots , P_{n}\). We have to ensure that the alive population counts for the sub-populations agree with those of the aggregate population, and also that the total HALYs for the sub-populations agree with the aggregate population HALYs. That is:
$$L_{t}^{*} = \mathop \sum \limits_{k = 1}^{n} L_{k,t}^{*}$$
where \(L_{k,t}^{*}\) denotes the HALYs for sub-population \(k\) at time \(t\). To satisfy the above equations, we must solve for values \(w_{k,t}\) for each \(k\) and \(t\) such that:
$$L_{t} \left( {1 - w_{t} } \right) = \mathop \sum \limits_{k = 1}^{n} L_{k,t} \left( {1 - w_{k,t} } \right)$$
where \(L_{k,t}\) denotes life years for sub-population \(k\) at \(t\).
We can assume that the mortality lifetable disaggregation problem has been solved as a subproblem, since it can be independently solved using the method in “The mortality lifetable” section. Then, we have alive population values, such that \(A_{t} = \mathop \sum \nolimits_{k = 1}^{n} A_{k,t}\), which implies that \(L_{t} = \mathop \sum \nolimits_{k = 1}^{n} L_{k,t}\). Hence, we can simplify the HALY constraints to:
$$L_{t} w_{t} = \mathop \sum \limits_{k = 1}^{n} L_{k,t} w_{k,t}$$
To solve the problem, we assume that morbidity (morb) ratios are given for each \(t\) (in all likelihood these ratios vary by age and sex, but they are assumed constant over \(t\)), namely \(r_{1,t}^{{{\text{morb}}}} , \ldots , r_{n,t}^{{{\text{morb}}}}\) with \(r_{1,t}^{{{\text{morb}}}} = 1\), such that \(w_{k,t} = r_{k,t}^{{{\text{morb}}}} w_{1,t}\) for each sub-population \(k\). After substituting each \(r_{k,t}^{{{\text{morb}}}} w_{1,t}\) into the HALY constraints and solving, we obtain:
$$w_{k,t} = r_{k,t}^{{{\text{morb}}}} \frac{{L_{t} w_{t} }}{{\mathop \sum \nolimits_{j = 1}^{n} r_{j,t}^{{{\text{morb}}}} L_{j,t} }}$$
Thus, we are able to use morbidity ratios to uniquely disaggregate the mortality/morbidity lifetable such that the HALYs in the sub-population lifetables are consistent with the aggregate population lifetable.
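Because the morbidity disaggregation has a closed form, it can be sketched in a few lines of Python. This is an illustrative sketch rather than the prototype code of [13]; the function name and the example numbers are hypothetical, and it assumes the mortality disaggregation has already been solved, so that \(L_{t} = \sum_{k} L_{k,t}\).

```python
def disaggregate_morbidity(life_years, ratios, w_total):
    """Return w_{k,t} for one timestep from sub-population life years L_{k,t},
    morbidity ratios r_k (reference ratio equal to 1) and the aggregate w_t."""
    total_life_years = sum(life_years)                         # L_t = sum_k L_{k,t}
    weighted = sum(r * l for r, l in zip(ratios, life_years))  # sum_j r_j L_{j,t}
    w_ref = total_life_years * w_total / weighted              # w_{1,t}
    return [r * w_ref for r in ratios]                         # w_{k,t} = r_k w_{1,t}

# Hypothetical example: two strata with life years 590 and 390, aggregate w_t = 0.1.
life_years = [590.0, 390.0]
w_k = disaggregate_morbidity(life_years, [1.0, 1.2], 0.1)
halys_k = [l * (1 - w) for l, w in zip(life_years, w_k)]       # L*_{k,t}
```

Summing `halys_k` recovers the aggregate HALYs \(L_{t} \left( 1 - w_{t} \right)\) for this example.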
The disease lifetable
A PMSLT, described in detail in [6, 7], works through changes in disease incidence or case fatality rates, where each disease is assumed independent of other diseases. Similar to the all-cause mortality and morbidity lifetable (above), the issue here is in ensuring that each disease-specific subsidiary lifetable also returns the numbers and rates of the total population before it is disaggregated by heterogeneity (e.g. SES).
We now consider an alternative type of lifetable: the disease lifetable. This lifetable consists of three compartments: a healthy compartment \(S\), a diseased compartment \(C\), and a dead compartment \(D\). At each timestep, members of the population in \(S\) transition to \(C\) according to the incidence rate, and from \(C\) to \(D\) according to the fatality rate. For some diseases, members can transition from \(C\) to \(S\) as per the remission rate; for simplicity, however, we do not consider this possibility for now.
Let \(S_{t}\), \(C_{t}\) and \(D_{t}\) denote the number of people in compartment \(S\), \(C\) and \(D\) respectively at the end of the \(t\)-th step. We will assume initially that \(D_{0} = 0\) and \(S_{0} + C_{0} = N\), where \(N\) is the total number of people initially observed. Let \(i_{t}\) and \(f_{t}\) denote the incidence and fatality rates respectively at \(t\). The equations for \(S_{t}\), \(C_{t}\) and \(D_{t}\) are given by the system:
$$\begin{aligned} S_{t} & = S_{t - 1} e^{{{ } - i_{t} }} , \\ C_{t} & = C_{t - 1} e^{{{ } - f_{t} }} + S_{t - 1} \left( {1 - e^{{{ } - i_{t} }} } \right), \\ D_{t} & = N - S_{t} - C_{t} \\ \end{aligned}$$
These equations are premised on a simplifying assumption that members of the population cannot die from the disease in the same timestep in which they contract the disease. This assumption can, in practice, be mitigated by choosing an appropriately small timestep.
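The three-compartment system can be stepped forward in the same way as the mortality lifetable. The following Python sketch is illustrative only (it is not the prototype code of [13]); the function name and example rates are hypothetical, and remission is ignored, as in the text.

```python
import math

def disease_lifetable(s0, c0, incidence, fatality):
    """Return (S, C, D) lists with D_0 = 0 and S_0 + C_0 = N, one entry per timestep."""
    n_total = float(s0 + c0)
    S, C, D = [float(s0)], [float(c0)], [0.0]
    for i_t, f_t in zip(incidence, fatality):
        new_cases = S[-1] * (1 - math.exp(-i_t))        # people leaving S for C
        s_next = S[-1] - new_cases                      # S_t = S_{t-1} e^{-i_t}
        c_next = C[-1] * math.exp(-f_t) + new_cases     # survivors in C plus new cases
        S.append(s_next)
        C.append(c_next)
        D.append(n_total - s_next - c_next)             # D_t = N - S_t - C_t
    return S, C, D

# Hypothetical example: 900 healthy and 100 diseased people over two timesteps.
S, C, D = disease_lifetable(900, 100, [0.02, 0.02], [0.05, 0.05])
```

Note that new cases are added to \(C\) without fatality applied to them within the same timestep, which mirrors the simplifying assumption described above.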
For the disease lifetable disaggregation problem, we are given the disease lifetable for an aggregate population \(P\) complete with incidence rates \(i_{t}\), fatality rates \(f_{t}\) and population counts \(S_{t}\), \(C_{t}\) and \(D_{t}\) at each timestep \(t\). We assume that \(P\) consists of \(n\) separate underlying sub-populations \(P_{1} , \ldots , P_{n}\), each with their own population counts, incidence rates and fatality rates. Let \(S_{k,t}\), \(C_{k,t}\) and \(D_{k,t}\) denote the number of people in the healthy, diseased and dead compartments respectively for sub-population \(k\) at time \(t\). We additionally assume that, for each sub-population, we are given the initial disease prevalence, and hence we can obtain \(S_{k,0}\) and \(C_{k,0}\). The objective of the problem is to determine both the incidence rates and fatality rates of each sub-population \(P_{k}\) and hence obtain the disease lifetables for each of \(P_{1} , \ldots , P_{n}\). The criteria for consistency for this disaggregation at each timestep \(t\) are given by:
$$\begin{aligned} S_{t} & = \mathop \sum \limits_{k = 1}^{n} S_{k,t} \\ C_{t} & = \mathop \sum \limits_{k = 1}^{n} C_{k,t} \\ \end{aligned}$$
i.e., we must choose incidence rates \(i_{k,t}\) and fatality rates \(f_{k,t}\) for each timestep \(t\) and sub-population \(k\) such that:
$$S_{t - 1} e^{{{ } - i_{t} }} = \mathop \sum \limits_{k = 1}^{n} S_{k,t - 1} e^{{{ } - i_{k,t} }}$$
and
$$C_{t - 1} e^{{{ } - f_{t} }} + S_{t - 1} \left( {1 - e^{{{ } - i_{t} }} } \right) = \mathop \sum \limits_{k = 1}^{n} \left[ {C_{k,t - 1} e^{{{ } - f_{k,t} }} + S_{k,t - 1} \left( {1 - e^{{{ } - i_{k,t} }} } \right) } \right]$$
We once again assume that we are given rate ratios for the sub-population rates at each timestep \(t\), specifically incidence rate ratios \(r_{1,t}^{{\text{i}}} , \ldots , r_{n,t}^{{\text{i}}}\) such that \(i_{k,t} = r_{k,t}^{{\text{i}}} i_{1,t}\) for each \(k\), and fatality rate ratios \(r_{1,t}^{{\text{f}}} , \ldots , r_{n,t}^{{\text{f}}}\) such that \(f_{k,t} = r_{k,t}^{{\text{f}}} f_{1,t}\) for each \(k\).
We can apply the method used in the mortality problem to obtain unique incidence rates \(i_{k,t}\) that satisfy the constraints for the healthy population. After obtaining the sub-population incidence rates, the consistency constraint for the diseased population simplifies to:
$$\begin{aligned} & C_{t - 1} e^{{{ } - f_{t} }} + S_{t - 1} - S_{t} = \mathop \sum \limits_{k = 1}^{n} \left[ {C_{k,t - 1} e^{{{ } - f_{k,t} }} + S_{k,t - 1} - S_{k,t} } \right] \\ & \Rightarrow C_{t - 1} e^{{{ } - f_{t} }} + \left( {S_{t - 1} - \mathop \sum \limits_{k = 1}^{n} S_{k,t - 1} } \right) - \left( {S_{t} - \mathop \sum \limits_{k = 1}^{n} S_{k,t} } \right) = \mathop \sum \limits_{k = 1}^{n} C_{k,t - 1} e^{{{ } - f_{k,t} }} \\ & \Rightarrow C_{t - 1} e^{{{ } - f_{t} }} = \mathop \sum \limits_{k = 1}^{n} C_{k,t - 1} e^{{{ } - f_{k,t} }} \\ \end{aligned}$$
Thus, by using two consecutive applications of the methods described in “Appendix 2: Disaggregation details”, first for the healthy compartment and incidence rates and then for the diseased compartment and fatality rates, we can use the rate ratios to obtain a consistent disaggregation of the disease lifetable.
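The two consecutive applications can be sketched for a single timestep as follows. This is an illustrative sketch under stated assumptions, not the prototype code of [13]: the root finder is plain bisection (our own choice, not necessarily the appendix method), all function names and example values are hypothetical, and the rate ratios are assumed strictly positive with the first sub-population as reference.

```python
import math

def solve_reference_rate(counts_prev, ratios, total_rate, tol=1e-12):
    """Find x with sum_k X_{k,t-1} e^{-r_k x} = X_{t-1} e^{-rate_t} by bisection."""
    target = sum(counts_prev) * math.exp(-total_rate)

    def excess(x):
        return sum(c * math.exp(-r * x) for c, r in zip(counts_prev, ratios)) - target

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:        # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def disaggregate_disease_step(S_prev, C_prev, r_inc, r_fat, i_t, f_t):
    """One timestep: solve incidence on S first, then fatality on C."""
    i_ref = solve_reference_rate(S_prev, r_inc, i_t)    # healthy compartment
    f_ref = solve_reference_rate(C_prev, r_fat, f_t)    # diseased compartment
    i_k = [r * i_ref for r in r_inc]                    # i_{k,t} = r_k^i i_{1,t}
    f_k = [r * f_ref for r in r_fat]                    # f_{k,t} = r_k^f f_{1,t}
    S_next = [s * math.exp(-i) for s, i in zip(S_prev, i_k)]
    C_next = [c * math.exp(-f) + s * (1 - math.exp(-i))
              for s, c, i, f in zip(S_prev, C_prev, i_k, f_k)]
    return i_k, f_k, S_next, C_next

# Hypothetical example: two strata, aggregate incidence 0.02 and fatality 0.05.
S_prev, C_prev = [500.0, 300.0], [50.0, 40.0]
i_k, f_k, S_next, C_next = disaggregate_disease_step(
    S_prev, C_prev, [1.0, 1.3], [1.0, 1.1], 0.02, 0.05)
```

In this example the sub-population counts in \(S\) and \(C\) sum to the aggregate counts produced by the aggregate rates, up to the bisection tolerance.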
The prototype code for the above methods is provided in a GitHub repository [13].