A method for reclassifying cause of death in cases categorized as “event of undetermined intent”

Background We present a method for reclassifying external causes of death categorized as “event of undetermined intent” (EUIs) into non-transport accidents, suicides, or homicides. In nations like Russia and the UK the absolute number of EUIs is large, the EUI death rate is high, or EUIs comprise a non-trivial proportion of all deaths due to external causes. Overuse of this category may result in (1) substantially underestimating the mortality rate of deaths due to specific external causes and (2) threats to the validity of studies of the patterns and causes of external deaths and of evaluations of the impact of interventions meant to reduce them. Methods We use available characteristics of the deceased and the event to estimate the most likely cause of death using multinomial logistic regression. We use the set of known non-transport accidents, suicides, and homicides to calculate an mlogit-based linear score and an estimated classification probability (ECP). This ECP is applied to EUIs, with varying levels of minimal classification probability. We also present an optional second step that employs a population-level adjustment to reclassify deaths that remain undetermined (the proportion of which varies with the minimal classification probability). We illustrate our method by applying it to Russia. Between 2000 and 2011, 521,000 Russian deaths (15 % of all deaths from external causes) were categorized as EUIs. We used anonymized micro-data on the ~3 million deaths from external causes. Our reclassification model used 10 decedent and event characteristics from the computerized death records. Results Results show that during this period about 14 % of non-transport accidents, 13 % of suicides, and 33 % of homicides were officially categorized as EUIs. 
Our findings also suggest that 2011 levels of non-transport accidents and suicides would have been about 24 % higher and of homicide about 82 % higher than that reported by official vital statistics data. Conclusions Overuse of the external cause of death classification “event of undetermined intent” may indicate questionable quality of mortality data on external causes of death. This can have wide-ranging implications for families, medical professionals, the justice system, researchers, and policymakers. With our classification probability set as equal to or higher than 0.75, we were able to reclassify about two-thirds of EUI deaths in our sample. Our optional additional step allowed us to redistribute the remaining unclassified EUIs. Our method can be applied to data from any nation or sub-national population in which the EUI category is employed.


Background
In this paper we present a method for reclassifying external causes of death categorized as "event of undetermined intent" (EUIs). As we show in our study, the probability of a transport accident death being classified as an EUI is very low; EUIs caused by external injuries are therefore almost entirely due to non-transport accidents, suicides, or homicides. In theory, EUIs are deaths for which medical examiners lack enough information to determine the cause, though in some cases, especially homicides and suicides, this category may be used deliberately to register a death as ill-defined instead of attributing it to a definite or likely violent cause [1][2][3][4][5][6][7][8][9]. In many industrialized nations use of the EUI category is rare. In some nations, however, the raw number of EUIs is large, the EUI death rate is high, or EUIs comprise a non-trivial proportion of all deaths due to external causes.
Overuse of the EUI category results in meaningful limitations. First, the mortality rate due to non-transport accidents, suicides, or homicides may be substantially underestimated if EUIs are ignored. This is especially problematic if the EUI category is purposely employed to artificially under-enumerate homicide or suicide deaths. As such, use of the EUI category may be considered a proxy for the quality of mortality data on external causes of death [10,11]. Second, at both the individual and population levels, overuse of the EUI category threatens the validity of studies of the patterns, causes, and consequences of non-transport accidents, suicides, and homicides, and of evaluations of the impact of interventions meant to reduce these types of mortality.
We propose a two-stage method for reclassifying externally caused EUIs as non-transport accidents, suicides, or homicides. After the first stage, a sizeable proportion of EUIs may remain unclassified when we set a higher level of reliability for reclassification. We therefore add a second, optional stage showing how the entire set of EUI deaths can be reclassified under an additional assumption. We illustrate our method by applying it to data on nearly 3 million deaths due to external causes in Russia, a nation with generally reliable mortality data, high mortality from external causes, and a large number and high rate of EUI deaths.
Substantively, reclassification of EUIs tends to raise mortality from homicides and non-transport accidents more than mortality from suicides. If these estimates are valid, they change our view of Russian rates of external causes of death, especially of important social barometers such as homicide and suicide rates. Methodologically, our proposed method can be applied to other nations, allowing for a better understanding of (1) estimates of specific external causes of death, (2) the impact of the use of the EUI category on true rates of death due to non-transport accidents, suicide, and homicide, and (3) the impact of social, cultural, and economic factors and of public policy on these causes of death.

Use of the EUI category in Russia
Between 2000 and 2011, 15 % of all deaths from external causes in Russia were categorized as events of undetermined intent. Table 1 shows that Russia is the dubious leader on this indicator among several selected industrialized nations. Other industrialized nations with a meaningful proportion of deaths from external causes placed in this category include the UK (12 %), Poland (10 %), and Sweden (8 %). While the percentage difference between Russia and the UK seems relatively minor, the Russian age-standardized death rate (SDR) for this category is 8.5 times higher than in the UK and 4.7 times higher than in Poland, the nation with the second highest SDR for this category. Therefore, the Russian problem with EUIs is not only the high proportion of external-cause deaths placed in this category but also the very large number of such deaths. Between 2000 and 2011 there were 521,000 EUI deaths, or more than 43,000 deaths annually, compared with 541 thousand deaths from suicide and 380 thousand deaths from homicide during this period. If these EUIs were classified correctly, Russian rates of non-transport accidents, suicide, and homicide would likely increase substantially. Figure 1 shows the Russian SDR due to non-transport accidents, suicides, homicides, and EUIs since 1970. While EUIs generally trend with the other external causes of death, relative to these other causes EUIs (1) rose disproportionately following the collapse of the Soviet Union and (2) have declined more slowly since the early 2000s. It is important to note that the similarity in trends across many causes of death in Russia, including the change around the collapse of the Soviet Union, is only weakly related to coding practices. Instead, it is mainly explained by the abrupt and painful social, political, and economic changes. 
This includes the major role played by alcohol, as can be seen with the initiation of Gorbachev's anti-alcohol campaign in 1985, its weakening in 1988-91, termination in 1991, and subsequent fluctuations in consumption [12][13][14].
Our approach to reclassifying death events due to undetermined intent
Our proposed method can be considered a method for imputing missing data, a class of methods often applied in demography to census and other population data. Our general approach to reclassifying these deaths is to use other available characteristics of the deceased (e.g., age, sex) and of the event (e.g., type of injury, location of death) to estimate the most likely cause of death: non-transport accident, suicide, or homicide. Our approach is based on multinomial logistic regression, which uses these characteristics as explanatory variables to estimate the probability that a case belongs to each of these three causes.
Fig. 1 Russian trends in standardized death rates per 100,000 residents for non-transport accidents, suicides, homicides, and external deaths due to events of undetermined intent, 1970-2010
In our case, we estimated the classification probability using the characteristics of known non-transport accidents, suicides, and homicides as our training set. We then applied the estimated model to the target set: events of undetermined intent. For each death we calculated the most probable category based on its constellation of characteristics. The estimated classification probabilities (ECP) of the most probable category varied between 0.334 and 0.999 (with three categories whose probabilities sum to 1, the largest can never fall below 1/3). It would make little sense, however, to accept a classification into one of the categories when the ECP is low (e.g., < 0.5). The higher the minimum ECP we require, the greater the agreement between predicted and actual causes of death, but also the larger the number of EUI deaths that cannot be reclassified using the prediction model. 
At every level of ECP, more EUI deaths were reclassified as homicides and non-transport accidents than as suicides. With the minimum ECP set at 0.75, we were able to reclassify about two-thirds of the EUI deaths. Further, if we assume that the probabilities of misclassification of causes of death for EUIs are the same as the corresponding probabilities for deaths with known causes, we can add an optional step and reclassify the entire set of EUIs into one of the three known causes.

Data
Our analyses were based on anonymized micro-data on all deaths from external causes that occurred in Russia between January 1, 2000, and December 31, 2011. These included 1.481 million deaths due to non-transport accidents (ICD-10 codes W00-X59), 541 thousand deaths due to suicide (X60-X84), 379 thousand deaths due to homicide (X85-Y05, Y08, Y09), and 512 thousand deaths due to EUI (Y10-Y34). We excluded from our analysis deaths due to transport accidents (which have a low probability of being classified as an event of undetermined intent; see the Sensitivity analyses section below) and a very small number of deaths due to "Neglect and abandonment, and other maltreatment syndromes" (Y06-Y07). Our total number of cases was about 2.913 million. Each computerized death record includes the following information:
(1) Month and year of death registration.
(2) A code for the region (analogous to a province or state) in which the death was registered, which is usually (but not always) the same as the region of permanent residence of the deceased.
(3) Sex.
(4) Date of death.
(5) Date of birth.
(6) Age at death in completed years.
(7) Two ICD-10 codes for cause of death. The first classifies the external cause (e.g., accidental fall or homicide) and the second denotes the anatomic character of the injury (e.g., skull fracture or open wound of thorax).
(8) Two aggregated cause-of-death codes from the abridged Russian cause-of-death nomenclature corresponding to the ICD-10 codes. Note, however, that in the death records underlying our micro-data, causes of death are coded by the original ICD-10 items; our analysis therefore relies on the original ICD-10 coding rather than on the aggregated causes used by the Russian statistical agency.
(9) Place of death: hospital, outside of a hospital, or unknown.
(10) The person who issued the death certificate: physician, feldsher (a medical worker at an intermediate level between a nurse and a physician), pathologist, or forensic expert.
(11) A yes/no indicator of whether the deceased was in a state of alcoholic intoxication at the time of death.
(12) A yes/no indicator of whether the identity of the deceased was known.

Methods
The multinomial logistic model
Our indirect statistical method for reclassifying EUIs is based on multinomial logistic (mlogit) regression. Using the set of all deaths from the three known causes (non-transport accident, suicide, and homicide) as our training set, we calculated an mlogit-based linear score and a predictor function equal to the estimated classification probability (ECP) that the case in question belongs to each of these three categories. Presuming (for simplicity) that all explanatory variables are dichotomous, the multinomial regression model can be expressed as
$$\Pr(n, \text{cause} = i) = \frac{\exp\left(\sum_k B_{ik} x^{n}_{ik}\right)}{\sum_{j=1}^{3} \exp\left(\sum_k B_{jk} x^{n}_{jk}\right)}, \qquad i = 1, 2, 3.$$
In this equation, causes of death (i.e., outcomes) are numbered $i = 1, 2, 3$; $x^{n}_{ik}$ are the values of the independent dichotomous variables for case $n$; index $k$ runs across independent variables; and $B_{ik}$ are the respective regression coefficients. One of the three outcomes (say $i = 3$) is treated as the base outcome, with $B_{3k} = 0$. The other regression coefficients are estimated by the mlogit procedure using maximum likelihood. For every fixed $n$, the sums $\sum_k B_{ik} x^{n}_{ik}$ are the estimated linear scores, and the three values of the prediction function $\Pr(n, \text{cause} = i)$ are the estimated probabilities of the three causes of death, which sum to 1 (these are the ECPs). Because the number of deaths varies substantially across the three causes, we use weights to eliminate this difference so that the estimation procedure does not favor causes according to their relative sizes. As a sensitivity check we estimated the model both with and without weights and compared the results.
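To make the prediction function concrete, the following is a minimal Python sketch that evaluates the ECPs for one case from already-estimated coefficients. The coefficient values and the names `linear_scores`, `ecp`, and `class_weights` are hypothetical; cause 3 is the base outcome with all coefficients fixed at zero, and, as in a standard mlogit, the same 0/1 covariate vector is used for every cause. The text does not state its weighting formula, so the inverse-frequency weights below (which give each cause the same total weight) are an assumption, illustrated with the training-set counts from the Data section.

```python
import math

# Hypothetical coefficients B[i][k] for causes i = 1, 2, 3 (rows 0..2) and
# three dichotomous variables k; cause 3 is the base outcome, so its row is zero.
B = [
    [0.4, -1.2, 0.8],   # cause 1 (e.g., non-transport accident)
    [-0.3, 0.9, 0.1],   # cause 2 (e.g., suicide)
    [0.0, 0.0, 0.0],    # cause 3 (e.g., homicide), base outcome
]


def linear_scores(x):
    """Return sum_k B[i][k] * x[k] for each cause i, given a 0/1 vector x for one case."""
    return [sum(b * xk for b, xk in zip(row, x)) for row in B]


def ecp(x):
    """Estimated classification probabilities Pr(n, cause = i); they sum to 1."""
    scores = linear_scores(x)
    m = max(scores)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_weights(counts):
    """Hypothetical inverse-frequency weights: every cause gets the same total weight."""
    n_total = sum(counts)
    k = len(counts)
    return [n_total / (k * c) for c in counts]


print([round(p, 3) for p in ecp([1, 0, 1])])
# Training-set counts from the Data section, in thousands: accidents, suicides, homicides
print([round(w, 3) for w in class_weights([1481, 541, 379])])
```

Subtracting the maximum score does not change the probabilities; it only keeps `math.exp` in a safe range.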
To evaluate the robustness of the regression outcomes and the impact of estimation errors on the redistribution of EUIs, we used bootstrapping based on the training-set estimates to assess how errors in the regression coefficients $B_i$ influence the final result. We generated 250 vectors of coefficients $\tilde{B}_i$ using the formula $\tilde{B}_i = B_i + SE_i \cdot \gamma$, where $SE_i$ is the standard error of the regression coefficient $B_i$ and $\gamma$ is a random variable with a standard normal distribution. We applied each vector of coefficients $\tilde{B}_i$ to reclassify the EUIs and examined the variation in the results.
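The coefficient-perturbation check can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the coefficients, standard errors, and cases are hypothetical, and because the text does not say whether one γ is drawn per coefficient vector or one per coefficient, the sketch draws an independent standard normal γ for each coefficient. The hypothetical helper `agreement_rate` reports how often the perturbed coefficients reproduce the originally predicted cause.

```python
import math
import random


def ecp(B, x):
    """Softmax of the linear scores sum_k B[i][k] * x[k]; the last row of B is the zero base outcome."""
    scores = [sum(b * xk for b, xk in zip(row, x)) for row in B]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_cause(B, x):
    """Index of the cause with the highest ECP."""
    p = ecp(B, x)
    return p.index(max(p))


def perturb(B, SE, rng):
    """One simulated set B_ik + SE_ik * gamma, with gamma ~ N(0, 1) drawn per coefficient (assumption)."""
    return [[b + se * rng.gauss(0.0, 1.0) for b, se in zip(b_row, se_row)]
            for b_row, se_row in zip(B, SE)]


def agreement_rate(B, SE, cases, n_sims=250, seed=1):
    """Share of (simulation, case) pairs whose predicted cause equals the original prediction."""
    rng = random.Random(seed)
    baseline = [predicted_cause(B, x) for x in cases]
    same = 0
    for _ in range(n_sims):
        B_sim = perturb(B, SE, rng)
        same += sum(predicted_cause(B_sim, x) == b0 for x, b0 in zip(cases, baseline))
    return same / (n_sims * len(cases))


# Hypothetical coefficients, standard errors (zero for the base outcome), and cases
B = [[0.4, -1.2, 0.8], [-0.3, 0.9, 0.1], [0.0, 0.0, 0.0]]
SE = [[0.05, 0.10, 0.08], [0.06, 0.09, 0.07], [0.0, 0.0, 0.0]]
cases = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
print(agreement_rate(B, SE, cases))
```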
Preliminary analyses indicated substantial differences in the results for men and women, thus we conducted separate analyses for each.
Independent variables
While the set of independent variables must be informative enough to reclassify the EUI cases successfully into the three causes of death, increasing the number of variables increases the risk of singularities in the Hessian matrix. Therefore, we constructed a variable list such that the Hessian matrix would be non-singular and each variable would be significant at p < 0.01 for at least one sex and at least one value of the dependent variable (i.e., cause of death). After a number of experiments, we arrived at the following list of ten independent variables. Thus, the final number of geographic regions was nine.
7. Urban/rural residence: A dichotomous variable indicating whether the death occurred in an urban or rural area.
8. Type of injury: While the list of ICD-10 codes for these injuries includes 195 categories, the Russian national classification contains only 10 aggregate categories. We retained the Russian national classification but added nine categories for a total of 19. Table 2 lists our categories together with the corresponding ICD-10 codes.
9. Presence of alcoholic intoxication at death: A dichotomous variable coded 1 if alcohol intoxication at the time of death was recorded on the death certificate.
10. Specific location of death: Based on ICD-10 rules [16], the eight places of death were home; residential institution; school, other institution, and public administrative area; sports and athletics area; street and highway; trade and service area; other specified places; and unspecified place.
Distributions of cases by independent variables are presented in Appendix C. The total number of possible combinations of the independent variables is about 1.120 million, although most of these combinations do not occur in the dataset. This set of ten independent variables appeared to be optimal: adding other explanatory variables either did not reduce the prediction error or led to a singular Hessian matrix.
Handling missing values
The input data contained no missing values. In our mlogit model, the dependent variable (external cause of death) takes three well-defined values: non-transport accident, homicide, and suicide. Explanatory variables, however, may take ill-defined values. For example, age at death may be "unknown" or the anatomic character of the injury may be "unspecified." Empirical analysis shows that these ill-defined values provide important information for predicting cause of death. We therefore treated these "unknown" observations as a specific value (i.e., a category of its own) rather than as missing values (coded "." in statistical packages), and we did not impute them.
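A minimal sketch of treating an ill-defined value as a category of its own when building dichotomous (dummy) variables. The age groups and the helper `one_hot` are hypothetical; the point is only that "unknown" receives its own indicator, and hence its own coefficient, instead of being dropped or imputed. One category is left out as the reference, since including a dummy for every category alongside an intercept would make the design collinear and the Hessian singular.

```python
def one_hot(values, categories, reference):
    """Dummy-code a categorical variable, keeping 'unknown' as an ordinary category.

    Returns the column names (all categories except the reference) and one
    0/1 row per observed value.
    """
    columns = [c for c in categories if c != reference]
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows


# Hypothetical age groups; "unknown" is a genuine value, not a missing one.
age_groups = ["0-19", "20-39", "40-59", "60+", "unknown"]
observed = ["20-39", "unknown", "60+", "0-19"]
cols, rows = one_hot(observed, age_groups, reference="0-19")
print(cols)
print(rows)
```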
Computations
The mlogit analyses were conducted separately for men and women, though the variable list was the same for both (see the mlogit outputs in Appendix A). We applied the estimated linear scores and corresponding predictor functions first to the training set of death records with known causes, to assess the model, and then to the EUIs, to impute their missing cause of death.
For each case, we estimated the probabilities of being classified as each of the three causes of death (non-transport accident, suicide, homicide) and assigned the case to the cause with the highest probability.
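The assignment rule amounts to taking the cause with the largest ECP. The sketch below, with a hypothetical helper `assign_cause` and illustrative probability vectors, returns that cause together with its ECP; Python's `max` breaks exact ties in favor of the first cause listed, a detail the text does not address.

```python
CAUSES = ("non-transport accident", "suicide", "homicide")


def assign_cause(probs):
    """Return (cause, maximum ECP) for the cause with the highest estimated probability.

    Because the three probabilities sum to 1, the maximum is never below 1/3.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CAUSES[best], probs[best]


# Hypothetical ECP vectors for three cases
for p in [(0.80, 0.10, 0.10), (0.30, 0.36, 0.34), (0.05, 0.15, 0.80)]:
    print(assign_cause(p))
```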
Assessing the multinomial logistic model on well-defined cases
We present the results of the regression-based reclassification of deaths with known causes (i.e., non-transport accident, suicide, or homicide) as the distribution matrix $D = \|d_{ij}\|$, where $i$ and $j$ denote predicted and actual causes of death, respectively ($i, j = 1, 2, 3$). The nine elements $d_{ij}$ give the two-dimensional distribution of deaths by predicted and actual cause. The marginal one-dimensional distributions by actual and predicted cause of death are, respectively,
$$D^{A}_{j} = \sum_{i=1}^{3} d_{ij}, \qquad D^{P}_{i} = \sum_{j=1}^{3} d_{ij}.$$
The relative error in predicting the total number of deaths from cause $i$ is $(D^{P}_{i} - D^{A}_{i})/D^{A}_{i}$, $i = 1, 2, 3$. The smaller these errors, the more closely the model reproduces the actual population-level distribution of deaths by cause.
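The matrix D, its marginals, and the relative errors can be computed directly from predicted and actual cause indices. The toy data and function names below are hypothetical, and causes are indexed 0, 1, 2 rather than 1, 2, 3. The last line also reports the share of cases on the diagonal, i.e., the share correctly classified.

```python
def distribution_matrix(predicted, actual, k=3):
    """d[i][j] = number of cases predicted as cause i whose actual cause is j."""
    d = [[0] * k for _ in range(k)]
    for i, j in zip(predicted, actual):
        d[i][j] += 1
    return d


def marginals(d):
    """Return (D^P, D^A): D^P_i = sum_j d_ij (predicted), D^A_j = sum_i d_ij (actual)."""
    k = len(d)
    d_pred = [sum(d[i][j] for j in range(k)) for i in range(k)]
    d_act = [sum(d[i][j] for i in range(k)) for j in range(k)]
    return d_pred, d_act


def relative_errors(d):
    """(D^P_i - D^A_i) / D^A_i for each cause i."""
    d_pred, d_act = marginals(d)
    return [(p - a) / a for p, a in zip(d_pred, d_act)]


# Hypothetical cause indices (0 = accident, 1 = suicide, 2 = homicide)
predicted = [0, 0, 2, 1, 2, 2, 0, 1, 2, 0]
actual = [0, 0, 0, 1, 2, 2, 0, 1, 2, 1]
d = distribution_matrix(predicted, actual)
print(d)
print(marginals(d))
print(relative_errors(d))
print(sum(d[i][i] for i in range(3)) / len(actual))  # share correctly classified
```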
The matrix $D$ was obtained from death records by counting deaths for which the maximum of the three estimated ECPs was greater than or equal to a specified lower limit, denoted $ECP_0$. The limit $ECP_0$ can be set to any value between 0 and 1, with 0 corresponding to no constraint and 1 to the strictest constraint. For every death $n$, the candidate cause of death is the one with the maximum of the three ECPs; the case is assigned to that cause only if this maximum satisfies $ECP \geq ECP_0$. The matrix $D$ corresponding to a specific value of $ECP_0$ is denoted $D^{ECP_0} = \|d^{ECP_0}_{ij}\|$. The relative errors of prediction diminish as $ECP_0$ increases. Converting the absolute counts $d^{ECP_0}_{ij}$ into a relative distribution allows one to compare these distributions across values of $ECP_0$.
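A sketch of how the thresholded matrix D for a given limit $ECP_0$ might be built: a case enters cell $(i, j)$ only if its maximum ECP reaches the limit; otherwise it is counted as unclassified. The probability vectors and the helper `thresholded_matrix` are hypothetical. On this toy data, raising $ECP_0$ from 0 to 0.75 halves the number of classified cases while the share correctly classified among them rises, mirroring the trade-off described in the text.

```python
def thresholded_matrix(probs, actual, ecp0, k=3):
    """Count case n in cell (i, j) only if its maximum ECP >= ecp0.

    i is the cause with the maximum ECP and j is the actual cause.
    Returns the matrix and the number of cases left unclassified.
    """
    d = [[0] * k for _ in range(k)]
    unclassified = 0
    for p, j in zip(probs, actual):
        i = max(range(k), key=lambda c: p[c])
        if p[i] >= ecp0:
            d[i][j] += 1
        else:
            unclassified += 1
    return d, unclassified


# Hypothetical ECP vectors and actual causes (0 = accident, 1 = suicide, 2 = homicide)
probs = [(0.90, 0.05, 0.05), (0.55, 0.10, 0.35), (0.20, 0.75, 0.05), (0.40, 0.12, 0.48)]
actual = [0, 2, 1, 0]
for ecp0 in (0.0, 0.5, 0.75):
    d, left = thresholded_matrix(probs, actual, ecp0)
    accuracy = sum(d[i][i] for i in range(3)) / max(1, sum(map(sum, d)))
    print(ecp0, left, round(accuracy, 3))
```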
Constructing the cause-of-death distributions for events of undetermined intent
To reclassify the EUIs, we apply the regression coefficients estimated by the multinomial regression model on deaths with known causes. We denote the total numbers of EUIs classified into the three causes as $U_i$, $i = 1, 2, 3$. First, we compute $U_i$ for men and women without any restriction on the level of the prediction probabilities (i.e., $ECP_0 = 0$). In this case the choice of cause $i$ is based on the maximal ECP, regardless of whether its absolute value is high or low. We then produce a number of other variants, $U^{ECP_0}_i$, in which the maximal ECP is constrained to be equal to or higher than $ECP_0$. For our purposes, we used $ECP_0$ values ranging from 0.5 to 0.9.
In our case it was clear that when constraints on ECP values are lenient (e.g., no constraint at all or ECP ≥ 0.5), causes of death can be predicted for all or nearly all EUI cases, although a substantial proportion of these predictions may be inaccurate. With stricter constraints (e.g., ECP ≥ 0.8 or ECP ≥ 0.9), a relatively high proportion of the classified EUIs can be predicted correctly, but a substantial proportion of EUIs cannot be classified at all because their maximal ECP (across causes $i$) is not high enough to meet the constraint. How this unavoidable trade-off plays out depends on the quality of the diagnostic information contained in the set of independent variables.
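Applied to EUIs, where no actual cause is known, the same rule yields the counts $U^{ECP_0}_i$ and the number of deaths left unclassified. The sketch below loops over limits like those used in the text; the six ECP vectors and the name `reclassify_euis` are hypothetical.

```python
def reclassify_euis(probs, ecp0):
    """Return U_i^{ECP_0} (counts per cause) and the number of EUIs left unclassified."""
    counts = [0, 0, 0]
    unclassified = 0
    for p in probs:
        i = max(range(3), key=lambda c: p[c])
        if p[i] >= ecp0:
            counts[i] += 1
        else:
            unclassified += 1
    return counts, unclassified


# Hypothetical ECP vectors (accident, suicide, homicide) for six EUI deaths
eui_probs = [
    (0.92, 0.04, 0.04),
    (0.10, 0.05, 0.85),
    (0.30, 0.62, 0.08),
    (0.45, 0.15, 0.40),
    (0.20, 0.02, 0.78),
    (0.60, 0.30, 0.10),
]
for ecp0 in (0.0, 0.5, 0.6, 0.7, 0.75, 0.8, 0.85, 0.9):
    counts, left = reclassify_euis(eui_probs, ecp0)
    print(ecp0, counts, f"unclassified share = {left / len(eui_probs):.2f}")
```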
Using the results of this reclassification of EUIs, we can re-estimate the numbers of deaths and the corresponding death rates for non-transport accidents, suicides, and homicides. If we set a higher ECP limit, some proportion of EUIs remains unclassified. The adjusted number of deaths from a given cause is the sum of the number of deaths from this cause among all deaths with known causes and the number of EUIs reclassified as deaths from the same cause. Setting a reasonably high $ECP_0$ may leave a relatively large number of cases whose cause of death cannot be predicted at the micro-level by the regression model. As an optional second stage, we therefore propose a simple procedure for a population-level reclassification of all EUIs based on an additional explicit assumption. To do this we return to the classification of cases with known causes of death. The proportion of cases classified by the model as cause $i$ that were actually caused by cause $j$ is
$$P_{ij} = \frac{d_{ij}}{\sum_{j'=1}^{3} d_{ij'}}.$$
The proportions $P_{ij}$ can therefore be considered estimated probabilities that cases classified by the model as cause $i$ were actually caused by cause $j$. If one assumes that the probabilities of misclassification of causes of death for EUIs are the same as the corresponding probabilities for deaths with known causes, then the transposed matrix $P^{T}$ yields the population-level distribution of EUIs by cause as $U^{Adj}_{j} = \sum_{i} P_{ij} \cdot U_{i}$. Again, while this population-level redistribution can be of substantial utility, it is an optional step and not a necessary part of our main redistribution procedure.
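The optional population-level step reduces to a few lines once the training-set matrix D (rows: predicted cause, columns: actual cause) and the EUI counts $U_i$ are available. The numbers and function names below are hypothetical, and the text does not specify here at which $ECP_0$ the inputs are taken, so the sketch simply takes both as given. Because each row of $P$ sums to 1, the adjustment redistributes EUIs across causes without changing their total.

```python
def misclassification_matrix(d):
    """P_ij = d_ij / sum_j d_ij: share of cases predicted as cause i whose actual cause is j."""
    return [[dij / sum(row) for dij in row] for row in d]


def population_adjustment(d, u):
    """U^Adj_j = sum_i P_ij * U_i, i.e., the vector P^T U."""
    p = misclassification_matrix(d)
    k = len(u)
    return [sum(p[i][j] * u[i] for i in range(k)) for j in range(k)]


def adjusted_totals(known, reclassified):
    """Adjusted count per cause = known-cause deaths + EUIs reallocated to that cause."""
    return [a + b for a, b in zip(known, reclassified)]


# Hypothetical training-set matrix d (rows: predicted, columns: actual) and EUI counts U_i
d = [[800, 50, 30],
     [40, 300, 10],
     [90, 20, 250]]
u = [100, 40, 60]
u_adj = population_adjustment(d, u)
print([round(x, 1) for x in u_adj])
print(round(sum(u_adj), 6))
print(adjusted_totals([1000, 400, 300], u_adj))
```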

Results
Within the framework of the bootstrap test, we carried out 250 random simulations for each case. In 99.2 % of the cases the predicted cause was the same as the cause predicted with the original regression coefficients. For males, if the estimated classification probability was equal to or greater than 0.75, the predicted cause always matched the prediction based on the original coefficients; for females this threshold was ECP ≥ 0.77. These tests provided confidence that the identified relationships were not a result of chance.
Table 3 contains the distribution of events by actual and predicted cause in the entire dataset and for cases with no lower limit on ECP and with ECP lower limits of 0.5, 0.6, 0.7, 0.75, 0.8, 0.85, and 0.9. In the training set of well-defined deaths, the weighted model (whose results are shown in Table 3) correctly predicted the actual cause in 84.5 % of cases (85 % for males, 82 % for females). It correctly classified 82 % of non-transport accidents, 87 % of suicides, and 92 % of homicides. The unweighted model (not shown in the table) correctly classified 86 % of all cases (87 % for males, 85 % for females), including 90 % of non-transport accidents, 85 % of suicides, and 76 % of homicides. We therefore chose the weighted model because of the poor performance of the unweighted model on homicide cases. The table shows that the model predicted actual homicides very well. However, the model also tended to over-predict homicide, so that when the predicted cause was homicide the actual cause was sometimes different. For both males and females, about 8 % of all cases were classified as homicides but were in fact non-transport accidents. Table 4 shows that raising the minimum ECP improved this situation, though the problem remained. Indeed, the excess of predicted homicides fell more slowly than the proportion of events of undetermined intent that could be classified. 
Further investigation of the micro-level data revealed the reason for this phenomenon. There appear to be groups of cases that are nearly homogeneous with respect to the model's independent variables, and therefore cannot be separated, yet contain deaths from different causes. For example, there is a subset of 47 thousand male deaths with registered intracranial injury. The true distribution of these deaths by cause is 49 % non-transport accidents and 51 % homicides. The problem is that for each case in the first sub-group (accidents) it is possible to find a case in the second sub-group (homicides) that looks similar with respect to all other independent variables. The weighted model classified nearly all these cases as homicides, whereas the unweighted model classified nearly all of them as non-transport accidents. Although the weighted model tends to over-predict homicides, this tendency weakens with higher minimum limits on ECP. This implies that the ECPs for a large part of the misclassified homicides are relatively low. We can go further to understand why this is happening. First, type of injury is the most informative predictor. Second, some injuries commonly (but not always) correspond to a certain cause of death, and this "usually but not always" situation is more characteristic of homicide. For example, among events of determined intent, 75 % of open wounds of the thorax and 88 % of cases in the group "other injury, poisoning and consequences of external causes" correspond to homicides. If we define "usually but not always" as a share of a given cause within an injury category in the range of 66-95 %, then 39 % of events of determined intent fall into this category, though 17 % of them are "unusual" events. For non-transport accidents and suicides this percentage is about 16 %, and for homicides it is 33 %. Thus, as the ECP limit increases, the share of homicides decreases more steeply than the shares of the two other causes. 
As the ECP limit increases, events classified as homicide (mostly on the basis of type of injury) migrate to the set of unclassified events. Figure 2 illustrates the distribution of reclassified EUI cases by predicted cause for different ECP levels. The distribution by cause obtained from the population-level adjustment of EUIs is similar to the distribution by cause of EUIs reclassified with ECP ≈ 0.85, providing further evidence of its validity. Table 5 shows how our model classified the events of undetermined intent. The upper part of Table 5 has the same meaning as Fig. 2 but shows results for both men and women. Again, the distribution by cause from the population-level adjustment of EUIs for both sexes is similar to the distribution by cause of EUIs reclassified with ECP ≈ 0.85. The lower part of Table 5 shows that the similarity between the distribution of reclassified EUIs and the actual distribution of deaths with known causes increases as the ECP increases. As expected, the proportion of non-transport accidents among all deaths of undetermined intent is lower, and the proportion of homicides is higher, than among deaths of determined intent. The proportion of suicides is about the same. Table 6 presents the results of calculations based on the distribution of deaths of determined intent after the additional optional population-level correction; similar data by sex are presented in Appendix B. Using the standardized death rates on the right side of the table, we can see that at ECP ≥ 0.75 only 6.4 % of the deaths under consideration remain unclassified (6.7 % for men and 6.2 % for women). The population-level adjustment reclassified the majority of these cases as non-transport accidents. The pattern is similar for men and women (see Appendix B). 
For both sexes together, the SDR from suicide is slightly higher than from homicide, but for men this difference is greater (10 per 100,000) and for women the SDR from suicide is lower than from homicide. Table 7 shows annual SDRs for the years 2000-2011 for (1) deaths officially registered as non-transport accidents, suicides, and homicides, (2) deaths officially registered as being of undetermined intent but that our model classified as non-transport accidents, suicides, or homicides, (3) the adjusted rate (i.e., the sum of these two groups), and (4) the proportion of the adjusted rate accounted for by reclassification of events of undetermined intent. The table shows that the SDRs for deaths that we classified as non-transport accidents or as suicides (but that were officially registered as EUIs) were essentially stable between 2000 and 2011. Due to the decline in the death rate from officially registered suicides, however, the share of all suicides classified as events of undetermined intent grew significantly. The male SDR for

Sensitivity analyses
First, our exclusion of transport accidents when reclassifying EUIs could bias our results. In theory, some transport accidents, especially deaths due to "Falling, lying or running in front of or into moving object. Undetermined intent." (Y31) and "Crashing of motor vehicle. Undetermined intent" (Y32), may be recorded as EUIs. However, these deaths make up only 1.2 % of all EUIs and 1.3 % of all transport accidents. It would also be difficult to include transport accidents in our general reclassification model because of their peculiar values on some important explanatory variables. For example, "place of death" operates very differently in this context and is not comparable with that for deaths due to other external causes. Nevertheless, we estimated an additional mlogit model to distinguish deaths due to transport accidents from those due to the combined group of non-transport accidents, suicides, and homicides. The model identifies transport accidents with a probability of error < 0.01. Applying the model score to EUI deaths shows that only 1.4 % can be classified as transport accidents (the respective percentages for Y31 and Y32 are 1.3 % and 3.6 %), while these deaths show much higher probabilities of being homicides or suicides. The results, shown in Appendix D, suggest that in Russia transport accidents form a distinct group and could have only a very minor impact on the distribution of EUIs and on the final distribution of external causes of death. Our decision to exclude transport accidents from the reclassification is also supported by how such cases are processed and by prior research. 
For example, in Russia nearly all fatal transport accidents are promptly investigated by the road police (a distinct branch of the Russian police force) and by criminal investigators with forensic expertise, which reduces the chances of recording bias or of classifying such deaths as EUIs. Further, although prior studies of the quality of cause-of-death diagnoses in Russia found that registration of deaths due to transport accidents has some limitations, these are less problematic than for other types of accidents and violent deaths [1,3,4]. Most notably, deaths from "other" and "unspecified" transport accidents account for only 2.6 % and 0.1 %, respectively, of all deaths from transport accidents, much lower than the corresponding categories for non-transport accidents, suicides, and homicides.
With respect to possible misclassification as EUIs, prior research focused on homicides and suicides but not on transport accidents [2,5-9].
Second, the default method of redistribution is to reattribute EUIs within sex and age groups in proportion to the numbers of non-transport accidents, suicides, and homicides in each group. An important related question is how much value our model adds over this default method. If our model-based results were very similar to the results of the default method, then our model would provide little added value (which would be an important finding in itself). The default method is a reasonable option in the absence of any other information. A related approach is to assume a priori that EUIs are hidden suicides [11,17], hidden homicides [3], or both (but not hidden non-transport accidents) [18]. Prior studies of Russia, however, provide additional evidence suggesting non-proportional distributions. With natural causes, for example, there are strong reasons for adding ill-defined deaths from senility to the class of circulatory diseases [19,20]. For EUIs specifically, the evidence suggests possible misclassification of homicides and suicides [1-9]. In spite of this, we are unaware of any studies that have used the reclassification method we propose. Still, it is important to compare the corrected distribution of external causes based on our model with the default method of redistribution. We did this, and the results, shown in Appendix E, indicate that our model-based redistributions differ substantially from those of the default solution.
Third, our analyses can be used for two distinct applications. One is to estimate the correct cause of death for any particular individual case. Another is to obtain the best estimate of the population-level incidence of each type of injury.
It is intuitive to employ the estimated probability as we do for the former, but not necessarily intuitive to use a threshold on the estimated classification probability for the latter. Our primary interest is to establish more precise population-level data on external cause mortality (i.e., the second application), which is why, after the individual-level reclassification of EUIs with mlogit, we make the population-level adjustment on the EUI cases with low mlogit probabilities. By employing cutoff points in assigning cause of death, our aim is to provide a more reliable basis for the population-level distribution. In doing so, we assume that solutions with mlogit probabilities below the cutoff indicate that the explanatory variables provide insufficient information. Consider a hypothetical mlogit return of (0.8, 0.1, 0.1) for non-transport accident, homicide, and suicide. With the help of combinatorics (the multinomial distribution), we know that the probability of getting, for example, exactly 8 accidents, 1 homicide, and 1 suicide in ten trials with these probabilities is 0.151. It is also possible to interpret this hypothetical mlogit return as a vector of classification probabilities of belonging to three fuzzy sets of deaths. This three-cause proportional sharing approach leads to a specific distribution by cause of death. We show the results of this proportional sharing-based redistribution in Appendix E, and again it differs substantially from our model-based distribution. We thank one of our reviewers for this suggestion.
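The contrast between assigning a case to its most probable cause and sharing it across causes can be checked numerically. The Python sketch below is an illustration, not the authors' code: the cause labels and the ten identical cases are hypothetical. It computes the multinomial probability for 8 accidents, 1 homicide, and 1 suicide under the return (0.8, 0.1, 0.1), and compares hard assignment with fuzzy proportional sharing for ten cases that each receive that return.

```python
from math import factorial, prod

# Hypothetical label order, matching the (0.8, 0.1, 0.1) example in the text.
CAUSES = ("non-transport accident", "homicide", "suicide")


def multinomial_pmf(counts, probs):
    """Probability of exactly `counts` outcomes in sum(counts) independent
    draws from a categorical distribution with probabilities `probs`."""
    coef = factorial(sum(counts))
    for k in counts:
        coef //= factorial(k)  # exact integer multinomial coefficient
    return coef * prod(p ** k for p, k in zip(probs, counts))


def hard_assignment(prob_vectors):
    """Count each case once under its most probable cause (no cutoff)."""
    totals = dict.fromkeys(CAUSES, 0)
    for p in prob_vectors:
        best = max(range(len(CAUSES)), key=lambda i: p[i])
        totals[CAUSES[best]] += 1
    return totals


def proportional_sharing(prob_vectors):
    """Fuzzy-set reading: each case adds its probability to every cause."""
    totals = dict.fromkeys(CAUSES, 0.0)
    for p in prob_vectors:
        for cause, share in zip(CAUSES, p):
            totals[cause] += share
    return totals


if __name__ == "__main__":
    p = (0.8, 0.1, 0.1)
    print(round(multinomial_pmf((8, 1, 1), p), 3))  # 0.151
    cases = [p] * 10  # ten hypothetical EUIs with identical mlogit returns
    print(hard_assignment(cases))       # all ten go to non-transport accident
    print(proportional_sharing(cases))  # approximately 8, 1, 1
```

Hard assignment places all ten cases in a single cause, whereas proportional sharing reproduces the expected 8:1:1 split. This is why the two readings of the same mlogit output can lead to different population-level distributions.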
Finally, we considered the possibility of preliminary conformal grouping, because in theory it seems attractive to do separate redistributions for a few more homogeneous subgroups of EUIs within the corresponding specific categories of suicide, homicide, and non-transport accidents. Two reasons, however, make it very difficult to build reliable correspondences between EUI subgroups and the subgroups of non-transport accidents, suicides, and homicides. The first is that prior studies of Russia [1-9] suggest imprecise registration of single-item injuries and of violent causes, as well as high numbers of deaths due to "other" and "unspecified" events within subgroups of accidents and within subgroups of suicide and homicide. In particular, reclassifying falls of undetermined intent (Y30) into unintentional falls (W00-W19), suicide by jumping (X80), and assault by pushing from a high place (Y01) assumes that these categories are reliable, without false exchanges with other items. Yet we know that such exchanges are probable because of the low quality of single items, and that it is better to use more reliable aggregate categories. Further, Y30 may be confused with Y31, Y33, and Y34, and items Y33-Y34 ("Other specified or unspecified events. Undetermined intent."), which can be included in any group, made up 31 % of all EUIs in Russia during the period under study (2000-2011). The second reason is a formal problem caused by the presence of "other" and "unspecified" categories. One does not know, for example, what part of Y33 and Y34 should be assigned to Y30, or what part of X58-X59 should be assigned to W00-W19, before estimating the regression model.
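The default method of redistribution discussed under the second point, and the subgroup-wise variant that conformal grouping would require, both reduce to splitting the EUIs of each group in proportion to the known causes in that group. The following is a minimal Python sketch that assumes the counts are already tabulated by group. The (sex, age group) keys, the labels, and all figures are hypothetical, and this is not the authors' implementation.

```python
from typing import Dict, Tuple

Group = Tuple[str, str]  # hypothetical (sex, age group) key


def redistribute_proportionally(
    known: Dict[Group, Dict[str, int]],
    eui: Dict[Group, int],
) -> Dict[Group, Dict[str, float]]:
    """Split the EUIs of each group across causes in proportion to the
    deaths of determined intent recorded for that group."""
    allocated = {}
    for group, n_eui in eui.items():
        counts = known[group]
        total = sum(counts.values())
        if total == 0:
            raise ValueError(f"no deaths of determined intent in group {group!r}")
        allocated[group] = {cause: n_eui * k / total for cause, k in counts.items()}
    return allocated


if __name__ == "__main__":
    # Made-up counts for illustration only (not Russian data).
    known = {
        ("male", "20-39"): {"non-transport accident": 600, "suicide": 300, "homicide": 100},
        ("female", "20-39"): {"non-transport accident": 300, "suicide": 50, "homicide": 150},
    }
    eui = {("male", "20-39"): 200, ("female", "20-39"): 50}
    for group, shares in redistribute_proportionally(known, eui).items():
        print(group, shares)
```

The same function would serve for conformal grouping if the keys were EUI subgroups (for example, Y30 matched to W00-W19, X80, and Y01). As argued above, however, such a mapping depends on single-item categories that may not be reliable.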

Discussion
The death rate from events of undetermined intent is extremely high in Russia, about 28 per 100,000 residents between 2000 and 2011. The proportion of all deaths from external causes attributed to this category grew rapidly in the years following the collapse of the Soviet Union, and over the last decade the EUI rate has not declined at the same pace as the rates of known external causes of death (Fig. 1 above; [1,7,14]). However, Russia and other East European nations are not the only countries with limitations in the classification of external causes of death. For example, Table 1 shows that between 2000 and 2010 the proportion of all external deaths classified as events of undetermined intent was 15 % in Russia, 12 % in the United Kingdom, 10 % in Poland, 8 % in Sweden, 7 % in Germany, and 6 % in Denmark and the Czech Republic.
This limitation has important practical, scientific, and policy implications. For example, the rate at which the "event of undetermined intent" category is used may serve as an indicator of the quality of vital statistics data, at least for external causes of death [1,7]. There are legitimate reasons to use this category (e.g., truly unknown intent, or overworked and understaffed coroner's offices). Unfortunately, there are also reasons to believe that in some nations at some times this category may be employed to purposely misclassify homicide and suicide deaths [7,17,21]. Another implication, whether the misclassification is purposeful or unintended, is that regular use of this category leads to under-enumeration of important social indicators like homicide and suicide rates. As we show here, this under-enumeration can be substantial, yet annual public reports of homicide and suicide rates rarely mention EUIs as a limitation of the reported rates. A further implication is that scholars interested in the structural covariates of homicide and suicide rates seem largely unaware of this category and do not account for it in their analyses, which may threaten the validity of their studies. The validity of individual-level studies of external causes of death may be similarly threatened, as may that of studies of interventions aimed at reducing deaths due to accidents, suicide, or homicide.
The authors of some prior studies of mortality from violence and accidents in Russia suggested what may lie behind the large number of deaths of undetermined intent. Some scholars of mortality in East European nations believed that external deaths of undetermined intent may consist largely of hidden suicides [17]. Others believed that the majority of these deaths were murders [21], or at least that a substantial portion of them were murders and that the misclassification may in some instances be purposeful [1,7].
A recent study by Ivanova et al. [22] compared deaths from known accidents, suicides, homicides, and events of undetermined intent using the distributions of the character of injury for deaths at ages 20 to 59. Focusing on the most frequent combination of the type of injury and cause, their study offered a version of EUI redistribution in which a majority of EUIs were assigned either to homicides (34 %) or to suicides (27 %).
Our study extends this recent work by bringing to bear a large set of informative micro-data. We were able to model the relationships between the three causes of death (non-transport accident, suicide, and homicide) and ten independent variables, which allowed us to predict the cause of death for EUI cases. The model tended unambiguously to assign most EUIs either to homicide or to non-transport accidents, with suicide playing a smaller role. With ECP ≥ 0.75, 33 % of EUIs were reclassified as homicides, 20 % as non-transport accidents, and 10 % as suicides, with 37 % remaining unclassified.
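The cutoff rule behind these figures can be sketched in a few lines. This Python example assumes, as a simplification, that a case's ECP is the largest of its three mlogit probabilities; the label order and the four probability vectors are made up for illustration and are not model output.

```python
from collections import Counter

# Hypothetical label order for the three mlogit outcome categories.
CAUSES = ("non-transport accident", "suicide", "homicide")


def classify_with_cutoff(prob_vectors, cutoff=0.75):
    """Assign each EUI to its most probable cause when that probability (taken
    here as the ECP) reaches `cutoff`; otherwise leave it unclassified."""
    labels = []
    for p in prob_vectors:
        ecp = max(p)
        labels.append(CAUSES[p.index(ecp)] if ecp >= cutoff else "unclassified")
    return labels


def percentage_shares(labels):
    """Percentage of cases under each label."""
    n = len(labels)
    return {label: 100.0 * k / n for label, k in Counter(labels).items()}


if __name__ == "__main__":
    # Made-up mlogit returns for four EUIs (illustration only).
    probs = [(0.90, 0.05, 0.05), (0.10, 0.10, 0.80), (0.50, 0.30, 0.20), (0.20, 0.78, 0.02)]
    for cutoff in (0.5, 0.75, 0.9):
        print(cutoff, percentage_shares(classify_with_cutoff(probs, cutoff)))
```

Raising the cutoff trades coverage for confidence. Fewer EUIs are assigned individually, and more are left for the optional population-level step.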
If one assumes that the probabilities of misclassification of causes of death for the EUIs are the same as the corresponding probabilities for deaths with known causes, the entire set of EUIs would be distributed with 48 % of cases assigned to non-transport accidents, 36 % to homicides, and 16 % to suicides. This result suggests that the proportion of hidden homicides among EUIs was 131 % higher than the corresponding proportion among injury deaths of determined intent (36 % vs. 16 %). For suicides, these proportions are 16 % vs. 23 %, and for non-transport accidents they are 47 % vs. 62 %. Although we did not find strong support for the hypothesis that the EUI category is used mainly to hide murders, the redistribution of EUIs does result in a substantial increase over the official mortality figures for homicide. After the adjustment, the Russian age-standardized homicide rate for 2011 is 20.0 per 100,000, nearly double the officially recorded value of 11.1 per 100,000. Similarly, the adjusted suicide rate of 24.9 exceeds the official rate of 20.0 by about one-quarter.
There are further implications for homicide. According to our imputation, 33 % of all (i.e., officially recorded plus hidden) homicides were initially classified as EUIs, compared with 9 % of all non-transport accident deaths and 5 % of all suicide deaths. Between 2000 and 2011, this proportion for homicides increased from 28 to 44 %. This supports the concerns of some scholars [1,7] about the quality of Russian homicide data and the validity of the officially registered reduction in homicide mortality in Russia. According to Antonova's [23] estimates, the actual number of homicides at ages 20-39 years was about 1.5 times higher than that registered by official data, and at ages 40-59 the actual number of homicides was nearly twice the official figure. Beyond the quality of vital statistics data and their use by scholars, this may also be considered an important signal for the police (which record even fewer homicides than the vital statistics do), the criminal justice system, and society as a whole. While there is no doubt that many "hidden" homicides are legitimately classified as events of undetermined intent because of a lack of biomedical and legal evidence, it is difficult to ignore the likelihood that a non-trivial proportion of them are hidden because of weaknesses in the investigative system or for other reasons.
It is not uncommon for Russian pathologists to issue a provisional death certificate, which allows for burial but does not contain the precise cause of death. Although it is assumed a qualified certificate will be issued later to be used for vital statistics registration, in practice this does not always happen. In these cases, agencies must depend on the provisional death certificates. Gavrilova et al. [1] hypothesized that the increase in deaths attributed to unknown causes was due to a growing proportion of "Provisional" death certificates. Using data for 2011, we found that 32 % of deaths registered via a provisional death certificate were EUIs compared to 23 % of deaths registered via a final death certificate. Nevertheless, 80 % of all EUIs are based on final death certificates, so it does not appear that categorizing deaths as due to undetermined intent is a function of insufficient time to make an accurate diagnosis.

Conclusions
Overuse of the external cause of death classification "event of undetermined intent" may indicate questionable quality of mortality data on external causes of death. This can have wide-ranging implications for families, medical professionals, the justice system, researchers, and policymakers. We propose an indirect statistical method for reclassifying these deaths as non-transport accidents, suicides, or homicides, and at the population level we provide a means of further refining the method's outcomes. With the classification probability set equal to or higher than 0.75, about two-thirds of EUI deaths can be reclassified. An additional assumption allows us to employ an optional population-level computation to redistribute the remaining unclassified EUIs. To illustrate this method, we employed mortality data on nearly 3 million deaths due to external causes in Russia, a nation where the use of the EUI category is especially troublesome, and our method returned plausible and meaningful results. The method can be applied to data from other nations or sub-national populations in which the EUI category is employed and for which micro-data with additional information are available.

Appendix D

Estimation of the proportion of hidden transport accidents in the EUIs
We applied mlogit to estimate the proportion of hidden transport accidents in the EUIs. The training dataset includes all events of determined intent for the period 2000-2011, divided into two parts: (1) transport accidents and (2) a combined category of non-transport accidents, suicides, and homicides. We started with the same list of variables as the one used in our main mlogit model for reclassifying EUIs. However, preliminary analysis showed that the variables for day of week, urban/rural residence, and specific location of death did not contribute significantly to differentiating between (1) and (2). The results of the mlogit model are presented below in Table 11. Using bootstrapping, we found that these outcomes are stable. Table 12 shows that the quality of our predictions is remarkably high, with 98.5 % of cases correctly identified. Since we have to predict only two types of events, (1) vs. (2), all estimated classification probabilities (ECPs) are at least 0.5. The average classification probabilities are very high: 0.989 for males and 0.992 for females. We were unable to establish a single "main predictor" for transport accidents; any reduction of the variable list decreased the identification of transport accidents. Finally, we applied the result of this modeling to reclassify EUIs. Results are shown in Table 13. Our estimation procedures suggest that the percentage of hidden transport accidents in EUIs is under 1.5 %, with a mean estimated classification probability of about 0.99. In sum, the results of these further analyses support our decision not to include transport accidents in our reclassification of externally caused EUIs and to reclassify these EUIs only into non-transport accidents, suicides, and homicides.
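In a two-category model, every ECP sits at or above 0.5 because each case is assigned to whichever category is more probable. The minimal Python sketch below illustrates this, together with the share of EUIs flagged as transport accidents. It assumes that the binary model yields a linear score whose logistic transform is the probability of a transport accident. That sign convention and the five scores are hypothetical, not values from Tables 11-13.

```python
import math


def logistic(score):
    """Numerically stable logistic function mapping a linear score to P(class 1)."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)


def binary_ecp(score):
    """With two classes the assigned class is the more probable one, so the
    ECP is max(p, 1 - p) and can never fall below 0.5."""
    p = logistic(score)
    return max(p, 1.0 - p)


def percent_transport(scores):
    """Share (%) of cases whose estimated P(transport accident) exceeds 0.5."""
    flagged = sum(1 for s in scores if logistic(s) > 0.5)
    return 100.0 * flagged / len(scores)


if __name__ == "__main__":
    # Made-up linear scores for five EUIs; positive scores favour transport here.
    scores = [-6.2, -4.8, -5.5, 0.7, -7.1]
    print([round(binary_ecp(s), 3) for s in scores])
    print(percent_transport(scores))  # 20.0
```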