Socioeconomic status determines COVID-19 incidence and related mortality in Santiago, Chile

## Abstract

The current COVID-19 pandemic has impacted cities particularly hard. Here, we provide an in-depth characterization of disease incidence and mortality, and their dependence on demographic and socioeconomic strata in Santiago, a highly segregated city and the capital of Chile. Our analyses show a strong association between socioeconomic status and both COVID-19 outcomes and public health capacity. People living in municipalities with low socioeconomic status did not reduce their mobility during lockdowns as much as those in more affluent municipalities. Testing volumes may have been insufficient early in the pandemic in those places, and both test positivity rates and testing delays were much higher. We find a strong association between socioeconomic status and mortality, measured either by COVID-19 attributed deaths or excess deaths. Finally, we show that infection fatality rates in young people are higher in low-income municipalities. Together, these results highlight the critical consequences of socioeconomic inequalities on health outcomes.

The coronavirus disease 2019 (COVID-19) pandemic is an ongoing public health crisis. While many studies have described the transmission of SARS-CoV-2 –the virus that causes COVID-19– in North America, Europe, and parts of Asia (1–5), the characterization of the pandemic in South America has received less attention, despite the severe impact in many countries during the Southern Hemisphere winter season. While confirmed COVID-19 cases are an important public health measure to estimate the level of spread of infections caused by SARS-CoV-2, they may not be a reliable indicator of incidence because of biases due to population-level health-seeking behavior, surveillance capacities, and the presence of asymptomatic individuals across regions (6). Analyses of COVID-19-related deaths as well as excess mortality provide an alternative and potentially less biased picture of epidemic intensity (7, 8). This is in part because ascertainment biases may be less pronounced for deaths than for confirmed cases, as people dying from COVID-19 are more likely to have experienced severe symptoms and thus more likely to have been documented as COVID-19 positive cases by health surveillance systems. Age-specific death data may also help explain the heterogeneity in COVID-19 burden and COVID-19 attributable deaths in different countries (9). However, the role of other factors, such as socioeconomic status –which is correlated with health care access– on fatality and disease burden, remains a particularly important open question (10) for cities with significant economic disparities.

Here, we analyzed incidence and mortality attributed to SARS-CoV-2 infection and its association with demographic and socioeconomic status across the urban metropolitan area of the capital of Chile, known as ‘Greater Santiago’. Unlike many other countries, Chile set up a remarkably thorough reporting system and made many key data sets publicly available. To understand spatial variations in disease burden, we estimated excess deaths and infection fatality rates across this urban area. To understand disparities in the health care system, we analyzed testing capacity and delays across municipalities. We then demonstrate strong associations of these health indicators with demographic and socioeconomic factors. Together, our results show that socioeconomic disparities explain a large part of the variation in COVID-19 deaths and under-reporting, and that those inequalities disproportionately affected younger people.

## Association between socioeconomic status and disease dynamics

The Greater Santiago area is composed of 34 municipalities –defined as having more than 95% of its area urbanized– and is home to almost 7 million people. While this urban center accounts for 36% of the country’s population, it has reported 55% of the confirmed COVID-19 cases and 65% of the COVID-19 attributed deaths prior to epidemiological week 36 (end of August 2020). Socioeconomic status (SES) in the municipalities varies widely, with Vitacura having the highest value (SES = 93.7) and La Pintana the lowest one (SES = 17.0; Fig. 1A), and this difference is reflected in the impact of the pandemic during the Southern Hemisphere winter of 2020. The maximum incidence in Vitacura was 22.6 weekly cases per 10,000 individuals during the middle of May, while La Pintana reported a maximum of 76.4 weekly cases per 10,000 individuals during the first week of June (Fig. 1B). As shown in Fig. 1C and fig. S1, the attributed COVID-19 deaths follow a similar (yet lagged) temporal pattern to the number of reported COVID-19 cases. For instance, the highest rate of 4.4 weekly deaths per 10,000 individuals is observed in San Ramon, a municipality with a SES of 19.7, while Vitacura reported a maximum of 1.6 weekly deaths per 10,000 in June. These social inequalities impact the overall COVID-19 mortality rates as shown in Fig. 1D.

Changes in human mobility –a proxy for physical distancing– during lockdown periods follow a similar trend. Using human mobility indicators, inferred from anonymized mobile phone data obtained from the Facebook Data for Good Initiative, we show that the two municipalities with highest socioeconomic status exhibited a reduction in mobility by up to 61% during the full lockdown (dark green, Fig. 1E), compared to the ones with lowest SES, which, on average, reduced their mobility to 40% during this period (dark pink, Fig. 1E). This relationship between reductions in mobility and SES was present during all time periods considered for this study (Fig. 1F) and supports the hypothesis that people in poorer regions cannot afford to stay at home during lockdowns. Our result is consistent with analyses of New York City neighborhoods (11) and with findings from other studies conducted in Santiago that used different socioeconomic and mobility metrics (12–14).

## Epidemic reconstruction reveals early transmission dynamics

In order to examine the possible bias present in the incidence data, we reconstructed SARS-CoV-2 infections over time by implementing a method called regularized mortality MAP (RmMAP). RmMAP back calculates the most likely infection numbers given the temporal sequence of deaths, the onset-to-death distribution, and the demography-adjusted infection fatality rate (IFR). Figure 2A shows the outcomes of this inference process, where the reconstructions from our approach and other methods are able to capture the main peak observed in May and June, with an estimate of the number of infected individuals that is 5 to 10 times larger than the reported values.

The reconstructions also reveal important differences in the inferred number of infections during March of 2020, the month in which the virus was introduced in Chile by travellers from affluent municipalities. We analyzed the number of tests performed between March 8th and April 9th, and found a significantly higher number of tests performed in municipalities with high SES (Fig. 2B), especially during the first two weeks of March (Fig. 2D). In addition, an early peak of reported cases was only observed in high SES municipalities in mid-March (Fig. 2C), despite the fact that several COVID-19 deaths, which lag infections by up to several weeks, were reported in low SES municipalities during the same period. These findings suggest that an early first wave of infections occurred during March and quickly spread through the rest of the city without being captured by the official counts. Our RmMAP estimates at the municipality level support this claim, as they capture a high volume of early infections in most municipalities (Fig. 2E), a scenario that largely deviates from the official tallies (Fig. 2C).

To further validate the hypothesis of an early under-reporting in low SES municipalities, and to rule out that these early activity estimates are an artifact of our method, we performed experiments on a synthetic elementary model of two peaks of different sizes separated in time (supplementary materials). These experiments confirm that RmMAP is capable of recovering this bimodal phenomenon, while other methods fail to do so; they over-smooth the true signal and the earlier peak is typically not recovered. This early under-reporting signal suggests that the patterns of mortality and testing observed across Greater Santiago are partially explained by an early failure of healthcare systems in informing the population with sufficient situational awareness about the real magnitude of the threat (15).

## Excess deaths match COVID-19 attributed deaths

Excess deaths –defined as the difference between observed and expected deaths– can provide a measure of the actual impact of the pandemic on mortality by quantifying direct and indirect deaths related to COVID-19 (7, 8, 16). We estimated the expected deaths for 2020 by fitting a Gaussian process model (17) to historical mortality data from the past twenty years, and used them to identify the increased mortality during the pandemic, controlling for population growth and seasonality. As shown in Fig. 3A, the number of deaths observed between May and July 2020 is more than 1.73 [1.68, 1.79] times the expected value, with a peak surpassing 2110 deaths in epidemiological week 24 (first week of June 2020), compared to an expected value of 802 deaths and an average of 798 deaths between 2015 and 2019.
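As a rough check on the scale of the peak, the figures quoted above for epidemiological week 24 can be combined directly. Because the observed count is described as surpassing 2110, both quantities below are lower bounds rather than exact values:

$$\text{excess deaths} \gtrsim 2110 - 802 = 1308, \qquad \frac{\text{observed}}{\text{expected}} \gtrsim \frac{2110}{802} \approx 2.63$$

The peak week therefore saw at least about 2.6 times the expected number of deaths, compared with a ratio of 1.73 for the May to July period as a whole.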

When comparing the number of deaths by age in the year 2020 with our model’s predictions we observe striking patterns. Although people younger than 40 years old have an overall lower mortality rate than those from older age groups as expected, they still exhibit a nearly two-fold increase in the total deaths with a peak in the observed deaths occurring 2 weeks earlier than for those older than 60 years old (Fig. 3B). For the age groups 40-60, 60-80, and older than 80, the observed deaths are 2.8, 3.2, and 2.4 times higher than expected, respectively. Even though the age group 80+ exhibits the highest expected mortality values for 2020, the group that contains people between 60 and 80 years old displays the highest weekly count (936 during epidemiological week 24), the biggest deviation from the predicted values, and the highest values of excess deaths (645 more deaths than expected, Fig. 3B).

COVID-19 attributed deaths for the entire Greater Santiago area fall within the credible intervals of excess deaths until late June, when the attributed deaths increase to rates that are even higher than the excess deaths, suggesting that under-reporting in COVID-19 attributed deaths is unlikely (Fig. 3C). COVID-19 confirmed deaths –those with a PCR-confirmed SARS-CoV-2 test– follow a similar temporal pattern, and the difference between confirmed and COVID-19 attributed deaths gets smaller toward the end of August, indicative of an improved testing capacity. This pattern is consistent when compared to normalized deaths by population size for each municipality (Fig. 3D), which also shows COVID-19 attributed deaths higher than the excess deaths in most of the cases. The anomalies in the observed versus predicted deaths for 2020 across different age groups also display a significant negative association with socioeconomic status, except for the 80+ group (Fig. 3E), suggesting a higher death burden in lower SES municipalities, independent of their age composition. Furthermore, the two municipalities with SES higher than 80 (Las Condes and Vitacura) had z-scores of much smaller magnitude (with the exception of the oldest age group), indicating that their patterns of mortality did not deviate much from what would have been expected in a normal year in people younger than 80 years old.

Although the observation that COVID-19 attributed deaths are greater than the estimated excess deaths might be counterintuitive (Fig. 3D), it may indicate the presence of changes in overall mortality patterns due to other causes, including a lower number of deaths due to reduction in the mobility. In addition, lower numbers of deaths were reported for respiratory infectious diseases such as influenza and pneumonia, and cancer during July and August of 2020 compared to the period 2015-2019 (Fig. 3F). Changes in mortality from respiratory diseases can be explained by a mild influenza season in the Southern Hemisphere during the winter of 2020 (18), which is consistent with our observation that much fewer cases of respiratory viruses have been detected in Chile during the 2020 season (supplementary materials). A decrease in the number of cancer attributed deaths can be explained by mortality displacement (19, 20), but additional analyses need to be conducted to establish this hypothesis. Alternative explanations for changes in all-cause mortality should also consider possible changes in external and behavioral causes of mortality. We do not observe a substantial contribution from these causes (see supplementary materials, along with additional detailed analyses).

## More testing with lower waiting times in wealthy areas

To further understand the consequences of insufficient early testing, we conducted a deeper analysis of different testing metrics at the municipality level. We first looked at testing capacity measured as weekly positivity rates, the fraction of tests that are positive for SARS-CoV-2. Our results show that the positivity signal tracked the course of the epidemic, peaking at times of highest incidence between May and July, and suggesting a highly saturated health-care system during this period across the entire city (Fig. 4A). A strong negative association between positivity and SES (Fig. 4B) further denotes difficulties in access to health care that are even more pronounced in lower SES municipalities. Despite changes in positivity rates over time, this testing metric also significantly correlated with the number of cases (Fig. 4C) and the number of deaths (Fig. 4D).

Our findings on the number of tests conducted show a rather paradoxical association with SES and mortality. Many months into the epidemic, the early positive association between tests per capita and SES (Fig. 2B) reversed (Fig. 4E), indicative of an improvement in testing capacity over time, so that more tests were performed in the most affected areas. Similarly, the number of tests started to positively correlate with deaths (Fig. 4F), suggesting that the number of tests is a strong predictor of mortality.

We also analyzed testing capacity by estimating the delays in obtaining test results. We inferred the distribution of the delay between onset of symptoms and report of the results, from which we obtained the proportion of cases that are publicly reported within one week since the onset of symptoms or timeliness (21). As shown in Fig. 4G, timeliness follows a similar temporal course as test positivity during May and part of June, but in the opposite direction. This metric is associated with SES, suggesting that municipalities with low SES, on average, get their test results later than the ones with high SES (Fig. 4H). Timeliness also negatively correlates with number of cases (Fig. 4I), total number of deaths (Fig. 4J), and with positivity (Fig. 4K). When looking at tests per death, a metric that can be used as a faithful proxy of testing capacity (22), we observe a positive correlation with socioeconomic status (Fig. 4L), indicating that testing disparities persisted during the epidemic, with low SES areas being affected the most. In the supplementary materials we further discuss the associations between our metrics and case counts.

## Infection fatality rate depends on socioeconomic status

In the absence of serological surveys, a direct inference of an infection fatality rate (IFR) is challenging. The degree of ascertainment depends on many factors, including testing capabilities and the likelihood of having symptomatic infections. Also, unlike deaths, age information of reported cases is not available at the municipality level, making this inference more challenging. To address these hurdles and to obtain estimates of the IFR, we implemented a hierarchical Bayesian model that considers the relationship between deaths, observed cases, and true infections across location, time, and age group. We first estimated the case fatality rate (CFR) by assigning total cases into age groups in a simple way that projects the overall age distribution of cases to particular municipality demographics (Fig. 5A, see supplementary materials for details). With the exception of the oldest age group, case fatality rate shows a negative association with socioeconomic status. Our resulting IFR estimates, once corrected for under-ascertainment, display a similar pattern (Fig. 5B) but are an order of magnitude lower than the CFR estimates. We then grouped the municipalities into four categories of similar sizes and labeled them as the low, mid-low, mid-high, and high socioeconomic categories. When comparing the IFR ratio between the low and the high SES categories, the results show a significantly higher infection fatality rate in the low SES group in people younger than 80 years old (Fig. 5C). The age groups 60-80 and 40-60 exhibit IFRs that are 1.4 and 1.7 times higher, respectively, in the low SES category compared to the high SES one. The difference is even more pronounced in the youngest age group (0-40 years old), which shows an IFR that is 3.1 times higher for the municipalities with the lowest socioeconomic status. Altogether, these results are in line with the analyses of excess deaths presented in Fig. 3E. The lack of association between IFR and SES in the oldest age group can be attributed to a lower life expectancy (23), which in fact is factored into the estimation of SES (see methods for details), and to the possibility that elderly people might be, in general, healthy enough to survive until that age.

## Discussion

In order to understand the true burden of COVID-19, it is critical to consider demographic and socioeconomic factors and their consequences for the public health response. Here, we analyzed data from the capital of Chile, a highly segregated city. Our results align with the recent literature on uneven health risks globally, which has highlighted how socially and economically deprived populations are more vulnerable to the burden of epidemics (24, 25). Mounting evidence suggests that such differences have also manifested in the context of the COVID-19 pandemic (26, 27). Since the pathways modulating these differential outcomes are not well understood, comprehensive accounts are urgently needed (28), so that more resilient and socially aware public-health strategies can be planned in advance of future pandemics. In Chile, recent studies have suggested a link between SES and effectiveness of non-pharmaceutical interventions such as stay-at-home orders (12, 13, 29). Our work further explores this topic by providing a holistic perspective on how the interplay between behavioral, social, economic, and public-health factors modulates the observed heterogeneity in infection incidence and mortality. Along with the main findings, we also introduced several methodological innovations. Our Bayesian method for joint inference of infection fatality rates and under-reporting is a new contribution in this field. We show that it may not be necessary to have complete epidemiological data sets (here, age) to draw valid inferences, as long as the solution space is constrained enough by meaningful priors and demographic structure.

Our results show a strong link between socioeconomic and demographic factors and both COVID-19 outcomes and testing capacity in Santiago. This association is manifested as a reinforcing feedback loop, as highlighted by our findings. First, our analysis of human mobility indicates that municipalities with lower socioeconomic status were less compliant with stay-at-home orders, possibly because people from lower SES areas are unable to work from home, leaving them at a higher disease risk. Second, our analyses revealed an under-reporting of infections in low-income areas at the start of the outbreak. Since public-health measures were taken in response to nominal case counts, these places were underprepared, with a poor health-care response that resulted in higher death counts. Third, anomalies in the overall excess deaths are higher in low SES areas, particularly in people younger than 80 years old, suggesting that more vulnerable municipalities were hit the hardest. Fourth, the analyses of test positivity rates, timeliness, and tests per death indicate an insufficient deployment of resources for epidemiological surveillance. Higher positivity rates in health care centers suggest the need for greater testing and detection. At the same time, slower turnaround in test results can lead to greater potential for transmission, since even small delays between onset of symptoms, testing, and final isolation significantly hinder the capability of public health systems to contain the epidemic (30). Finally, infection fatality rates were higher in lower SES municipalities, especially among younger people.

We propose two complementary explanations for the association between infection fatality rate and socioeconomic status. First, a higher IFR may reflect limited access to health services during the pandemic, and the strong association between the number of tests per death and SES supports this claim. We also show in supplementary materials that the South and West zones (based on health coverage division) have 4 times fewer beds per 10,000 people and 4 times lower proportion enrolled in the private health system than the East zone, which contains all the municipalities with an SES of 60 or higher. Strikingly, more than 90% of the COVID-19 attributed deaths in the South and West zone occurred in places other than healthcare facilities, compared to 55% in the East. Second, more vulnerable communities may experience a higher prevalence of the comorbidities (31) that are associated with more severe presentations of COVID-19. People in low SES municipalities are more likely to be overweight and to live in overcrowded conditions (supplementary materials), factors that ultimately can put these populations at higher disease risk. The interaction of these two explanations can lead to a high disparity among different socioeconomic groups.

Our findings need to be considered in light of the following limitations. Mobility data from mobile phones are likely to be biased due to differential mobile phone ownership in different demographic groups. While Facebook mobility data can be biased in this way, our results are consistent with other studies in Santiago that used different socioeconomic and movement measurements [see (12–14) and supplementary materials]. Our methods depend on several assumptions. The back-calculated RmMAP estimates rely on a choice of the infection-to-death distribution and assume that the IFR does not change over time, and the excess mortality estimates depend on the choice of a kernel. Our IFR estimates are derived from a complex Bayesian model and are based on assumptions regarding reporting rates and age distribution of infections. Extensive sensitivity analyses suggest that our results are stable to deviations from these assumptions (supplementary materials).

To conclude, this study highlights major consequences of healthcare disparities in a highly segregated city, and provides new methodologies that account for incomplete data and can be used to study infectious disease burden and mortality in other contexts.

## Materials and methods

### Data

#### Socioeconomic status

We define the Socioeconomic status index (SES) as SES = 100 − SPI, where SPI is the Social Priority Index (or ‘Indice de Prioridad Social’ in Spanish) estimated for 2019. The SPI index varies between 0 and 100, and has been reported yearly since 1995 by the Chilean Ministry of Social Development and Family. The SPI value denotes the priority of each municipality for the social programs of the regional government, and thus, municipalities with lower SES have higher social priority. The SPI index equally weights three components: (i) income and poverty, (ii) access and quality of education, and (iii) health factors such as access to healthcare and life expectancy. For each component, the values are standardized on a common scale from 0 to 100, where the value 100 represents the worst relative situation (highest priority) and 0 the best situation (least priority).

#### COVID-19

At the end of January 2020, the Chilean government determined that all suspected cases of COVID-19 must be notified in a mandatory and immediate manner to the respective Health Epidemiology Unit and the Ministry of Health, through the specific form on the EPIVIGILA platform. In addition to the suspected cases that are identified in healthcare facilities, the government also implemented an active testing surveillance program to identify asymptomatic and pre-symptomatic cases. The criteria for the active testing are: (i) people who have not yet been identified as confirmed or suspected COVID-19 cases, (ii) people living in vulnerable areas, and (iii) individuals who live in institutions for a long time, such as jails, nursing homes, and the National Service for Minors, among others. Symptom onset dates are reported by the patient to a physician, in the case that the person attended a health institution, or collected through a survey by the volunteers who conduct surveillance in the community.

The Chilean Ministry of Science, Technology, Knowledge, and Innovation has made possible the access to aggregated data collected through the EPIVIGILA platform, which are available in the format of multiple reports. These reports also contain data on population projections for 2020, testing, positivity, and other metrics used in the study. One of the reports tracks the number of cases whose onset of symptoms started at a given epidemiological week, for each municipality. Given that they are published twice a week (typically Monday and Friday), we were able to analyze the history of such reports to estimate the delays. Timeliness is thus defined as the probability of getting a retrospective delay smaller than 7 days, based on the Monday’s reports. More details can be found in the supplementary materials.
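The timeliness definition above can be illustrated with a minimal sketch. It assumes that a list of per-case delays (in days, from symptom onset to public report) is already available; the study instead infers the delay distribution from the history of published reports, so this empirical fraction is a simplification, and the function name `timeliness` is hypothetical.

```python
def timeliness(delays_days, threshold=7):
    """Share of cases whose onset-to-report delay is below `threshold` days.

    delays_days: illustrative per-case delays in days (an assumed input;
                 the study infers the delay distribution from report history).
    """
    if not delays_days:
        raise ValueError("need at least one delay")
    on_time = sum(1 for d in delays_days if d < threshold)
    return on_time / len(delays_days)


# Two of the four illustrative cases were reported in under 7 days.
example = timeliness([2, 5, 7, 10])
```

A value close to 1 would indicate that most cases are reported within a week of symptom onset, while lower values indicate longer reporting delays.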

#### Mortality

The Vital Statistics System in Chile is continuous, mandatory, and centralized. It is composed of the Civil Registry and Identification Service (CRIS), the National Institute of Statistics, and the Ministry of Health through the Department of Health Statistics and Information (DHSI). When a person dies, a medical death certificate is generated by the CRIS and distributed to health institutions. The mortality database is built with the death certificates, which are subjected to a rigorous validation process, to guarantee the reliability and validity of the information. The DHSI standardizes the clinical terms in the format of the International Statistical Classification of Diseases (ICD-10). Since March 2020, the DHSI has implemented the recommendations of the WHO for coding the deaths resulting from COVID-19. In this study, confirmed COVID-19 deaths correspond to deaths in which the virus has been identified with a positive PCR test and have been coded as U07.1. Similarly, attributed COVID-19 deaths correspond to deaths in which the virus was not identified but clinically diagnosed as probable or suspected COVID-19 case, and have been coded as U07.2.

#### Human movement

Facebook’s Data for Good has provided access to their Geoinsights portal in response to the COVID-19 crisis, from where it is possible to obtain aggregated data of their users (32). These data sets are anonymized and contain information on Facebook users who have a smartphone with location services enabled. The movement vector from tile i to j (with i ≠ j) at time t is defined as the transition from the modal location i at the preceding 8-hour bin to the modal location j in the current 8-hour bin. Facebook also provides a baseline value, defined as the average number of users who transit from tile i to j at a given day of week and time of day during a baseline period. The baseline period corresponds to the 45 days prior to the initiation of the movement data for that particular location (for Chile, the data collection was initialized on March 25, 2020). Using this data set, we calculated the percentage change compared to baseline for each i to j transition at a given 8-hour period, and then estimated the average percentage change for each municipality and epidemiological week. We only used the starting location (municipality) for the average percentage change estimation. Each tile has a side length of approximately 2.4 km.
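The aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout `(start_municipality, epi_week, count, baseline)` and the function names are hypothetical and do not reflect the actual format of the Facebook data sets, and transitions with a non-positive baseline are simply skipped.

```python
def percent_change(count, baseline):
    """Percentage change of an observed transition count relative to baseline."""
    return 100.0 * (count - baseline) / baseline


def municipality_weekly_change(records):
    """Average percentage change per (starting municipality, epidemiological week).

    records: iterable of (start_municipality, epi_week, count, baseline),
             one entry per i-to-j transition and 8-hour period (assumed layout).
    """
    sums = {}
    for muni, week, count, baseline in records:
        if baseline <= 0:
            continue  # percentage change is undefined without a positive baseline
        total, n = sums.get((muni, week), (0.0, 0))
        sums[(muni, week)] = (total + percent_change(count, baseline), n + 1)
    return {key: total / n for key, (total, n) in sums.items()}


# Illustrative numbers only: two transitions starting in "A", one in "B".
changes = municipality_weekly_change([
    ("A", 20, 50, 100),
    ("A", 20, 30, 100),
    ("B", 20, 80, 100),
])
```

Only the starting municipality is used as the grouping key, mirroring the choice stated in the text.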

### Models

#### Inference of SARS-CoV-2 infections with RmMAP

We aim to estimate the number of infected individuals over time $I_s$ given a series of observed COVID-19 attributed deaths $D_t$ and a known onset-to-death distribution $T$. We use a Poisson deconvolution model for deaths given $I$ and $T$:

$$D_t \mid T, I \sim \mathrm{Poisson}\left(\sum_s T_{t-s} I_s\right) \qquad (1)$$

where $T_s = P(T = s)$ is the probability that the onset-to-death time equals $s$ days. Estimates of $I$ maximizing Eq. 1 can be obtained with an expectation maximization algorithm (6, 33–35), but the outcome is typically unstable (36). RmMAP overcomes this issue by adding a quadratic penalty to the log-likelihood. The iterations of RmMAP are written as

$$\hat{I}^{new} = \frac{1}{4\lambda}\left(\sqrt{1 + 8\lambda I^{old}} - 1\right) \qquad (2)$$

$$I_s^{new} = \hat{I}_s^{new} \, \frac{1}{\sum_t T_{t-s}} \sum_t \frac{D_t T_{t-s}}{\sum_{s'} T_{t-s'} \hat{I}_{s'}^{new}} \qquad (3)$$

By scaling the final series $I^{new}$ by the inverse IFR we obtain the inferred values of infected individuals over time. A detailed discussion of this method, along with sensitivity analysis and comparison with existing methodology, is presented in the supplementary materials.
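The iteration in Eqs. 2 and 3 can be sketched in a few lines of Python. This is an illustrative reading rather than the authors' implementation: it assumes Eq. 2 takes the form $\hat{I} = (\sqrt{1 + 8\lambda I^{old}} - 1)/(4\lambda)$ applied entrywise, treats $T_{t-s}$ as zero outside the support of the onset-to-death distribution, and uses a flat starting series. The function names, the default `lam`, and the number of iterations are hypothetical choices.

```python
import math


def rmmap_step(I_old, D, T, lam):
    """One penalized EM-style update following Eqs. 2 and 3 (illustrative sketch).

    I_old: current infection estimates, indexed by day s
    D:     observed deaths, indexed by day t
    T:     onset-to-death pmf, T[k] = P(T = k days)
    lam:   strength of the quadratic penalty (lam > 0)
    """
    n = len(I_old)
    # Eq. 2 (assumed form): entrywise shrinkage of the previous estimate
    I_hat = [(math.sqrt(1.0 + 8.0 * lam * x) - 1.0) / (4.0 * lam) for x in I_old]

    def kern(t, s):
        k = t - s
        return T[k] if 0 <= k < len(T) else 0.0

    # Expected deaths on each day under the shrunk estimate
    mu = [sum(kern(t, s) * I_hat[s] for s in range(n)) for t in range(len(D))]

    I_new = []
    for s in range(n):
        norm = sum(kern(t, s) for t in range(len(D)))
        if norm == 0.0:
            I_new.append(I_hat[s])  # no observed day informs this entry
            continue
        ratio = sum(D[t] * kern(t, s) / mu[t] for t in range(len(D)) if mu[t] > 0)
        I_new.append(I_hat[s] * ratio / norm)  # Eq. 3
    return I_new


def rmmap(D, T, ifr, lam=0.01, n_iter=200):
    """Iterate the update from a flat start, then scale by the inverse IFR."""
    I = [max(sum(D) / len(D), 1e-9)] * len(D)
    for _ in range(n_iter):
        I = rmmap_step(I, D, T, lam)
    return [x / ifr for x in I]
```

As a sanity check on the update, a degenerate distribution with all mass at zero delay (`T = [1.0]`) makes the deconvolution trivial: one step returns the death series itself, and `rmmap` returns deaths divided by the IFR.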

#### Estimation of excess deaths

We used Gaussian process (GP) regression (17) to estimate excess deaths for 2020. GPs can be understood as an infinite-dimensional Bayesian regression: in the finite-dimensional case one fits $y_i = \sum_j w_j x_{ij} + \epsilon_i$, where $\epsilon_i$ are independent, identically distributed Gaussian errors, $x_{ij}$ are covariates, and $w_j$ are coefficients sampled from a prior $p(w)$. Likewise, with GPs we fit $y_i = f(x_i) + \epsilon_i$, where $f$ is a function sampled from a prior over functions $p(f)$. GPs are appealing because the level of complexity is automatically adjusted by the complexity of data, and because they are computationally tractable.

Priors over $f$ are specified through a kernel $K$, which encodes the correlational structure of data so that $K(x, x')$ is simply the prior covariance between $f(x)$ and $f(x')$. $K$ depends on a finite number of unknowns $\theta$ (so $K = K_\theta$) that have to be inferred as well.

We used a GP to account for both long-term trends in mortality as well as seasonality. As in (17), we consider kernels of the form

$$K_\theta = K_\theta^1 + K_\theta^2 \qquad (4)$$

where $K_\theta^1$ is an exponential kernel representing the long-term variation, and is given by

$$K_\theta^1(x, x') = \theta_1^2 \exp\left(-\frac{(x - x')^2}{2\theta_2^2}\right) \qquad (5)$$

and $K_\theta^2$ is a periodic times exponential kernel representing seasonal variation

$$K_\theta^2(x, x') = \theta_3^2 \exp\left(-\frac{(x - x')^2}{2\theta_4^2} - \frac{2\sin^2\left(\pi (x - x')\right)}{\theta_5^2}\right) \qquad (6)$$

We considered an additional source of unstructured randomness through the term $\epsilon_i \sim N(0, \sigma^2)$. We performed Bayesian inference (MCMC) over the joint distribution of the parameters $(\theta, \sigma^2)$ and death counts for each time period of the year 2020, based on 2000-2019 all-cause mortality data and suitable priors for the parameters. In the supplementary materials we comment on more specific aspects, and provide an extensive evaluation of our model.
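The kernel in Eqs. 4 to 6 can be written out directly. The sketch below is illustrative only: it assumes the input $x$ is measured in years, so that the periodic term repeats once per year, and it shows only how the prior covariance is assembled, not the MCMC inference. The function names and the example parameter values are hypothetical.

```python
import math


def k_long_term(x, x2, theta1, theta2):
    """Eq. 5: smooth component for long-term variation."""
    return theta1 ** 2 * math.exp(-((x - x2) ** 2) / (2.0 * theta2 ** 2))


def k_seasonal(x, x2, theta3, theta4, theta5):
    """Eq. 6: periodic component whose correlation decays with distance."""
    d = x - x2
    decay = d ** 2 / (2.0 * theta4 ** 2)
    periodic = 2.0 * math.sin(math.pi * d) ** 2 / theta5 ** 2
    return theta3 ** 2 * math.exp(-decay - periodic)


def kernel(x, x2, theta):
    """Eq. 4: sum of both components, with theta = (theta1, ..., theta5)."""
    t1, t2, t3, t4, t5 = theta
    return k_long_term(x, x2, t1, t2) + k_seasonal(x, x2, t3, t4, t5)


def prior_covariance(xs, theta, sigma2):
    """Covariance of noisy observations y_i = f(x_i) + eps_i at inputs xs."""
    n = len(xs)
    return [[kernel(xs[i], xs[j], theta) + (sigma2 if i == j else 0.0)
             for j in range(n)] for i in range(n)]


# Illustrative hyperparameters (not estimates from the study)
theta = (2.0, 10.0, 1.0, 5.0, 1.0)
weeks_in_years = [w / 52.0 for w in range(4)]
cov = prior_covariance(weeks_in_years, theta, sigma2=0.1)
```

At zero distance the kernel reduces to $\theta_1^2 + \theta_3^2$, which is the prior variance of $f$; the noise variance $\sigma^2$ is added only on the diagonal.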

#### Infection fatality rates

We deployed a hierarchical Bayesian joint model for reporting rates (and hence, IFR) per age group ($a$, taking values 0-40, 40-60, 60-80, and 80+) and municipality $m$, collapsing over the temporal dimension. We infer the number of infected individuals (and hence, IFR) based on reported cases $C$, positivity rates over time ($t$, month) and municipality, and total and COVID-19 attributed deaths $D$. The main appeal of this framework is that, although most of the components are not identifiable (e.g., if reporting rates and true cases are both unknown, the same observed case counts can be obtained by multiplying one and dividing the other by the same factor) (37), we can borrow from better-known quantities (e.g., rough estimates of prevalence and reporting) to enhance identification while propagating the appropriate levels of uncertainty over the parameters.

Specifically, the reporting rate $r_{m,t}$ links to the observed positivity rates $\text{pos}_{m,t}$ (in log-scale) through a logistic-linear relation (with parameters $\beta$), and we include random effects $\epsilon_{m,t}$ to represent unobserved causes of reporting:

$\text{logit}(r_{m,t}) = \beta_0 + \beta_1 \times \text{pos}_{m,t} + \epsilon_{m,t}$ (7)
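To make Eq. 7 concrete, the sketch below inverts the logit to recover a reporting rate from a positivity rate. The coefficient values are hypothetical and chosen only so that higher positivity implies lower reporting; following the text, the positivity rate is assumed to enter on the log scale.

```python
import math

def inv_logit(z):
    """Inverse of the logit function: maps a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def reporting_rate(log_pos, beta0, beta1, eps=0.0):
    """Eq. 7 solved for r_{m,t}; log_pos is the positivity rate on the log scale."""
    return inv_logit(beta0 + beta1 * log_pos + eps)

# Hypothetical coefficients, not estimates from the paper.
beta0, beta1 = -3.0, -0.5

r_low_positivity = reporting_rate(math.log(0.05), beta0, beta1)   # 5% of tests positive
r_high_positivity = reporting_rate(math.log(0.40), beta0, beta1)  # 40% of tests positive
```

The random effect `eps` defaults to zero here; in the model it absorbs month- and municipality-specific deviations that positivity alone does not explain.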

Total infections by municipality and age, $I_{m,a}$, are a fraction $p_m$ of the total population $P_{m,a}$, i.e.,

$I_{m,a} \sim \text{Binomial}(P_{m,a}, p_m)$ (8)

An implicit assumption in Eq. 8 is the existence of an underlying municipality-specific proportion infected $p_m$, so that in each age group the number of infected people is (on average) $p_m \times P_{m,a}$. We also assumed the following relation for $p_m$:

$\text{logit}(p_m) = p_0 + \mu_m$ (9)

where $p_0$ represents a baseline for the proportion infected and $\mu_m$ is a municipality-specific random effect.

We use parameters $\gamma_{m,t} \in [0, 1]$ to represent the temporal spread of infections, with $\sum_t \gamma_{m,t} = 1$, so that $I_{m,a,t} = \gamma_{m,t} I_{m,a}$. Infections, cases, attributed deaths, and age-stratified population sizes are linked through a cascade of binomial models. We relate infections, cases, and reporting rates through

$C_{m,t} \sim \text{Binomial}(I_{m,t}, r_{m,t})$ (10)
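The generative direction of Eqs. 8 to 10 can be sketched as a forward simulation for a single municipality. This is not the inference procedure, which runs in the opposite direction from observed cases and deaths. All numbers are hypothetical, the helper `binomial` is a naive sampler written for this example, and monthly infections $I_{m,t}$ are assumed to be the age-summed infections scaled by $\gamma_{m,t}$ and rounded.

```python
import math
import random

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def binomial(rng, n, p):
    """Naive binomial draw: count successes in n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(0)

# Hypothetical age-stratified population P_{m,a} of one municipality.
population = {"0-40": 3000, "40-60": 1500, "60-80": 800, "80+": 200}

# Eq. 9: municipality-specific proportion infected (hypothetical p_0 and mu_m).
p0, mu_m = -1.5, 0.3
p_m = inv_logit(p0 + mu_m)

# Eq. 8: infections per age group.
infections = {age: binomial(rng, n, p_m) for age, n in population.items()}

# Temporal spread gamma_{m,t} over four months; the weights sum to one.
gamma = [0.1, 0.4, 0.3, 0.2]
total_infections = sum(infections.values())
# Rounded so they can serve as binomial trial counts (an assumption of this sketch).
infections_by_month = [round(g * total_infections) for g in gamma]

# Eq. 10: reported cases given monthly infections and reporting rates r_{m,t}.
reporting = [0.05, 0.10, 0.20, 0.25]
cases = [binomial(rng, i, r) for i, r in zip(infections_by_month, reporting)]
```

Because reported cases are a thinned version of infections, the same case counts are compatible with many combinations of infections and reporting rates, which is the identifiability issue the hierarchical priors are meant to address.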

Infection fatality rates $\text{IFR}_{m,a}$ relate to infections and deaths through another binomial model,

$D_{m,a} \sim \text{Binomial}(I_{m,a}, \text{IFR}_{m,a})$ (11)

where the IFRs follow a stratified logistic-linear relation with socioeconomic status (SES) and age, mediated by parameters $\alpha$, $\eta$, $\delta$:

$\text{logit}(\text{IFR}_{m,a}) = \alpha_0 + (\alpha_1 + \eta_a) \times \text{SES}_m + \delta_a$ (12)
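A small numerical sketch of Eq. 12 shows how an age-specific SES slope can produce different SES gradients in IFR across age groups. It assumes the slope on SES is the sum of a common term $\alpha_1$ and an age-specific term $\eta_a$, and that SES is a standardized score; every parameter value below is hypothetical and not an estimate from the paper.

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def ifr(ses, age, alpha0, alpha1, eta, delta):
    """Eq. 12 solved for IFR, assuming the SES slope is (alpha1 + eta[age])."""
    return inv_logit(alpha0 + (alpha1 + eta[age]) * ses + delta[age])

ages = ["0-40", "40-60", "60-80", "80+"]

# Hypothetical parameters, chosen only for illustration.
alpha0, alpha1 = -7.0, -0.3
eta = {"0-40": -0.4, "40-60": -0.2, "60-80": 0.0, "80+": 0.1}
delta = {"0-40": 0.0, "40-60": 2.0, "60-80": 4.0, "80+": 5.5}

low_ses, high_ses = -1.0, 1.0  # standardized SES scores (assumed scale)
ifr_low = {a: ifr(low_ses, a, alpha0, alpha1, eta, delta) for a in ages}
ifr_high = {a: ifr(high_ses, a, alpha0, alpha1, eta, delta) for a in ages}

# Ratio of IFR in a low-SES municipality to a high-SES one, per age group.
ifr_ratio = {a: ifr_low[a] / ifr_high[a] for a in ages}

# Eq. 11 implies expected deaths of I_{m,a} * IFR_{m,a}; 1000 infections is a made-up count.
expected_deaths_low = {a: 1000 * ifr_low[a] for a in ages}
```

With these made-up values the low-to-high SES ratio is largest in the youngest group, mirroring the kind of age-dependent gradient the stratified slope is designed to capture.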

A comprehensive explanation of this hierarchical Bayesian methodology, including a discussion of its assumptions and several sensitivity analyses and robustness checks against misspecification of those assumptions, appears in the supplementary materials.

## References and Notes

1. F. S. Lu, A. T. Nguyen, N. B. Link, J. T. Davis, M. Chinazzi, X. Xiong, A. Vespignani, M. Lipsitch, M. Santillana, Estimating the cumulative incidence of COVID-19 in the United States using four complementary approaches. medRxiv 2020.04.18.20070821 [Preprint]. 7 August 2020; doi:10.1101/2020.04.18.20070821

2. R. J. Acosta, R. A. Irizarry, Monitoring health systems by estimating excess mortality. medRxiv 2020.06.06.20120857 [Preprint]. 9 June 2020; doi:10.1101/2020.06.06.20120857

3. J. M. Feldman, M. T. Bassett, The relationship between neighborhood poverty and COVID-19 mortality within racial/ethnic groups (Cook County, Illinois). medRxiv 2020.10.04.20206318 [Preprint]. 6 October 2020; doi:10.1101/2020.10.04.20206318

4. A. Carranza, M. Goic, E. Lara, M. Olivares, G. Y. Weintraub, J. Covarrubia, C. Escobedo, N. Jara, L. J. Basso, The social divide of social distancing: Shelter-in-place behavior in Santiago during the COVID-19 pandemic. SSRN 3691373 [Preprint]. 12 September 2020; doi:10.2139/ssrn.3691373

5. M. Lipsitch, M. Santillana, “Enhancing situational awareness to prevent infectious disease outbreaks from becoming catastrophic” in Global Catastrophic Biological Risks, T. V. Inglesby, A. A. Adalja, Eds. (Current Topics in Microbiology and Immunology, vol. 424, Springer, 2019), pp. 59–74.

6. C. K. Williams, C. E. Rasmussen, Gaussian Processes for Machine Learning (MIT Press, 2006), vol. 2.

7. Y. Li, E. A. Undurraga, J. R. Zubizarreta, Effectiveness of localized lockdowns in the COVID-19 pandemic. medRxiv 2020.08.25.20182071 [Preprint]. 30 March 2021; doi:10.1101/2020.08.25.20182071

8. P. Maas et al., in Proceedings of the 16th International Conference on Information Systems for Crisis Response and Management, València, Spain, May 19-22, 2019 (ISCRAM Association, 2019), vol. 19, p. 3173.

9. P. P. B. Eggermont, V. N. LaRiccia, V. LaRiccia, Maximum Penalized Likelihood Estimation (Springer, 2001), vol. 1.

10. M. H. Chitwood, M. Russi, K. Gunasekera, J. Havumaki, V. E. Pitzer, J. A. Salomon, N. Swartwood, J. L. Warren, D. M. Weinberger, T. Cohen, N. A. Menzies, Reconstructing the course of the COVID-19 epidemic over 2020 for US states and counties: results of a Bayesian evidence synthesis model. medRxiv 2020.06.17.20133983 [Preprint]. 6 April 2021; doi:10.1101/2020.06.17.20133983

11. G. Mena, P. Martinez, A. Mahmud, P. Marquet, C. Buckee, M. Santillana, Socioeconomic status determines COVID-19 incidence and related mortality in Santiago, Chile. Zenodo (2021); doi:10.5281/zenodo.4699403

12. R. L. White, in Instrumentation in Astronomy VIII, D. L. Crawford, E. R. Craine, Eds. (International Society for Optics and Photonics, 1994), vol. 2198, pp. 1342–1348.

13. A. C. Miller, L. Hannah, J. Futoma, N. J. Foti, E. B. Fox, A. D’Amour, M. Sandler, R. A. Saurous, J. A. Lewnard, Statistical deconvolution for inference of infection time series. medRxiv 2020.10.16.20212753 [Preprint]. 20 October 2020; doi:10.1101/2020.10.16.20212753

14. M. Titsias, in Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, D. van Dyk, M. Welling, Eds., vol. 5 of Proceedings of Machine Learning Research (PMLR, 2009), pp. 567–574.

15. R. E. Barlow, Ed., Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression (Wiley Series in Probability and Mathematical Statistics, no. 8, Wiley, 1972).

16. P. Gustafson, Bayesian Inference for Partially Identified Models: Exploring the Limits of Limited Data (CRC Press, 2015), vol. 140.

17. D. Fisman, S. J. Drews, A. Tuite, S. O’Brien, Age-specific SARS-CoV-2 infection fatality and case identification fraction in Ontario, Canada. medRxiv 2020.11.09.20223396 [Preprint]. 12 November 2020; doi:10.1101/2020.11.09.20223396

18. M. Plummer et al., in Proceedings of the 3rd International Workshop on Distributed Statistical Computing (Vienna, Austria, 2003), vol. 124, pp. 1–10.

19. A. Wilson, R. Adams, in Proceedings of the 30th International Conference on Machine Learning (PMLR, 2013), pp. 1067–1075.