Introduction
Scientists have conducted hundreds of studies evaluating the health impacts of air pollution. This research is obviously important; careful studies that evaluate the impact of air pollution are crucial for the formulation of appropriate policy responses. One prevailing theory is that exposure to pollutants can damage physical development in infants and children, and that the resulting health deficits lead to increased mortality long after an individual is no longer exposed to pollution.Footnote 1 For example, in her well-known book, When Smoke Ran Like Water, Davis (2002) argues that childhood exposure to pollution from steel production in towns like Donora had profound negative consequences for the later-life health of individuals.Footnote 2
While there are good reasons to be concerned about the long-term health consequences of childhood exposure to air pollution, most research focuses instead on the contemporaneous relationship between pollution exposure and health outcomes. The lack of research exploring the long-reach impact of pollution is likely due to the difficulty of assembling data that combine measures of early-life pollution exposure with a follow-up period long enough to examine postexposure health impacts leading to early mortality.Footnote 3 I am fortunate to have data that allow me to conduct such an analysis.
My research evaluates old-age mortality of individuals born in Pennsylvania towns in the early twentieth century, comparing outcomes for those born in steel-producing towns to those born in comparable towns with no steel production. The key to my research design lies in the history of the Pennsylvania steel industry. At the beginning of the twentieth century, a large share of US steel production was in Pennsylvania, some of which came from steel production facilities located in towns throughout the Commonwealth. There were also, of course, thousands of towns that had no steel mills. Steel production in relatively small cities was still commonplace in the early 1930s. By the start of World War II, US steel production had increased vastly in response to rising international demand. This coincided with a substantial concentration of production in large integrated facilities in Pittsburgh and other major steel-producing locations, like Clairton and Donora.Footnote 4 Plant closures occurred in smaller and less efficient facilities, including most of the steel mills in small towns (DiFrancesco et al. 2010). Records from the mid-twentieth century confirm that most small-town steel plants that operated in the early 1930s were closed as a consequence of this industry concentration.
My focus is on a large sample of individuals who were born in towns in Pennsylvania from 1916 through 1927. These cohorts were aged 13 to 24 in 1940, by which time the steel industry had become largely centralized in the Pittsburgh region. The goal of the study is to see whether individuals born in steel towns, who were exposed to pollutants from coal combustion and metallic exhaust as children and possibly into young adulthood, had higher mortality in older age than individuals born in non–steel towns. My “comparison group” comprises individuals born in the same years in similarly sized neighboring non–steel towns, that is, in non–steel towns in the same county. Because the research design relies on within-county variation, the analysis focuses exclusively on towns rather than larger cities: relatively few counties in Pennsylvania had more than one large city.
The analysis relies on a proprietary data set, the Duke Medicare/SSA Dataset, which includes place of birth, zip code of residence in older age or at death (among those who are deceased), gender, and race. I merged these records with additional data: Pennsylvania historical industrial records are used to determine the town-level location of steel production facilities in the early twentieth century, and the older-age residence zip code is matched to the median income in that zip code to form a crude proxy of lifetime prosperity. I thereby assembled data for more than 780,000 individuals, of whom approximately 390,000 were born in relatively small locations (“steel towns” and “non–steel towns”). In the following text, I show that these two groups are highly comparable along observable dimensions.
My key finding is that individuals who were born in steel towns have significantly higher mortality rates than those born in non–steel towns. The higher mortality is particularly pronounced for individuals who were born in steel towns that had relatively high levels of steel production capacity in the early 1930s. Also, the detrimental long-run impacts of pollution appear to be larger for people born in steel towns with low elevation and high levels of steel production capacity, perhaps because pollution exposure was more severe in valleys than in high-elevation towns. The key relationships hold for both women and men.
The final piece of analysis centers on cause of death. For a substantial fraction of the deceased individuals in my sample, I am able to form unique matches to National Center for Health Statistics (NCHS) death certificate data. These data provide cause of death. It appears that the excess mortality among individuals born in steel towns is disproportionately due to elevated levels of cancer.
My work provides a distinctive contribution to the literature. To my knowledge, there is no other research evaluating the long-run impact of early childhood exposure to industrial air pollution.
The article proceeds in five additional sections: in “Literature” I provide a brief review of existing work on the health impacts of Total Suspended Particulates (TSPs) and fine Particulate Matter (PM), a particularly dangerous subset of “traditional” TSPs.Footnote 5 The primary point that emerges from this review is the absence of research that evaluates the consequences of early-life exposure for later-life mortality. In “The Steel Industry” I provide a historical overview of the steel industry and discuss the potentially harmful chemicals that were released during production. “Early-Life Exposure to Pollution and Older-Age Mortality” provides statistical evidence on the association between early-life exposure to steel pollutants and later-life mortality, and “An Exploratory Analysis of the Cause of Death” provides evidence about the cause of death among individuals born in steel towns.
Literature
The approaches to studying health impacts of pollution are varied—reflecting differences in focus and methodology across disciplines.
Many studies in the broad area of public health and environmental sciences establish associations between exposure to pollutants and morbidity or mortality. A prominent example is the Harvard Six Cities study (Dockery et al. 1993), which shows a strong and robust association between city-level air pollution and mortality (in particular, excess mortality due to cancer and cardiopulmonary disease).
Researchers in medicine and epidemiology often seek to establish pathophysiological links between pollution and health outcomes. There is experimental work involving animals, and there is research exploring physiological impacts through natural means, such as examining the health of surrounding animal populations or vegetation.Footnote 6 Other research focuses on epidemiological evidence regarding pollution and specific hypothesized physiological deficits among humans. Among the fascinating work in this field is research demonstrating an association between exposure to fine PM and deficits in lung function development among children (Gauderman et al. 2004, 2007). Another study, Reynolds et al. (2003), links air pollution exposure to the development of leukemia and brain tumors in children.
Economists make a distinctive contribution by focusing on potential threats to the validity of causal inferences drawn from statistical associations between pollution and health outcomes. Economists often implement novel identification strategies that aim to plausibly establish causality. The primary point in this literature is that in most observational studies pollution is not randomly assigned to individuals, and researchers thus need to find quasirandom variation for estimating the impact of pollution on health (as discussed, e.g., in Dominici et al. 2014). Among the studies that exploit plausibly quasirandom variation is the work of Chay et al. (2003), who show that the 1981–1982 recession in the United States generated a reduction in location-specific TSPs, which in turn resulted in a substantial decline in the local infant mortality rate. Currie and Walker (2011) exploit the introduction of electronic toll collection to explore the consequences of reduced emissions from traffic congestion. They find that reduced exposure to vehicle emissions lowered the incidence of prematurity and low birth weight. Anderson (2015) evaluates “quasi-random variation in pollution levels generated by wind patterns near major highways” (p. 1), showing that air pollution increases mortality among the elderly.
Approaches to Studying Health Effects of Particulate Air Pollution
As noted in the preceding text, a vast literature is concerned with the health consequences of TSP emissions. Pope and Dockery (2006) provide an extensive and enlightening review of the extant literature on the health consequences of pollution. In the Pope-Dockery taxonomy, most studies fall into three categories:
First, there are many studies that focus on short-term pollution exposure and mortality. These studies include analyses of severe pollution episodes such as the famous October 1948 “killing fog” in Donora, Pennsylvania (see Davis 2002, and references therein). While these events are dramatic, it is difficult to evaluate the extent to which excess mortality is the consequence of harvesting, whereby most deaths occur among individuals who would have died soon in any case. For my purposes (the study of early-life pollution exposure and later-life health), such events are unlikely to be useful unless they are truly dramatic.Footnote 7
Second, there are many prominent studies that focus on long-term particulate exposure. These studies include the famous Harvard Six Cities study (Dockery et al. 1993) and the American Cancer Society prospective cohort studies (Pope et al. 1995). There have been many follow-up papers and reanalyses of the data collected for these studies. This research stream is particularly notable because the follow-up time can be impressive, more than 20 years in some cases (e.g., Thurston et al. 2016). Still, the time frame of these projects is not nearly long enough to capture long-reach effects of childhood pollution exposure on older-age mortality.
Third, there are studies for which the “time scale” of exposure varies. In some instances, such variation might reasonably be thought of as quasirandom. An example is a 13-month shutdown of a steel mill in Utah Valley, which occurred because of a labor dispute (Pope et al. 1992). Another example is the Clancy et al. (2002) study of the impact of the 1990 ban on the sale of coal in Dublin, which shows that the ban coincided with a decline in deaths due to respiratory and cardiovascular causes. Similarly, Rich et al. (2012) exploit the fact that official policy limited pollution during the 2008 Beijing Olympics to study the relationship between air pollution levels and measures of human cardiovascular physiology. Again, though, none of these studies assess long-reach effects (e.g., impacts of childhood pollution exposure on health later in life).
The Steel Industry
My research examines long-reach consequences of being born in a steel town in early-twentieth-century Pennsylvania. To set the stage for the empirical analysis, it helps to briefly consider the history of iron and steel production in Pennsylvania, and to note the technologies used in the production of steel and the properties of the resulting pollution.
Steel Production in Early-Twentieth-Century Pennsylvania
Making steel, at the most basic level, involves inputs of energy and iron ore. By the early twentieth century, industrial processes for the mass production of steel were well established. The energy source was coke, a fuel with high carbon content made from coal in “coke ovens.” Iron was extracted from iron ore (in a process known as smelting), transformed into pig iron, and then refined into steel. Some facilities existed separately for the purpose of melting and reshaping, such as rolling mills, which produced plates or quality flat and long products (de Beer et al. 1998). By the beginning of the twentieth century, there were two competing processes being used for steel production. One was the Bessemer process, developed independently by the British metallurgist Sir Henry Bessemer and by William Kelly, a Pittsburgh-based ironmaster.Footnote 8 In this process, cold air is blown through a refractory-lined vessel (the “Bessemer converter”) that contains molten iron, converting impurities into oxides that can then be separated as slag. No additional fuel was required other than approximately one ton of coal per ton of steel produced. The Bessemer process was used widely in early-twentieth-century plants, including those in Pennsylvania. The second method was the open-hearth process, developed by William Siemens and first used to produce steel in France in 1864 (ibid.).
This process, which provided for better quality control, eventually replaced the Bessemer process and was in widespread use in Pennsylvania in the early twentieth century.Footnote 9
Warren (1973) notes that at the end of the 1800s, the steel industry in Pennsylvania consisted of a large number of relatively small firms that bought raw materials on the open market. Pig iron was purchased by steelmaking facilities that were capable only of refining this intermediate product. Iron ore and coke (much of the coke came from Connellsville) were used by pig iron furnaces located in many Pennsylvania towns (Warren 2001). In 1903, for efficiency reasons, steel companies began to build by-product coke ovens at integrated facilities that produced both pig iron and steel; in Pennsylvania, this began in the Pittsburgh area. Warren (1973) suggests that this change in coking was particularly beneficial to producers near Pittsburgh, which could transport coking coal down the Monongahela River to integrated production facilities. However, linkages between sites in the Commonwealth were established slowly, and substantial headway toward integration did not occur until 1918, when US Steel put up the world’s largest by-product coke oven at the Clairton Works, an integrated steelmaking operation 20 miles south of Pittsburgh.
Integrated plants, primarily along rivers in the Pittsburgh area, had a clear productivity advantage in the steel market. Many production facilities, including those in small towns, survived into the early 1930s, but the market tended to weed out poorly located or inefficient mills and furnaces (Warren 1973). By the 1950s, there was only a minimal level of steel production at noncentral, nonintegrated mills located in the small cities and towns of Pennsylvania (American Iron and Steel Institute 1954).
Exposure to Pollution from Steel Production
The dangers from exposure to emissions from steel and iron production were not well understood in the early twentieth century, and the steel industry was subject to virtually no environmental regulation. According to a 1963 report by the Division of Air Pollution of the Public Health Service (Schueneman et al. 1963), there was little regulation of particulate matter emissions from steel production until the Los Angeles County Air Pollution Control District set standards in 1948 and 1949 and the Allegheny County (PA) Health Department enacted limits on particulate matter discharges from steel production in 1960.
The 1963 report provides extensive documentation of the enormous levels of air pollutants being produced at various stages of steel production using technologies common in the first half of the twentieth century. Coke production was noted to produce “smoke, dust, hydrogen sulfide, and phenols. Other contaminants generated by destructive distillation of the coal include pyridine, cresol, carbon monoxide, ammonia, methane, ethane, and ethylene, in addition to a host of other organic compounds found in coal tar” (35). Both Bessemer converters and open-hearth furnaces were noted to produce large amounts of particulate matter, resulting in high local concentrations. For instance, the report notes that from a typical open-hearth furnace with no pollution control (which was common even in 1960) “a general average of 4500 pounds of dust and fume per day is emitted during production of 500 tons of steel” (45).Footnote 10 Even so, while the report documents high levels of air pollution due to steel production, the authors conclude, “We do not know the significance of increases in atmospheric pollution levels indicated by some of the air sampling studies, because we have yet to establish permissible community levels for most pollutants. Acute episodes like that in Donora are always possible given the proper set of circumstances, but this does not mean that the steel industry is any more responsible than many others, nor does it mean that the long-term health of the populace is necessarily suffering” (86).Footnote 11
Given that as late as 1963 experts did not view air pollution from steel production as a serious health risk, it is unlikely that parents in the 1920s and 1930s would have known that they were putting themselves or their children at risk by living in a steel town. Put another way, there is no reason to believe that parents in steel towns were disproportionately unconcerned about, or unaware of, the health impacts of pollution.
More recent literature paints a bleaker picture about the health risks due to air pollution from steel production. One comprehensive report (Environmental Protection Agency 1995) documents that steel production entails the release of large quantities of carbon monoxide (CO), nitrogen dioxide (NO2), particulate matter of 10 microns or less (PM10), sulfur dioxide (SO2), and volatile organic compounds (VOC), and may also include polycyclic aromatic hydrocarbons (PAHs), chromium, lead, and manganese.
A review article by Curtis et al. (2006) cites many studies that examine the impacts of air pollution of the sort produced by steel production: (1) many studies link exposure to high levels of CO, NO2, PM10, and SO2 to respiratory health issues, including worsened asthma and rhinitis in children and higher rates of asthma and bronchitis among the elderly; (2) several studies have found links between exposure to CO, NO2, PM10, and VOC and adverse heart disease events, for example, myocardial infarction and congestive heart failure; (3) PM10 and NO2 have been associated with lung cancer, and PAHs have been linked to various forms of cancer;Footnote 12 and (4) air pollutants have been associated with preterm births, low birth weights, premature deaths, and higher rates of atrial septal defects.
A recent report on the toxicological properties of coal emissions shows that coal-burning power plants produce hazardous air pollutants that cause irritation and tissue damage to the eyes, skin, and breathing passages at high levels of exposure (Billings 2011). Billings suggests that exposure to these pollutants can cause latent diseases that develop over many years and may be a contributing factor to such fatal conditions as heart disease and brain impairment. A 1989 report on steel mills prepared by the Radian Corporation for the EPA notes that substances of concern to public health include not only coke oven emissions from coal burning but also heavy metal emissions (e.g., copper, cadmium, and chromium). Chromium, for example, has been shown to damage nasal passages and, in long-run studies, has been linked to lung cancer (Radian Corporation 1989).
An important study that provides direct evidence about the detrimental effects of air pollution from steel production is the Utah Valley “natural experiment” (Pope 1989). The study shows that during the closure of the steel mill, children’s hospital admissions were substantially lower than when the mill was operating. This was particularly true for admissions due to bronchitis and asthma. Similar findings held for adults, though the relationship was not as strong. As noted in the preceding text, there appear to be no comparable studies that investigate long-reach health impacts.
Early-Life Exposure to Pollution and Older-Age Mortality
My contribution is to examine the long-reach hypothesis. Specifically, the question in my empirical investigation is: “Do individuals who likely had higher levels of exposure to steel production emissions early in life have relatively higher levels of old-age mortality?” My data allow for investigation of this question for mortality after age 65.
Data
The key to providing evidence about this research question is a unique data source, the Duke SSA/Medicare data, which match complete Medicare Part B records with Social Security records using the Numerical Identification Files (NUMIDENT) of the Social Security Administration. Black et al. (2015) indicate that, for cohorts born in 1916 and after, these data cover approximately 85 percent of the population.Footnote 13 Location of birth is supplied at the county or town level; Black et al. find that approximately 80 percent of records are at the town level. I use only data that include location of birth at the town level. The data also include location by zip code at age 65 or at death (for those who are deceased), as well as gender and race. Records extend through 2002, so it is possible to analyze mortality between ages 65 and 75 for cohorts born 1916 through 1927. The “survival” variable indicates living to age 75, conditional on having lived to age 65.
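A minimal sketch of how the survival indicator could be coded from individual records. It assumes hypothetical fields holding an exact birth date and, for decedents, an exact death date (the actual data may be coarser), and a follow-up window that closes at the end of 2002; the function names are illustrative, not part of the data set.

```python
from datetime import date
from typing import Optional

# Assumption: follow-up ends on the last day of 2002 and records carry exact
# birth and death dates. Field and function names are hypothetical.
DATA_END = date(2002, 12, 31)


def age_on(birth: date, day: date) -> int:
    """Completed years of age on a given day."""
    had_birthday = (day.month, day.day) >= (birth.month, birth.day)
    return day.year - birth.year - (0 if had_birthday else 1)


def survival_65_to_75(birth: date, death: Optional[date]) -> Optional[int]:
    """Survival indicator S: 1 if alive at 75, 0 if died at ages 65-74.

    Returns None when the person is outside the conditioning set (died
    before 65) or is censored (has not reached 75 by DATA_END).
    """
    if death is not None:
        age_at_death = age_on(birth, death)
        if age_at_death < 65:
            return None  # not alive at 65, so not in the at-risk set
        if age_at_death < 75:
            return 0     # died between 65 and 75
    if age_on(birth, DATA_END) < 75:
        return None      # censored: has not yet reached 75 in the data window
    return 1


print(survival_65_to_75(date(1920, 5, 1), date(1990, 1, 1)))  # 0
```

Because everyone born 1916 through 1927 reaches age 75 by the end of 2002, the censoring branch never applies within the study cohorts; it only guards against misuse on later cohorts.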
I merge the Duke SSA/Medicare data with historical records that provide locations of pollution sources. Data on steel production in Pennsylvania come from two primary sources. The first source is a publication of the Secretary of Internal Affairs of the Commonwealth of Pennsylvania (1903).Footnote 14 The second is an independent historical source published by the American Iron and Steel Institute (1930). This latter source is especially valuable for my purposes. It gives the location of all steel production plants in Pennsylvania as of 1930, a year in which individuals in my study sample were aged 3–14. By comparing the locations of small-city steel production in 1903 and 1930, I find that nearly all steel production that occurred in 1930 was in towns that also produced steel in 1903. Thus, individuals who were born in steel towns from 1916–27 (and who remained in those towns through childhood) would have been exposed to pollution from birth through at least 1930. However, as mentioned previously, many smaller steel production facilities shut down as the steel industry consolidated during the 1930s.Footnote 15
In constructing a convincing analysis, it is worth noting a potential complication. Growing up in a steel town could affect old-age health through the long-reach consequences of pollution exposure, but it could also affect health negatively for other reasons. For example, steel towns might differ from non–steel towns in early-life disease exposure because of differences in geography (e.g., location on a river) or population density. As a second example, the closure of steel production facilities surely led to job loss, and, as Sullivan and von Wachter (2009) show, job displacement generally leads to increased mortality among men. In turn, this may have adversely affected the long-term health outcomes of steel workers’ children.
Given these issues, one approach for trying to identify the impact of steel production pollution exposure on long-term health is to look at “dosage effects.” First, I consider the level of steel production in the town. An important advantage to using 1930 historical records is that they include not only the location of steel production facilities but also measures of the steel production capacity. This measure is the sum of steel production capacity, in 1,000s of tons, for each plant in operation in the town. Thus, I can see if individuals who are born in steel towns with relatively high levels of production, and thus relatively high levels of pollution, also have disproportionately lower survival in old age.
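The town-level capacity measure is a simple sum over plants, and the steel-town indicator follows from it. A sketch with made-up plant records (the town names and capacities are hypothetical, not values from the 1930 American Iron and Steel Institute directory):

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def town_capacity(plants: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum 1930 plant capacities (1,000s of tons per year) by town."""
    totals: Dict[str, float] = defaultdict(float)
    for town, capacity in plants:
        totals[town] += capacity
    return dict(totals)


def steel_town(town: str, capacity_by_town: Dict[str, float]) -> int:
    """Indicator I: 1 if the town had any steel capacity in 1930, else 0."""
    return 1 if capacity_by_town.get(town, 0.0) > 0 else 0


# Hypothetical plant list: (town, plant capacity in 1,000s of tons per year).
plants = [("Town A", 150.0), ("Town A", 75.0), ("Town B", 40.0)]
caps = town_capacity(plants)
print(caps)                        # {'Town A': 225.0, 'Town B': 40.0}
print(steel_town("Town C", caps))  # 0
```

Towns absent from the plant list default to zero capacity, which is how non–steel towns enter the comparison.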
Second, I consider the elevation of towns. Many Pennsylvania steel towns were located in valleys. At a minimum, it is important to control for elevation to ensure that any negative long-term health consequences of being born in a low-elevation town are not falsely attributed to pollution from steel production. More importantly, pollution from steel production was likely to be particularly harmful in low-elevation valley towns, in which air inversions may have trapped pollution for extended periods, potentially exposing individuals to more concentrated levels of pollution.
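One way to encode the two dosage dimensions (capacity and elevation) as regressors for a single birth town is sketched below. The low-elevation cutoff, the use of feet, and the interaction form are illustrative assumptions, not the study's specification.

```python
from typing import Dict


def dose_terms(capacity_kt: float, elevation_ft: float,
               low_cutoff_ft: float) -> Dict[str, float]:
    """Hypothetical dosage regressors for one birth town.

    capacity_kt: 1930 steel capacity in 1,000s of tons per year (0 if none).
    elevation_ft: town elevation; feet are an assumed unit.
    low_cutoff_ft: illustrative threshold for a "low-elevation" town.
    """
    steel = 1.0 if capacity_kt > 0 else 0.0
    low = 1.0 if elevation_ft < low_cutoff_ft else 0.0
    return {
        "steel": steel,                            # any steel production in 1930
        "capacity": capacity_kt,                   # intensity of production
        "elevation": elevation_ft,                 # geography control
        "low_elev": low,                           # valley-town indicator
        "capacity_x_low_elev": capacity_kt * low,  # extra dose in valley towns
    }


print(dose_terms(225.0, 800.0, 1000.0)["capacity_x_low_elev"])  # 225.0
```

In a linear probability model, the coefficient on `capacity_x_low_elev` would measure how much steeper the capacity gradient in survival is for valley towns than for higher towns.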
Finally, I also match the older-age residence zip code to the median income in the zip code to form a crude proxy of prosperity in later life. This variable serves as a control variable in many of my regression analyses that follow.
Table 1 provides a set of summary statistics for my sample: white individuals, with complete data records, for the birth cohorts 1916–27 born in Pennsylvania. Among those who are excluded from the sample are individuals for whom it was not possible to match birthplace to a “populated place” as given by the US Geological Survey. Typically, this happened if a county and city had the same name. The most important example is Philadelphia. Thus my starting sample excludes those individuals. In total the sample is quite large—more than 780,000.
Note: Author’s calculations using the Duke SSA/Medicare Data Set. “Cities” are birthplaces with more than 2,534 individuals in the data (which is the median) and “Towns” are birthplaces with 2,534 or fewer individuals in the data (the median and below). The income proxy variable is the median annual income (in 2000 dollars) in the zip code where the individual resides post age 65. Statistics are presented for a sample that includes whites only, for those for whom data are complete. The sample excludes Philadelphia (see text). Standard deviations are in parentheses.
As noted in the introduction, I split the sample according to the size of birthplaces; my analysis focuses on people born in towns. Table 1 thus summarizes key variables according to the size of individuals’ birthplaces. First are “cities,” corresponding to city sizes above the median (a birthplace reported to have more than 2,534 individuals in the SSA/Medicare data set). “Towns” are birthplaces with 2,534 or fewer individuals (i.e., the median and below).
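The city/town split can be sketched as follows. The note to table 1 does not say whether the median is taken over birthplaces or over individuals; this sketch assumes the latter (each person contributes the size of their own birthplace), which is consistent with roughly 390,000 of the 780,000 individuals falling in towns. Using `statistics.median_low` keeps the cutoff equal to an observed birthplace size, so “town” means the median and below.

```python
from collections import Counter
from statistics import median_low
from typing import Dict, List, Tuple


def split_by_birthplace_size(birthplaces: List[str]) -> Tuple[Dict[str, str], int]:
    """Label each birthplace 'city' or 'town'.

    birthplaces holds one entry per individual. Assumption: the median is
    taken over individuals, so roughly half of individuals land on each side.
    """
    size = Counter(birthplaces)  # individuals per birthplace
    cutoff = median_low([size[b] for b in birthplaces])
    labels = {b: ("town" if n <= cutoff else "city") for b, n in size.items()}
    return labels, cutoff


# Toy input: birthplace A has 5 individuals, B has 3, C and D have 1 each.
labels, cutoff = split_by_birthplace_size(["A"] * 5 + ["B"] * 3 + ["C", "D"])
print(cutoff)  # 3
```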
The first row of table 1 shows that in cities, well over half of people were born in places with a steel mill. Indeed, steel production was especially concentrated in the largest cities; in 1930 there were steel mills in all but 2 of the 13 largest cities represented in this group (Wilkes-Barre and Hazleton). The second row shows that survival to age 75, conditional on being alive at 65, is approximately 80 percent and does not vary much with the size of one’s birthplace. Finally, the lifetime “Income Proxy” (median zip-code-level income at the residence in old age) is somewhat higher for those born in cities than in towns.
In some of my empirical work I conduct within-county analyses of the association between mortality and birth in a steel town. The goal is to compare mortality for individuals who were born in steel towns with that of individuals born in similarly sized neighboring non–steel towns. As discussed in more detail in the text that follows, very few counties have more than one large city, so within-county analysis is viable only for towns. Thus, in table 2, I focus on sample characteristics for the sample of more than 390,000 individuals born in these locations.
Note: Author’s calculations using the Duke SSA/Medicare Data Set matched to historical records on the location of steel production. “Towns” are as defined in table 1. Steel production capacity is measured in 1,000s of tons per year (as of 1930).
The first row of table 2 shows that the probability of surviving to age 75, conditional on being alive at age 65, is slightly lower for those born in steel towns than in non–steel towns. As for other characteristics, a slightly higher fraction of those born in steel towns than in non–steel towns reside in Pennsylvania in old age. Interestingly, people who migrate out of Pennsylvania tend to be more prosperous (according to the Income Proxy) than those who do not.Footnote 16 Both the means and standard deviations of the Income Proxy are very similar for those born in steel towns and non–steel towns. Finally, steel towns tend to have lower elevation than non–steel towns.
Research Design
The fundamental goal is to see if early-life exposure to high-polluting steel production is associated with older-age mortality for individuals born in steel towns from 1916 to 1927. This entails conducting a multivariate analysis in which I compare individuals born in steel towns to individuals born in non–steel towns.
Inference in this setting faces several issues. Consider the following regression model of survival:
$$S_i = \alpha + \beta I_i + \sum_{t} \gamma_t Z_{it} + X_i'\delta + \varepsilon_i,$$
where $S_i$ is an outcome “survival” variable equal to 1 if individual $i$ survives to age 75 and 0 if she or he dies (conditional on survival to age 65); $I_i$ is 1 if the individual was born in a town with steel production in 1930 and 0 otherwise, and is meant to capture effects of early-life exposure to pollutants from steel production, so that $\beta$ is the coefficient of interest; $Z_{it}$ is an indicator variable equal to 1 if the individual belongs to birth cohort × gender cell $t$ (e.g., men born in 1927); $X_i$ is a vector of all other relevant factors that affect old-age survival; and $\varepsilon_i$ is an error term.
Unfortunately, while the data include reasonably good measures for $S_i$, $I_i$, and $Z_{it}$, there are virtually no data for $X_i$. In the following regressions, these omitted variables are subsumed in the error term, which of course is a problem if they are correlated with $I_i$.
To deal with this issue, in much of the following analysis I conduct within-county comparisons using similarly sized locations. The idea is that many of the early-life factors that might affect later-life health (i.e., variables in the vector $X_i$) are likely to be comparable in similarly sized locations within the same county. To implement this idea, I include county fixed effects in many regressions reported in the following text. There are 67 counties in Pennsylvania, of which 21 contain at least one steel town.
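To see what the county fixed effects do, here is a stripped-down sketch of the within-county estimator. It keeps only the steel-town indicator and county effects (no cohort × gender cells, no income proxy), demeans I and S within each county, and takes the OLS slope of demeaned S on demeaned I; by the Frisch–Waugh–Lovell theorem this equals the coefficient on I from a regression with a dummy for every county. The data layout and all names, including the toy counties, are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, int, int]  # (county, steel-town indicator I, survival S)


def within_county_slope(rows: Iterable[Row]) -> float:
    """OLS coefficient on I after absorbing county fixed effects."""
    rows = list(rows)
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # county -> [sum I, sum S, n]
    for county, i, s in rows:
        t = totals[county]
        t[0] += i
        t[1] += s
        t[2] += 1
    num = den = 0.0
    for county, i, s in rows:
        sum_i, sum_s, n = totals[county]
        i_dev = i - sum_i / n  # zero in counties with no variation in I
        s_dev = s - sum_s / n
        num += i_dev * s_dev
        den += i_dev * i_dev
    if den == 0.0:
        raise ValueError("no within-county variation in the steel indicator")
    return num / den


def block(county: str, steel: int, survivors: int, n: int = 10) -> List[Row]:
    """n toy individuals from one county and town type, `survivors` of whom reach 75."""
    return [(county, steel, 1 if k < survivors else 0) for k in range(n)]


# Toy data: steel-town survival is 10 points lower within both c1 and c2;
# c3 has no steel towns and a high baseline survival rate.
rows = (block("c1", 1, 7) + block("c1", 0, 8)
        + block("c2", 1, 5) + block("c2", 0, 6)
        + block("c3", 0, 9))
print(round(within_county_slope(rows), 3))  # -0.1
```

Here the estimator recovers the -0.1 within-county gap, whereas a pooled comparison of means (0.60 versus about 0.77) would give about -0.17 because county c3 pulls up the non–steel average. In this simplified version, counties without variation in I drop out entirely; in the full specification they would still help pin down the cohort × gender effects.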
This strategy is not especially credible for the “cities” in my data. Among the 67 counties in Pennsylvania, very few contain multiple “cities” (e.g., only seven counties had more than three cities), and in many of the counties that do, all the cities had steel production facilities in 1930. For this reason, I restrict all of the following analysis to “towns,” that is, places with a population at or below the median in my data.
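The town restriction amounts to a simple median split. A minimal sketch, assuming the median is taken across places (the text does not say whether places or individuals are the unit), with hypothetical place names and populations:

```python
from statistics import median


def restrict_to_towns(place_populations):
    """Keep places whose population is at or below the median across places.

    place_populations: dict mapping a (hypothetical) place name to its population.
    Taking the median over places rather than over individuals is an
    illustrative assumption.
    """
    cutoff = median(place_populations.values())
    return {place: pop for place, pop in place_populations.items() if pop <= cutoff}


print(sorted(restrict_to_towns({"a": 1200, "b": 4500, "c": 800, "d": 30000})))
# ['a', 'c']
```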
Beyond the county fixed effects, I include one additional control, the median income in the individual’s zip code at age 65 (the Income Proxy), described in the preceding text. This variable is intended to address the following concern: suppose that steel plants brought prosperity to a community, so that steel towns tend to have higher incomes. A large literature establishes a positive relationship between higher income levels and survival. As shown below, including the income proxy does not alter the key results; this provides some evidence that a correlation between town-level steel production and prosperity is not driving the results.
With this research design in mind, again consider table 2. Among the more than 390,000 individuals born in small towns, more than 21,000 were born in steel towns. These individuals likely had on average much higher exposure to air pollutants when they were young than did those born in towns without steel production.
A key concern for the research design is that the data do not record the age at which individuals migrated out of their hometowns. Ideally, for the purposes of the study, migration rates would be very low at young ages, so that most of those born in steel towns were exposed to pollution at least throughout childhood and youth. Although age at migration is not known (so the duration of pollution exposure cannot be measured directly), it is possible to provide some indirect evidence on the issue. Specifically, I estimated state of residence by age for individuals born in Pennsylvania, 1916–27, using 1920–2000 public-use census samples. Migration in early childhood turns out to have been quite uncommon: fewer than 8 percent of Pennsylvanians had moved to another state by age 15, and only approximately 16 percent had moved by age 22. To the extent that these individuals did not receive full early-life exposure to pollution, their inclusion in the sample leads to an underestimate of the old-age mortality effects of pollution. For those who remained in their hometowns, pollution exposure would in all cases have lasted through at least 1930, as noted previously, and in a relatively small number of cases might have extended beyond 1954. In the regression analyses that follow, samples include both those who remained in Pennsylvania and those who had moved out of Pennsylvania by old age.
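As a stylized illustration of the direction of this bias (not a calculation from the article), suppose a share $m$ of those born in steel towns left before receiving any exposure and are otherwise like those born in non–steel towns, and let $\tau$ denote the survival effect of full early-life exposure. The steel-town contrast then mixes exposed and unexposed people:
$$E[S_i \mid I_i = 1] - E[S_i \mid I_i = 0] = (1-m)\,\tau + m \cdot 0 = (1-m)\,\tau$$
Because $0 \le m < 1$, the contrast is smaller in magnitude than $\tau$; individuals who left after some exposure pull it toward zero by less.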
An additional source of bias may be selective mortality. As noted in Vaupel et al. (1979), Vaupel and Yashin (1985), and Vaupel (1997), heterogeneity in latent human frailty can leave relatively more robust individuals in the older population. Ideally, the data would permit evaluation of mortality at ages other than 65 to 75, especially at younger ages. To the extent that there is selective mortality in my sample, the population born in steel towns would be positively selected after age 65, in which case my estimates understate the impact of early-life exposure to pollution on older-age mortality.
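A toy calculation can make the selection argument concrete. All numbers below are hypothetical: half of each birth population is “frail,” exposure lowers survival to 65 more for the frail than for the robust, and exposure raises mortality between 65 and 75 by 2 points for both types. Because exposure removes frail people before 65, exposed survivors are more robust on average, and the observed 65–75 gap falls short of the true effect:

```python
def mortality_65_75(frail_share_at_birth, survival_to_65, mortality_by_type):
    """Share dying between 65 and 75 among those alive at 65, in a two-type
    (frail/robust) population."""
    alive_at_65 = {
        "frail": frail_share_at_birth * survival_to_65["frail"],
        "robust": (1 - frail_share_at_birth) * survival_to_65["robust"],
    }
    total = sum(alive_at_65.values())
    return sum(alive_at_65[t] / total * mortality_by_type[t] for t in alive_at_65)


# Hypothetical parameters, chosen only for illustration.
TRUE_EFFECT = 0.02  # exposure raises 65-75 mortality by 2 points for every type
base_mortality = {"frail": 0.35, "robust": 0.15}
unexposed = mortality_65_75(0.5, {"frail": 0.60, "robust": 0.90}, base_mortality)
exposed = mortality_65_75(
    0.5,
    {"frail": 0.45, "robust": 0.85},  # exposure also culls the frail before 65
    {t: m + TRUE_EFFECT for t, m in base_mortality.items()},
)
observed_effect = exposed - unexposed
print(round(unexposed, 4), round(exposed, 4), round(observed_effect, 4))
```

With these made-up parameters the observed gap is roughly 0.009, less than half the true 0.02, which is the direction of bias described above.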
Results
The article’s first results are given in table 3. Column (1) shows that in a regression with a steel-town indicator variable and gender-by-cohort effects, the coefficient on “Born in Steel Town” is negative: people born in steel towns have lower old-age survival rates than those born in non–steel towns, and this association is statistically significant. The gender-by-cohort effects (not reported in the table) are as follows. The omitted category is men in the 1916 birth cohort. For men in other cohorts, estimated effects are generally quite small, with a slight upward drift for later cohorts. For women the estimated cohort effects are large, typically on the order of 0.09.
Note: Author’s calculations using the Duke SSA/Medicare Data Set matched to historical records on the location of steel production. Dependent Variable is “Survival to Age 75” conditional on survival at age 65. The sample includes only those who are born in Pennsylvania towns (see table 1 for definition). There are 67 counties in Pennsylvania, of which 21 contain at least one steel town. “FE” indicates fixed effect. Standard errors, clustered at the town level for (1) and (2), are in parentheses. *p < 0.10; **p < 0.05; ***p < 0.01.
Column (2) shows that including the income proxy does not much alter the key inference from column (1). The coefficient on the income proxy is itself positive and statistically significant: people who live in higher-income communities also live longer. Columns (3) and (4) show that the key inference is very similar in specifications that include county fixed effects. This is a primary result of my analysis: within counties, people who were born in small steel-producing towns have lower survival rates than people born in towns without steel production.
To put this result into perspective, recall that the overall mortality rate for ages 65 to 75 is approximately 20 percent (see table 1). Individuals born in steel towns have mortality that is approximately 0.75 percentage points higher than comparable individuals born in non–steel towns. Thus mortality is approximately 4 percent higher for those born in steel towns.
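The relative figure follows from dividing the absolute difference by the baseline mortality rate:
$$\frac{0.75\ \text{percentage points}}{20\ \text{percent}} = \frac{0.0075}{0.20} = 0.0375 \approx 4\ \text{percent}$$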
Columns (5) and (6) provide analyses separately for women and men. For both, the coefficient on “Born in Steel Town” is negative, though it is statistically significant only for women.
As mentioned in the preceding text, it is possible that even within counties, steel towns and non–steel towns differ along unobserved dimensions that affect old-age mortality beyond any effects of air pollution. For example, suppose all steel towns were along major rivers while many non–steel towns were not, and suppose further that river towns were more susceptible to water-borne diseases. Then it might be incorrect to attribute lower survival rates among those born in steel towns to pollution generated by the steel mills.
To address this problem, I divided the steel towns into those whose 1930 steel production capacity was below the median and those whose capacity was above the median (using data from American Iron and Steel Institute 1930). The results, presented in table 4, suggest that low survival rates among individuals born in steel towns are not due to being born in a steel town per se but instead reflect relatively high levels of steel production. This pattern is consistent with the idea that the reduced old-age survival associated with steel production is driven primarily by high levels of exposure. Mortality from ages 65 to 75 is approximately one percentage point higher for those born in steel towns with above-median steel production capacity than for those born in non–steel towns within the same county. The finding that high-pollution places specifically had more excess deaths strengthens the case for pollutants as a key cause of later-life mortality.
Note: Author’s calculations using the Duke SSA/Medicare Data Set matched to historical records on the location of steel production. Dependent Variable is “Survival to Age 75” conditional on survival at age 65. The sample includes only those who are born in Pennsylvania towns (see table 1 for definition). Standard errors, clustered at the town level for (1), are in parentheses. *p < 0.10; **p < 0.05; ***p < 0.01.
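A sketch of the capacity split used for table 4, assuming the median is computed over steel towns only and that a town exactly at the median counts as below-median (the text does not specify either choice). Town names and capacities are hypothetical, in whatever units the 1930 source reports:

```python
from statistics import median


def classify_by_capacity(capacity):
    """Label each town 'none', 'below', or 'above' median steel capacity.

    capacity: dict mapping town -> 1930 steel production capacity, with 0 or
    None for towns without steel production. The median is taken over steel
    towns only, and a town exactly at the median is labeled 'below'; both
    are illustrative assumptions.
    """
    steel = {town: cap for town, cap in capacity.items() if cap}
    if not steel:
        return {town: "none" for town in capacity}
    cutoff = median(steel.values())
    labels = {}
    for town, cap in capacity.items():
        if not cap:
            labels[town] = "none"
        elif cap <= cutoff:
            labels[town] = "below"
        else:
            labels[town] = "above"
    return labels


labels = classify_by_capacity({"a": 0, "b": 100, "c": 200, "d": 300, "e": None})
print(labels)
# {'a': 'none', 'b': 'below', 'c': 'below', 'd': 'above', 'e': 'none'}
```

The two resulting indicators (below- and above-median) would then replace the single steel-town indicator in the county fixed effects regression, with non–steel towns as the reference group.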
Columns (2) and (3) of table 4 show similar patterns for women and men. This is informative because some men born in steel towns presumably worked in the steel industry as adults, in which case being born in a steel town might affect older-age health through lasting effects of that work. The same is much less likely for women of this era. This evidence is consistent with an interpretation that attributes a role to early-life exposure to pollution in generating excess mortality among people born in steel towns.
The final piece of evidence concerns a potential role for elevation in shaping my findings. Among individuals born in steel towns, it seems likely that exposure to PM would have been worse for those born in low-elevation steel towns, that is, towns that were in valleys, which would have typically accumulated higher concentrations of PM due to atmospheric inversions. A simple way of evaluating that idea is to see if there is a correlation between survival probabilities and the elevation of the town (which was determined using Geographical Analysis Tools 2012).
In table 5, I report the results of an analysis that incorporates elevation. First I take the basic regressions from table 4 but add “Elevation” (measured in 1,000s of meters). Results are reported in column (1). Elevation does not have a statistically significant impact on survival in the regression. The specification reported in column (2) also includes two interaction terms: first, an interaction between “Elevation” and “Born in Town with Below-Median Steel Production Capacity” and, second, an interaction between “Elevation” and “Born in a Town with Above-Median Steel Production Capacity.” Finally, column (3) reports this same specification but with county fixed effects. In this last regression, the main effects of being born in either type of steel town are negative, and the coefficient on “Born in Town with Above-Median Capacity” is sizable (−0.025) and is highly statistically significant. Importantly, the main effect of “Elevation” is close to 0. This suggests that any relationship between birth in a steel town and old-age survival is not due to an omission of elevation in my regressions. Coefficients on the interaction terms are positive, but neither is statistically significant in the specification with county fixed effects.
Note: Author’s calculations using the Duke SSA/Medicare Data Set matched to historical records on the location of steel production. Dependent Variable is “Survival to Age 75” conditional on survival at age 65. Elevation is in 1,000s of meters. The sample includes only those who are born in Pennsylvania towns (see table 1 for definition). Standard errors, clustered at the town level for (1), are in parentheses. *p < 0.10; **p < 0.05; ***p < 0.01.
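In notation introduced here for illustration (it does not appear in the article), the column (3) specification of table 5 can be sketched as follows, where $L_i$ and $H_i$ indicate birth in a town with below- and above-median steel production capacity, $E_i$ is elevation in 1,000s of meters, $\mu_{c(i)}$ is a county fixed effect, and any remaining controls are abbreviated as $W_i'\delta$:
$$S_i = \beta_L L_i + \beta_H H_i + \lambda E_i + \theta_L (E_i \times L_i) + \theta_H (E_i \times H_i) + \sum_t \gamma_t Z_{it} + W_i'\delta + \mu_{c(i)} + \varepsilon_i$$
Under this sketch, the implied survival gap between an above-median steel town and a non–steel town in the same county at elevation $E$ is $\beta_H + \theta_H E$. The reported main effect ($-0.025$) is therefore the gap implied at $E = 0$, and a positive $\theta_H$ means the gap narrows at higher elevations.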
Figure 1 illustrates the estimated relationship between elevation and old-age survival for those born in small steel towns with above-median production capacity, using the coefficients from the specification with county fixed effects (column (3) of table 5).
An Exploratory Analysis of the Cause of Death
As discussed in the preceding text, a well-established literature links air pollution exposure to contemporaneous morbidity and mortality among adults. For instance, the important work of Pope et al. (2002) and Pope and Dockery (2006) points to all-cause mortality risks due to air pollution, as well as specific mortality risks due to cardiovascular disease, respiratory disease, cardiopulmonary disease, and lung cancer. In addition, as Yang et al. (2002) and Kang et al. (1995) document, steel and iron production generates PAHs, a carcinogenic class of compounds (Okona-Mensah et al. 2005). Recent work suggests links between air pollution exposure and the development of cervical and brain cancer (Raaschou-Nielsen et al. 2011) and of breast, prostate, bladder, cervical, and ovarian cancer (Al-Ahmadi and Al-Zahrani 2013). In general, there appear to be multiple pathways through which air pollution affects a variety of organs and systems, including the respiratory, cardiovascular, and nervous systems, and various forms of air pollution are linked to carcinogenicity in both human and animal populations (Kampa and Castanas 2008).
There is also evidence of detrimental effects of pollution exposure on infants and children. Negative effects of air pollution include prematurity and low birth weight (Currie and Walker 2011), increased risk of developing asthma, and decrements in lung function development (Gauderman et al. 2004, 2007). Hazardous air pollutants have also been associated with increased leukemia in children (Reynolds et al. 2003).
Much less is known about the pathophysiological mechanisms through which childhood exposure to pollution might affect disease processes that increase old-age mortality decades later. There are many possible links. One study of canines, for example, provides evidence that high levels of air pollution exposure produce neuropathological damage in the brain, and the authors argue, “Neurodegenerative disorders such as Alzheimer’s may begin early in life with air pollutants playing a crucial role” (Calderon-Garciduenas et al. 2002). As a second example, Friedman et al. (2010) show that adults who survived childhood cancers such as leukemia are substantially more likely to develop subsequent neoplasms, and that the risk of subsequent neoplasms increases as individuals age. This is potentially relevant to understanding the long-reach consequences of childhood exposure to air pollution, given evidence that air pollution may be a risk factor in the development of childhood leukemia (Reynolds et al. 2003).
With all this in mind, in this section I report results of an exploratory analysis that asks whether the cause of death listed on individuals’ death certificates differs between individuals born in steel towns and those born in non–steel towns. The starting point is to look for an association between birth in a steel town and mortality due to the two leading causes of death, cardiovascular disease and cancer.
Data
The Duke SSA/Medicare data do not include cause of death (or indeed any other information about medical conditions). To determine cause of death for individuals in the sample, I proceed as follows. The NCHS provides microlevel data from death certificates, including underlying cause of death. These data also include characteristics that allow for potential matches to the Duke SSA/Medicare data: a rough measure of the location of death (typically county), year and month of death, day of the week of death, gender, exact age at death, and state of birth. Using available crosswalks and constructing match algorithms as needed, I was able to find a large number of unique one-to-one matches of individuals across the two data sets. For example, the Duke SSA/Medicare data provide date of death, which can easily be converted to month and day of week for matching to the NCHS data. Similarly, the Duke SSA/Medicare data record the zip code of death, which can be converted to match the NCHS death location variable (a Federal Information Processing Standards code).
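The unique-match step can be sketched as follows. This is an illustration, not the author’s algorithm: it assumes both files have been reduced to the shared fields listed above, that a zip-to-FIPS crosswalk is available as a plain dict, and that the NCHS day-of-week field has already been recoded to Python’s Monday = 0 convention. All identifiers and codes are hypothetical:

```python
from collections import Counter
from datetime import date


def ssa_key(death_date, zip_code, sex, age_at_death, birth_state, zip_to_fips):
    """Matching key for a Duke SSA/Medicare-style record.

    The exact date of death is reduced to the coarser fields the death
    certificate file carries, and the zip code is mapped to a county FIPS
    code through a crosswalk (zip_to_fips is a hypothetical dict).
    """
    return (
        zip_to_fips[zip_code],
        death_date.year,
        death_date.month,
        death_date.weekday(),  # Monday = 0 ... Sunday = 6
        sex,
        age_at_death,
        birth_state,
    )


def unique_matches(ssa_records, nchs_records):
    """Pair records whose key occurs exactly once in each file.

    Both arguments are lists of (record_id, key) pairs; NCHS keys are assumed
    to use the same field order and coding as ssa_key.
    """
    ssa_counts = Counter(key for _, key in ssa_records)
    nchs_counts = Counter(key for _, key in nchs_records)
    nchs_id_by_key = {key: rid for rid, key in nchs_records}
    return {
        rid: nchs_id_by_key[key]
        for rid, key in ssa_records
        if ssa_counts[key] == 1 and nchs_counts[key] == 1
    }


crosswalk = {"ZIP-1": "FIPS-1", "ZIP-2": "FIPS-2"}
d = date(1995, 3, 14)
ssa = [
    ("s1", ssa_key(d, "ZIP-1", "F", 70, "PA", crosswalk)),
    ("s2", ssa_key(d, "ZIP-2", "M", 68, "PA", crosswalk)),
    ("s3", ssa_key(d, "ZIP-2", "M", 68, "PA", crosswalk)),  # same key as s2
]
nchs = [
    ("n1", ("FIPS-1", 1995, 3, d.weekday(), "F", 70, "PA")),
    ("n2", ("FIPS-2", 1995, 3, d.weekday(), "M", 68, "PA")),
]
print(unique_matches(ssa, nchs))  # {'s1': 'n1'}
```

Records s2 and s3 share a key, so neither can be assigned to n2 with certainty; dropping such ties is what keeps every retained match one-to-one.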
For the present analysis, I apply my match algorithm for the NCHS death certificate microdata and Duke SSA/Medicare data for white individuals born in Pennsylvania, 1916–27. The Duke SSA/Medicare data are available only through 2002, so I can match to NCHS death data only through age 75 for the most recent cohort (and only through age 85 even for the earliest cohorts). Also, there are some individuals for whom records are incomplete (e.g., no cause of death listed) or for whom I cannot form a unique match. Here I proceed only with those cases for which I can form one-to-one matches. I retain only individuals born in “towns” (as in the empirical work mentioned previously). After matching, my sample decreases from 391,851 to 357,906.
My analysis focuses on the cause of death for individuals who die between the ages of 65 and 75. I assume that, conditional on location of death, age at death, and gender, unmatched cases are missing at random. Accordingly, in the following text I use inverse probability weighting, forming weights from match rates by location of death, age at death, and gender, so that matched observations in cells with low match rates are up-weighted to stand in for the missing cases.
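A minimal sketch of the weighting step, assuming each death between 65 and 75 in the Duke SSA/Medicare sample is assigned to a (location of death, age at death, gender) cell and flagged as matched or not. Cell labels are hypothetical:

```python
from collections import defaultdict


def ipw_weights(records):
    """Inverse-probability weights from cell-level match rates.

    records: list of (cell, matched) pairs, where cell is a tuple such as
    (location_of_death, age_at_death, gender) and matched is True/False.
    Matched records get 1 / (match rate in their cell); unmatched get 0,
    so the weights in each cell sum to the cell's total number of deaths.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for cell, matched in records:
        totals[cell] += 1
        if matched:
            matches[cell] += 1
    return [totals[cell] / matches[cell] if matched else 0.0 for cell, matched in records]


deaths = [
    (("FIPS-1", 70, "F"), True),
    (("FIPS-1", 70, "F"), False),
    (("FIPS-1", 70, "F"), True),
    (("FIPS-1", 70, "F"), False),
    (("FIPS-2", 68, "M"), True),
]
print(ipw_weights(deaths))  # [2.0, 0.0, 2.0, 0.0, 1.0]
```

Weighted analyses then use only the matched deaths; within each cell they carry the cell’s full count, which is what the missing-at-random assumption within cells would justify.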
Analysis of Cause of Death
I begin by classifying individuals who died between ages 65 and 75 by cause of death: (1) cancer, (2) heart disease, and (3) other causes. Heart disease and cancer are the two leading causes of death among the older population, and, as noted in the preceding text, much of the literature on the health impacts of pollution focuses on these diseases. Using codes from the ninth and tenth revisions of the International Statistical Classification of Diseases and Related Health Problems (ICD-9 and ICD-10), a death is recorded as “due to cancer” if the underlying cause falls in the Neoplasms chapter, as “due to heart disease” if it falls in the Diseases of the Circulatory System chapter, and as “due to other causes” otherwise. See table 6. In that table I also separate out deaths “due to lung cancer” (Malignant Neoplasm of Respiratory and Intrathoracic Organs) because that particular form of cancer has been analyzed in many previous studies of air pollution.
Note: Author’s calculations using the Duke SSA/Medicare Data Set matched to death certificate data from the NCHS. Death codes are from the ninth and tenth revisions of the ICD (ICD-9 and ICD-10). Statistics are based on unique matches.
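A sketch of the classification rule, using the ICD chapter boundaries for neoplasms (ICD-9 140–239, ICD-10 C00–D48) and diseases of the circulatory system (ICD-9 390–459, ICD-10 I00–I99), and the blocks for malignant neoplasms of respiratory and intrathoracic organs (ICD-9 160–165, ICD-10 C30–C39). The input format, such as codes with or without a decimal point, is an assumption, and “due to cancer” in the text corresponds to the two cancer labels combined:

```python
def classify_cause(code, revision):
    """Map an underlying-cause ICD code to the groups used in the text.

    code: e.g. '162.9' or '1629' (ICD-9), 'C34.1' or 'C341' (ICD-10).
    revision: 9 or 10. Returns 'lung cancer', 'other cancer',
    'heart disease' (circulatory system chapter), or 'other'.
    """
    code = code.strip().upper().replace(".", "")
    if len(code) < 3:
        return "other"
    if revision == 9:
        if not code[:3].isdigit():  # E and V codes
            return "other"
        n = int(code[:3])
        if 160 <= n <= 165:  # malignant neoplasm of respiratory/intrathoracic organs
            return "lung cancer"
        if 140 <= n <= 239:  # Neoplasms chapter
            return "other cancer"
        if 390 <= n <= 459:  # Diseases of the circulatory system chapter
            return "heart disease"
        return "other"
    if revision == 10:
        letter, digits = code[0], code[1:3]
        if not digits.isdigit():
            return "other"
        n = int(digits)
        if letter == "C" and 30 <= n <= 39:  # C30-C39
            return "lung cancer"
        if letter == "C" or (letter == "D" and n <= 48):  # C00-D48 Neoplasms
            return "other cancer"
        if letter == "I":  # I00-I99 circulatory system
            return "heart disease"
        return "other"
    raise ValueError("revision must be 9 or 10")
```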
I proceed by estimating multinomial logit regressions that parallel the linear regressions in “Early-Life Exposure to Pollution and Older-Age Mortality.” Here, though, the outcome variable is membership in one of four categories. The “omitted category” is survival to age 75. The remaining three categories are death due to the three broad causes listed in the previous paragraph. Explanatory variables are as in the regressions reported in the preceding text, that is, the “Income Proxy,” cohort by gender effects, and an indicator variable for being born in a steel town.
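For reference, the standard multinomial logit form, with survival to age 75 as the base category ($k = 0$) and $k = 1, 2, 3$ indexing death from cancer, heart disease, and other causes, gives each person’s category probabilities as follows (the notation is ours, with $x_i$ collecting the steel-town indicator, the income proxy, and the cohort-by-gender effects):
$$\Pr(Y_i = k \mid x_i) = \frac{\exp(x_i' b_k)}{1 + \sum_{j=1}^{3} \exp(x_i' b_j)}, \quad k = 1, 2, 3; \qquad \Pr(Y_i = 0 \mid x_i) = \frac{1}{1 + \sum_{j=1}^{3} \exp(x_i' b_j)}$$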
The first two columns of table 7 summarize the findings. Rather than reporting the estimated coefficients from the multinomial logit regressions, which are difficult to interpret, I focus on estimated marginal effects of birth in a steel town. Column (1) of table 7 gives the direct marginal effects of being born in a steel town. These effects will tend to be positive because being born in a steel town is associated with an increase in all-cause mortality (as shown in the preceding text). Estimated effects are positive for each cause: birth in a steel town is associated with an increase in the probability of death due to cancer of 1.5 percentage points. Other associations are smaller (0.6 percentage points for heart disease and 0.5 percentage points for other causes). The result for cancer is statistically significant, while the other estimates are not.
Note: Author’s calculations using the Duke SSA/Medicare Data Set matched to NCHS death certificate records and historical records on the location of steel production in Pennsylvania. Estimates are from a multinomial logistic regression. Additional covariates are cohort × gender effects and income proxy. Robust standard errors, in parentheses, are calculated using the delta method. *p < 0.10; **p < 0.05; ***p < 0.01.
Estimates in column (2) of table 7 are perhaps more interesting because they show the proportional impact of birth in a steel town on each cause of death, which reveals whether the cause-of-death composition differs between individuals born in steel towns and comparable individuals born in non–steel towns. The estimates indicate that those born in steel towns have a substantial 18 percent higher probability of dying of cancer.
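A sketch of how average marginal effects and proportional effects might be computed from a fitted multinomial logit. It assumes discrete-change marginal effects (switching the steel-town indicator from 0 to 1 for everyone and averaging) and defines the proportional effect as the marginal effect divided by the average predicted probability with the indicator off. Both definitions are assumptions about what table 7 reports, delta-method standard errors are not shown, and the function names and inputs are hypothetical:

```python
import math


def mnl_probs(xb):
    """Multinomial logit probabilities. xb lists x'b_k for each non-base
    category; the base category (survival) has index 0."""
    exps = [math.exp(v) for v in xb]
    denom = 1.0 + sum(exps)
    return [1.0 / denom] + [e / denom for e in exps]


def steel_town_effects(xb_without, steel_coefs):
    """Average discrete-change effects of a 0/1 regressor.

    xb_without: per-person lists of x'b_k with the steel-town indicator set
    to 0. steel_coefs: the indicator's coefficient in each non-base category.
    Returns (marginal, proportional), one entry per category including the
    base; proportional = marginal / mean probability with the indicator off.
    """
    n = len(xb_without)
    k = len(steel_coefs) + 1
    p_off = [0.0] * k
    p_on = [0.0] * k
    for xb in xb_without:
        off = mnl_probs(xb)
        on = mnl_probs([v + c for v, c in zip(xb, steel_coefs)])
        for j in range(k):
            p_off[j] += off[j] / n
            p_on[j] += on[j] / n
    marginal = [a - b for a, b in zip(p_on, p_off)]
    proportional = [m / b for m, b in zip(marginal, p_off)]
    return marginal, proportional


# Hypothetical indices for two people (cancer, heart disease, other causes)
# and hypothetical steel-town coefficients.
xb = [[-2.0, -1.5, -2.5], [-1.8, -1.4, -2.2]]
marginal, proportional = steel_town_effects(xb, [0.2, 0.05, 0.03])
```

Because the category probabilities always sum to one, the marginal effects across all four categories, including survival, sum to zero; a positive effect on cancer deaths is mirrored by a lower probability of surviving to 75.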
To explore the cancer results in more detail, in columns (3) and (4) I repeat the analysis but separate lung cancer from other forms of cancer. Importantly, the disproportionate association between birth in a steel town and death due to cancer is not driven by lung cancer. Indeed, the proportional effect for lung cancer is similar to that for noncancer causes of death (and is not statistically significant). This is interesting for two reasons:
First, it suggests that the empirical findings presented in this article are not due to a possible omitted behavioral factor, smoking. If people who grew up in steel towns smoked at higher rates than those from other small towns, this alone could account for higher mortality rates at older ages. But such an effect would operate in part through an increase in lung cancer. It is reasonable to conclude, therefore, that differential smoking rates likely do not play an important role in shaping the empirical findings.
Second, an important health consequence of long-term adult exposure to air pollution is an increased risk of death due to lung cancer (Pope et al. 2002). There is no evidence in my data of the same disease process for childhood exposure; instead, the increased risk appears to arise from other forms of cancer.
Finally, table 8 gives estimates from a multinomial logit regression in which the nine broad categories of neoplasm (from the ICD-9 and ICD-10) are included as separate causes of death. The estimates suggest that birth in a steel town is associated with increased death due to neoplasms in seven of the nine categories; for the two without positive associations, the estimates are extremely close to zero. Excess cancer mortality among those born in steel towns is concentrated in two causes: malignant cancer of the brain and nervous system (along with other unspecified sites) and cancer of the genitourinary organs (e.g., prostate, uterine, and ovarian cancer). Strikingly, for deaths between ages 65 and 75, the rates of each of these two causes are more than 40 percent higher for individuals born in steel towns.
Note: See note to table 7. Robust standard errors in parentheses. *p < 0.10; **p < 0.05; ***p < 0.01.
There is some work linking brain cancer to air pollution. For example, Liu et al. (2008) establish a link between petrochemical air pollution and brain cancer among individuals aged 29 and younger, and Savitz and Feingold (1989) show an association between childhood brain cancer and residential traffic density. One important study found that children whose parents had occupational exposure to PAHs are more likely to develop childhood brain tumors (Cordier et al. 2004).
As for cancer of the genital organs, there is some previous evidence of an association between air pollution and these forms of cancer. For example, Winkelstein and Kantor (1969) find an association between prostate cancer and TSPs in the Buffalo, New York area, which replicates a previous finding for a population in Nashville, Tennessee. Similarly, a study by Iwai et al. (2005) found a correlation between TSPs and ovarian cancer in Japan, and Hung et al. (2012) find a relationship between exposure to air pollution and ovarian cancer in Taiwan. There is no previous research linking childhood exposure to TSPs and the subsequent onset of these cancers.
I believe that my work is the first to document associations between air pollution exposure in childhood and death due to particular forms of cancer in later life. This work is exploratory; much more empirical research is needed before drawing definitive conclusions.
Conclusion
The primary innovation in this article is to establish empirical links between the location where individuals were born, their mortality at older ages, and the disease processes that cause deaths in older age. In a sample of more than 390,000 individuals born in small towns in Pennsylvania, 1916–27, those born in steel towns are found to have significantly higher mortality after age 65 than those born in comparable towns without steel production facilities. There are four notable features of this association. First, the relationship holds for people born in neighboring towns, that is, within the same county. Second, the relationship between old-age mortality and birth in a steel town is stronger in towns with relatively high levels of steel production. Third, old-age mortality is especially high for individuals born in places with relatively high levels of steel production and relatively low elevation, which is consistent with the possibility that low-elevation locations were subject to atmospheric inversions that trapped air pollution, thereby increasing exposure. Fourth, the elevated old-age mortality among those born in steel towns is due primarily to increased cancer mortality.
Overall, the evidence is consistent with an important idea in the literature: that exposure to air pollution at young ages can cause latent disease processes that affect morbidity and mortality later in life. Reasonable caution should be exercised in interpretation, as it is impossible to rule out a role for unobserved factors that might affect the results. Nonetheless, the research presented in this article represents a valuable first step in a scientific agenda with an important goal: to understand how exposure to environmental threats in childhood affects well-being over a lifetime.
Acknowledgments
This research was supported in part by an NIA training grant to the Population Studies Center at the University of Michigan, and by a postdoctoral NIA training grant at the Duke Population Research Institute. I am grateful for the support and for use of the services and facilities at these two institutions. I thank James Vaupel and Seth Sanders of Duke University for providing access to the Duke SSA/Medicare data set, and thank Seth Sanders for helpful discussions. I would also like to thank Martha Bailey, Dan Black, Yan Chen, Janna Johnson, Ryan Kellogg, Yusufcan Masatlioglu, Stephen Salant, Mel Stephens, Evan Taylor, and Lowell Taylor for additional feedback. Any errors or misunderstandings in the article are my own.