Energy Insecurity and COVID-19 Replication Paper
Econometrics II
Professor Leah Brooks
Carl Mackensen
Replication Paper
4/22/2022
Part I: Introduction
Energy insecurity is a significant issue for households dealing with poverty, and it becomes markedly harder to manage when family members fall ill, miss work, or face medical bills. The article, "Sociodemographic disparities in energy insecurity among low-income households before and during the COVID-19 pandemic," by Memmott, Carley, Graff, and Konisky, attempts to answer the question, "Did COVID-19 make families at risk of energy insecurity worse off on this measure?" The authors address this with a fixed effects model that includes a number of demographic and COVID covariates, examining the relationship between these covariates and three measures of energy insecurity: whether the family did not pay a bill, whether a disconnection notice was issued, and whether a disconnection was completed. They found that marginalized groups were at the greatest risk and that COVID exacerbated energy insecurity for these groups. For my extension, I attempt to address possible omitted variable bias by including both state fixed effects and month-of-unemployment fixed effects. I was able, for the most part, to replicate what the authors did, and my novel extension is an interesting addendum to their work.
Part II: Article Summary
Energy insecurity is, in essence, the inability to meet basic energy needs for survival. It disproportionately affects those in poverty and members of marginalized groups such as racial minorities and disabled people, and COVID-19 directly exacerbated energy insecurity for those at risk of it. In this study, the authors administered a survey between April and May 2020 to households at or below 200 percent of the federal poverty line. The survey also captured data on the previous year, so that energy insecurity and a number of correlates could be compared between pre-COVID and early-COVID periods. The specific question they sought to answer was how COVID affected energy insecurity for impoverished families, and through a fixed effects model they make the causal claim that COVID exacerbated energy insecurity for marginalized groups. Their specification for all of their regressions, both logistic and otherwise, was:
Energy Insecurity = α + β1(demographic correlates) + β2(COVID correlates) + ε
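Since the same right-hand side enters the logistic regressions through a logit link, the logistic version of this specification can be written as follows. This is my own notation, not the authors', with D_i collecting the demographic correlates and C_i the COVID correlates for household i:

    \Pr(\text{EI}_i = 1) = \Lambda\big(\alpha + \beta_1' D_i + \beta_2' C_i\big),
    \qquad \Lambda(z) = \frac{e^{z}}{1 + e^{z}}

where EI_i is one of the three binary energy insecurity indicators.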
The primary measures of energy insecurity in this survey are being unable to pay an energy bill, receiving a notice of disconnection, and having service disconnected. After examining the relationship between a number of demographic factors and energy insecurity, the authors used logistic regressions to tease out the effect of these correlates on energy insecurity. The relationship was examined over both the previous year (before COVID onset) and the last month (during the early days of COVID). They found that the poor, ethnic minorities, the disabled, and other marginalized groups were most at risk of having their energy insecurity exacerbated by COVID-19.
In an effort to describe the effect in greater detail, the authors also collected survey data on COVID-19 related issues. These included "whether they had received a COVID-19 stimulus payment…, whether their employment status had changed due to the pandemic and whether someone in their household had symptoms of or a positive test for COVID-19" (page 189). They also constructed a measure of hardship due to COVID-19. Re-estimating the logistic regressions with the COVID measures included, they found that the same correlates predicted energy insecurity both before and in the early days of the pandemic, and that the COVID-19 correlates were themselves positively associated with energy insecurity. Appendix II, Table One details both their findings and my replication, which I discuss below, and Appendix II, Table Two focuses on the COVID measures. Interestingly, receipt of the stimulus was found to be negatively correlated with all three energy insecurity measures. This makes sense, as having more money means it can be put toward basic needs like energy provision. The remaining three COVID measures, hardship, lost job hours, and symptoms, all had positive coefficients of varying size, which also makes sense, as each negatively impacts one's ability to work and subsequently pay bills.
Part III: How the Results Match
After reading this piece, I found a few aspects of the analysis to be most relevant to the authors' final conclusions and most amenable to further exploration. The first was Figure 4 from the original article, in which the three measures of energy insecurity (whether a family was unable to pay a bill, whether a disconnection notice was issued, and whether there was a disconnection) were examined for a number of COVID-19 related correlates. Accordingly, I computed the means and standard deviations for each correlate across the three energy insecurity measures. I did this in Stata and brought the output into Excel to make comparable bar graphs for each correlate. While the original paper includes the three measures of energy insecurity for all eight correlates on a single graph, I had to make a separate graph for each correlate, with each graph containing the three energy insecurity measures. The results can be found in Appendix Section I, with Figure 4 repeated after the fourth of the correlate graphs for ease of comparison. I obtained comparable percentages for the energy insecurity measures for each correlate; however, my standard deviations did not correspond to the 95% confidence intervals the original authors reported for each measure, and as a result I did not include them. I was able to replicate the majority of the paper's findings, but not the confidence intervals. This is likely because the authors used a larger sample than I did; it is unclear why they had more respondents in their summary table of means, as I used the same data they did. Confidence intervals aside, the percentages for each figure generally match in size. Their figure, included below, is in percentage point terms, while mine is in proportions between zero and one.
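As a reference for this step, the group means can be computed in Stata with a pattern like the following minimal sketch, assuming the authors' survey data are already loaded. The variable names (cannot_pay_bill, shutoff_notice, disconnected, covid_stimulus) are my placeholders, not the names in the authors' data file:

    * mean and standard deviation of each energy insecurity indicator,
    * split by one of the COVID correlates (here, stimulus receipt)
    tabstat cannot_pay_bill shutoff_notice disconnected, ///
        by(covid_stimulus) statistics(mean sd) columns(statistics)

The same command is then repeated with each of the other correlates in the by() option, and the resulting means are exported to Excel for the bar graphs.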
I went on to recreate one of the logistic regressions run by the authors, specifically Table 10 from their appendix: the logistic regression predicting energy insecurity in the last month, i.e., during the onset of COVID-19, when the survey was administered. This is an important baseline to include, as will become apparent when I discuss my own analysis. The original results compared with mine are in Appendix Section II, Table One, which reports the coefficients from their logistic regression of the demographic and COVID covariates on the three measures of insecurity for the preceding month, alongside the corresponding coefficients I found. I was, for the most part, able to get close on both the sign and size of the coefficients, which is somewhat unsurprising, as I used their data directly. However, some results are still off. The full results for Table 10 and my replication of it are in Appendix II, Table One, but it is most informative to compare the coefficients on the COVID covariates specifically, as these measure how COVID affected energy insecurity. I have therefore made a side by side comparison of these results, juxtaposing their findings with my own, in Appendix II, Table Two. First, for could not pay an energy bill last month, the coefficient on COVID stimulus is -0.315 in theirs, while I found 0.441; this is somewhat odd, as both the sign and magnitude differ. The rest of the COVID coefficients are similar in size and sign. The same holds for the second measure of energy insecurity, receiving a shutoff notice: theirs was -0.388 and mine was 0.477 for the COVID stimulus measure. Lastly, for disconnection, there is the same issue, with theirs being -1.225 and mine 0.162 for the stimulus measure, while again the other COVID measures were fairly similar. This is interesting because it points to some heterogeneity between what I found and what the authors found. It could be that they included additional fixed effects not specified in the paper, or it could be a result of my using a limited data set. As described above, it makes logical sense for the stimulus coefficients to be negative with respect to the energy insecurity measures, since receiving the stimulus means having more money to pay bills. My findings, which differ from the authors' as detailed above, suggest the relationship may be more nuanced: perhaps those who received the stimulus were still at risk of energy insecurity because they were nonetheless unable to pay their bills.
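For transparency, a minimal Stata sketch of this replication step follows. The outcome and covariate names are placeholders of my own choosing, and the global macro $demographics stands in for the authors' full list of demographic controls:

    * placeholder list of demographic controls (not the authors' actual variable names)
    global demographics "i.income_group i.race i.education i.housing_type hh_size"

    * logistic regressions for the three last-month energy insecurity outcomes
    logit cannot_pay_bill covid_stimulus covid_hardship lost_job_hours covid_symptoms $demographics
    logit shutoff_notice  covid_stimulus covid_hardship lost_job_hours covid_symptoms $demographics
    logit disconnected    covid_stimulus covid_hardship lost_job_hours covid_symptoms $demographics

Stata's logit command reports coefficients on the log odds scale, which is the scale on which the figures above are compared.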
I then went on to recreate one of the robustness checks the authors employed in their final analysis, Table 12 from their appendix of tables, which runs ordinary least squares regressions to predict the energy insecurity measures for the previous month, including the demographic covariates and COVID conditions. I ran the three regressions, one for each measure of energy insecurity. Again, I was largely able to recreate what the authors found; the comparative results are in Appendix III, Table One. Here, my values were much closer to theirs. Perhaps this is because, for this model, the authors did not add additional covariates or fixed effects, but simply ran the regression on the covariates listed. The full results of the regression are in Appendix III, Table One, but I again pulled out the four COVID measures for a side by side comparison, in Appendix III, Table Two below. For the COVID stimulus coefficient under the not paying a bill criterion, theirs was -0.033 while mine was 0.045, with the remaining COVID coefficients similar. For the second measure, receiving a shutoff notice, COVID stimulus was -0.031 for them and 0.035 for me, while the rest of the measures were comparable. Lastly, for disconnection, COVID stimulus was -0.033 for them and 0.030 for me, with the remaining coefficients comparable. All of this points to there being some sort of systematic difference between the data or specification they used and what I used. It still makes sense that receiving a stimulus means less energy insecurity, as they found, but my own findings point to a more complex picture in which other issues may be at play.
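A corresponding sketch of the OLS robustness check, using the same placeholder names as above, is simply:

    * linear probability models for the three energy insecurity outcomes
    * vce(robust) is my own choice; the authors' exact variance option is not specified
    regress cannot_pay_bill covid_stimulus covid_hardship lost_job_hours covid_symptoms $demographics, vce(robust)
    regress shutoff_notice  covid_stimulus covid_hardship lost_job_hours covid_symptoms $demographics, vce(robust)
    regress disconnected    covid_stimulus covid_hardship lost_job_hours covid_symptoms $demographics, vce(robust)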
Part IV: Issues of Causality
The authors used fixed effects in both their logistic and OLS regressions. It is certainly conceivable, however, that there are additional omitted variables that would bias the results. Specifically, the quantities of interest are whether the COVID variables resulted in increased energy insecurity, as the authors claim, holding all demographic variables constant. They found that COVID exacerbated energy insecurity and that marginalized groups were particularly affected.
What omitted variables could bias the authors' estimates? It is straightforward to imagine a number of them. First, there is the question of which U.S. state a household lives in. Different states had vastly different COVID responses, as well as underlying differences in how they approach poverty generally and energy insecurity specifically. It is conceivable that state-level COVID responses created conditions correlated with both the COVID measures and the measures of energy insecurity. These could include rent relief for certain populations during COVID, how states managed furloughed and unemployed populations, whether energy providers enforced disconnections or threats of disconnection, and how much savings people in a given state had. There are surely other channels as well; regardless, there may be many variables correlated with both energy insecurity and the COVID measures. Second, there is the temporal question of when a given household was affected by COVID. The data include a month-of-unemployment variable, and it is conceivable that those who lost their jobs earlier would have a harder time paying for energy than those who lost their jobs more recently. This, too, would be correlated with both the COVID measures and the energy insecurity measures.
Part V: The Extension
In order to address the causality issues and omitted variable bias threats outlined above, I conducted a novel analysis using the data provided by the authors. I chose to redo the logistic regression that I replicated for Table 10, which is the heart of the paper's ultimate findings, and to include both state fixed effects and fixed effects for the month of losing employment. This gives a broader picture of the relationship between energy insecurity and the impacts of COVID-19. I used all of the original covariates as controls as well. My findings for the three measures of energy insecurity are in Appendix IV, Table One, compared directly with those of the authors, giving a fuller picture of the overall effect.
My regression took the form:
Energy Insecurity = α + β1(demographic correlates) + β2(COVID correlates) + β3(state fixed effects) + β4(month of unemployment fixed effects) + ε
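A minimal Stata sketch of this extension, again with my placeholder variable names (state and month_unemployed are assumed to be categorical variables in the data), adds the fixed effects through factor-variable notation:

    * extension: Table 10 logit with state and month-of-unemployment fixed effects
    logit cannot_pay_bill covid_stimulus covid_hardship lost_job_hours covid_symptoms ///
          $demographics i.state i.month_unemployed
    logit shutoff_notice  covid_stimulus covid_hardship lost_job_hours covid_symptoms ///
          $demographics i.state i.month_unemployed
    logit disconnected    covid_stimulus covid_hardship lost_job_hours covid_symptoms ///
          $demographics i.state i.month_unemployed

States with no respondents simply do not appear, and logit drops categories that perfectly predict the outcome, which is consistent with the omitted states discussed below.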
What is particularly interesting to me is that, after including the state and month-of-unemployment fixed effects, the results for the demographic and COVID covariates are closer to those of the original paper's Table 10. This leads me to believe that the authors did include other covariates in their logistic regression analysis, at least to the extent of state fixed effects. Again, the full results of the regression are in Appendix IV, Table One, but as above I pulled out the COVID covariates for direct comparison between their original logistic regression and my full specification with state and month-of-unemployment fixed effects; this is in Appendix IV, Table Two. For the did not pay an energy bill measure, theirs was -0.315 for COVID stimulus and mine was 0.567; COVID hardship and lost job hours were both comparable, but COVID symptoms differed, with theirs at 0.449 and mine at -0.033. For received a shutoff notice, COVID stimulus again differed, theirs being -0.388 and mine 0.763, while COVID hardship was larger for me, at 2.102 compared with their 0.813; the remaining measures were comparable. Lastly, for the disconnection measure, theirs was -1.225 and mine 3.814 for the stimulus measure, a marked difference, while the remaining variables were off, but not so markedly as to warrant discussion here. Again, their finding that the stimulus is negatively related to energy insecurity makes sense, but my more robust specification points toward a broader story about this relationship.
These differences could arise because I included additional covariates in my logistic regression: not just the demographic and COVID variables, but also the state and month-of-unemployment fixed effects. Further, the differences mentioned are significant, suggesting a difference in the underlying data used to construct the results.
It is also interesting to examine the outcomes of the state fixed effects, reproduced in Appendix IV, Table Three, and of the month-of-unemployment fixed effects, reproduced in Appendix IV, Table Four. These have the potential, if not to solve the endogeneity issue laid out above, then at least to make the specification more accurate. For the state fixed effects, a large number of states were omitted because they had no respondents, but the results are still interesting and informative. For the second measure of energy insecurity, shutoff notices, Arizona had a coefficient of -0.21, Georgia -3.11, and Maryland -0.32. This means that, for this measure, the potentially energy insecure in these states fared markedly better than in other states. For the disconnection measure, Arizona had -1.94, California -0.19, Hawaii -0.17, Illinois -0.86, Delaware -1.30, Ohio -3.60, Pennsylvania -3.27, and Texas -0.96. Again, this means that, all else held constant, it was markedly better to be poor during COVID in these states than in others.
The month-of-unemployment fixed effects were equally interesting. There were no clear temporal trends; rather, losing one's job in different months is associated with different degrees of energy insecurity. A number of months had negative coefficients relative to the rest, implying that it was actually better, in terms of being energy insecure in the previous month, to lose one's job during those months than otherwise. This could reflect a time lag between losing a job and becoming energy insecure, and savings or wealth drawn upon may also have affected households' energy expenditures. This proves difficult to quantify.
What these results show is that the landscape for a logistic regression analysis of COVID and demographic variables against the three measures of energy insecurity is more complex than the authors initially suggest. Perhaps they did not think to include these fixed effects, or found the results too strange to include in their analysis, but regardless, the issues of causality and omitted variable bias remain. There are a number of possible omitted variables to consider when evaluating the authors' causal claims, and I have attempted to examine these issues and report the findings. What is further interesting is that, even in the presence of this novel analysis, the results the authors first reported are actually strengthened. I do not know why the authors would omit such an analysis from their reported findings, since it strengthens their position that COVID exacerbated the three energy insecurity measures, but perhaps they simply did not want to delve into explaining the somewhat odd findings outlined in this section.
Part VI: Conclusion
In brief, I was, for the most part, able to replicate the work of the authors. First, I replicated the summary statistics figure giving the means of the three energy insecurity measures for different demographic groups. This was largely straightforward, though the confidence intervals were somewhat off, as the authors used a different sample population than I did. I then replicated the results from Table 10, the logistic regression of the three separate measures of energy insecurity on a number of demographic and COVID covariates. My numbers were generally similar; pulling out the COVID measures, I found some differences as well as some similarities, again most likely because the authors used a different subset of the data. Following this, I completed a replication of the ordinary least squares regression, finding mostly similar numbers, though again with some differences in the COVID measures, for the reasons detailed above.
What is most interesting to me is the extension I conducted. I found that some states had better or worse conditions for those at risk of energy insecurity. This could be because different states had different COVID policies as well as different energy policies; some states mandated that their energy providers not disconnect service. It therefore makes sense that some states would show negative correlations with energy insecurity while others showed positive ones. This is a step in the right direction toward teasing out the potential omitted variable bias.
I also found that there is no clear time trend linking the month in which one lost a job to energy insecurity. This could be because the effects of unemployment were lagged, or because unemployment insurance was used, or because savings were drawn upon. Energy is not the first bill to be passed over in times of great hardship, but nor is it the last. Because there was no clear temporal trend in the month-of-unemployment regressions, some months had positive coefficients and some negative.
It is also instructive that the original findings of the authors were, for the most part, strengthened by my novel analysis, which they could readily have conducted themselves, since they had the necessary data. Whether changes during and after COVID also line up with these findings I leave to future research.
Part VII: References
Memmott, T., Carley, S., Graff, M. & Konisky, D. M. Sociodemographic disparities in energy insecurity among low-income households before and during the COVID-19 pandemic. Nature Energy 6, 186-193 (2021).