6.8 Some concerns with controls

OK, we argued that conditional DiD may perform better in real-world settings. Like everything, this does not come for free.

Here are some concerns regarding including covariates.

First of all, we don’t always know what to control for in the real world. I previously made the case that a state’s political leaning is an important variable that one should account for. However, there might be other variables that I’m missing entirely. We can use economic reasoning, past studies in the literature, and data-driven methods (e.g., double lasso for variable selection) to decide on controls.
There are good controls and there are bad controls. Let’s say you think it’s important to improve comparability between the treated and control units. To do so, you pick income in 2014 as a control. This, I’d argue, is an example of a bad control. Why? Because income in 2014 might itself be affected by the ACA-Medicaid expansion, and conditioning on it can bias the estimate of interest. We’d want to make sure that the controls are not affected by the reform itself. This is why researchers mostly rely on pre-treatment rather than post-treatment variables as controls.
Earlier, we looked at the case of a single binary covariate (Republican vs. Democratic governor in 2013). However, as the number of controls increases, the sample thins out fairly quickly. Say we add the following binary controls: \(i)\) urban|rural, \(ii)\) south|non-south, \(iii)\) high|low uninsured, based on 2013 (baseline) uninsured rates. Together with the governor indicator, we’d have \(2^{4} = 16\) different subgroups of the sample. If you decide to add more controls, the number of subgroups increases exponentially. Note that we’ve only considered binary controls so far. The issue worsens if you add continuous variables as controls. This is known as the curse of dimensionality.
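
To get a feel for how quickly the cells thin out, here is a minimal sketch in Python. Everything in it is hypothetical: the function name `cell_summary`, the choice of 50 units, and the coin-flip assignment of controls and treatment are assumptions made for illustration, not the ACA data. It counts how many of the \(2^{k}\) cells contain at least one treated and one control unit, which is what a within-cell comparison needs.

```python
import itertools
import numpy as np

def cell_summary(n_units, n_binary_controls, seed=0):
    """Simulate binary controls and summarize how thin the cells get."""
    rng = np.random.default_rng(seed)
    # Hypothetical data: every control and the treatment status is a fair coin flip.
    X = rng.integers(0, 2, size=(n_units, n_binary_controls))
    treated = rng.integers(0, 2, size=n_units)

    n_cells = 2 ** n_binary_controls
    usable = 0
    # Loop over every combination of control values (every cell).
    for cell in itertools.product([0, 1], repeat=n_binary_controls):
        in_cell = np.all(X == np.array(cell), axis=1)
        has_treated = np.any(treated[in_cell] == 1)
        has_control = np.any(treated[in_cell] == 0)
        # A cell is usable only if it holds both treated and control units.
        usable += int(has_treated and has_control)

    return {
        "cells": n_cells,
        "avg_units_per_cell": n_units / n_cells,
        "usable_cells": usable,
    }

for k in (1, 2, 4, 6, 8):
    print(k, cell_summary(n_units=50, n_binary_controls=k))
```

With 50 units, the average cell size drops below one unit once the number of binary controls reaches six, so many cells cannot support a treated-versus-control comparison at all.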

There are several ways to avoid this curse. One relatively less taxing approach is to include the controls in a regression. However, this raises its own issues. First, the covariates enter the regression linearly, which imposes a linear functional form assumption. Second, if the effect of the reform varies with the covariates, then the estimate of interest may be biased. Hence, we’d want to incorporate controls in a more flexible way, using inverse probability weighting for DiD or a doubly robust framework tailored for DiD. But more on this later!
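
The second issue can be illustrated with a small simulation. This is a sketch under stated assumptions, not the ACA application: the data-generating process, the variable names (`x`, `d`, `tau`, `delta_y`), and all numeric values are made up. There is one binary pre-treatment covariate, treatment is more likely when `x = 1`, and the effect is larger when `x = 1`. Because `x` is binary, the linear functional form is not the problem here; any gap comes from effect heterogeneity alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# One binary pre-treatment covariate (hypothetical).
x = rng.integers(0, 2, size=n)

# Treatment is more likely when x = 1 (assumed probabilities).
p_treat = np.where(x == 1, 0.9, 0.5)
d = (rng.random(n) < p_treat).astype(float)

# The effect of the reform varies along the covariate (assumed values).
tau = np.where(x == 1, 3.0, 1.0)

# Change in the outcome from pre to post: a trend that depends on x,
# plus the effect for treated units, plus noise.
delta_y = 0.5 * x + tau * d + rng.normal(0.0, 1.0, size=n)

# Regress the outcome change on treatment and the covariate (OLS).
design = np.column_stack([np.ones(n), d, x])
coef, *_ = np.linalg.lstsq(design, delta_y, rcond=None)
beta_hat = coef[1]

# Average effect among the treated units in this simulated sample.
att_true = tau[d == 1].mean()

print(f"regression coefficient on treatment: {beta_hat:.3f}")
print(f"average effect on the treated:       {att_true:.3f}")
```

In this design the regression coefficient comes out noticeably below the average effect on the treated. The regression puts more weight on the cell where treatment status varies the most (`x = 0`), while most treated units sit in the other cell.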