5 IPW and AIPW
The target is to estimate the average treatment effect (ATE):
\[\begin{equation} \label{eq:ATE} ATE = E[Y_i(1) - Y_i(0)] \tag{5.1} \end{equation}\]
Under the following two assumptions:
- \(W_i \perp \{Y_i(0), \; Y_i(1)\}\) (independence assumption)
- \(Y_i = Y_i(W_i)\) (SUTVA)

the ATE can be estimated by the difference-in-means estimator \(\hat{\tau}\):
\[\begin{equation} \label{eq:ATE_estimator} \hat{\tau} = \frac{1}{N_T} \sum_{i: W_i = 1} Y_i - \frac{1}{N_C} \sum_{i: W_i = 0} Y_i \end{equation}\]
where \(N_T\) and \(N_C\) are the number of treated and control units, respectively.
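A short derivation shows why these two assumptions are enough. SUTVA replaces the observed outcome with the potential outcome under the treatment actually received, and independence removes the conditioning on \(W_i\):
\[\begin{aligned} E[Y_i \mid W_i = 1] - E[Y_i \mid W_i = 0] &= E[Y_i(1) \mid W_i = 1] - E[Y_i(0) \mid W_i = 0] \quad \text{(SUTVA)} \\ &= E[Y_i(1)] - E[Y_i(0)] \quad \text{(independence)} \\ &= E[Y_i(1) - Y_i(0)] = ATE. \end{aligned}\]
The estimator \(\hat{\tau}\) simply replaces each conditional expectation with the corresponding sample mean among treated and control units.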
In the previous lecture, we discussed randomized controlled trials as an ideal approach to estimating the ATE. In a randomized controlled trial, each unit has the same probability of receiving the treatment. Formally:
\[\begin{equation} P(W_i = 1 \; | \; Y_i(0), \; Y_i(1), \; n_T) = \frac{n_T}{n}, \quad i = 1, \ldots, n \tag{5.2} \end{equation}\]
In equation (5.2), \(n_T\) is the number of units that receive the treatment and \(n\) is the total number of units.3 In a simple setting where a researcher wants \(P(W_i = 1) = 0.5\) (each unit is equally likely to be treated or untreated), a coin flip is a feasible mechanism for assigning treatment.4
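The assignment mechanisms above can be simulated directly. The sketch below is a hypothetical illustration in Python with NumPy: the function names, the sample sizes, and the constant treatment effect of 2.0 are assumptions, not taken from the text. It contrasts coin-flip (Bernoulli) assignment, where each unit is treated independently with probability \(p\), with complete randomization, where exactly \(n_T\) of \(n\) units are treated as in equation (5.2), and then applies the difference-in-means estimator to one simulated trial.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the sketch is reproducible


def bernoulli_assignment(n, p, rng):
    """Coin-flip style: each unit is treated independently with probability p.
    The number of treated units is itself random."""
    return (rng.random(n) < p).astype(int)


def complete_randomization(n, n_t, rng):
    """Eq. (5.2): exactly n_t of the n units are treated, chosen uniformly at
    random, so P(W_i = 1) = n_t / n for every unit."""
    w = np.zeros(n, dtype=int)
    w[rng.choice(n, size=n_t, replace=False)] = 1
    return w


def difference_in_means(y, w):
    """Mean outcome of treated units minus mean outcome of control units."""
    y, w = np.asarray(y, dtype=float), np.asarray(w)
    return y[w == 1].mean() - y[w == 0].mean()


# Hypothetical potential outcomes with a constant treatment effect of 2.0
n = 10_000
y0 = rng.normal(loc=10.0, scale=3.0, size=n)
y1 = y0 + 2.0
true_ate = np.mean(y1 - y0)

w = complete_randomization(n, n_t=n // 2, rng=rng)
y = np.where(w == 1, y1, y0)  # SUTVA: we only observe Y_i(W_i)
tau_hat = difference_in_means(y, w)
print(f"true ATE = {true_ate:.3f}, difference-in-means = {tau_hat:.3f}")

# Empirical check of eq. (5.2): over many draws with n = 20 and n_T = 5,
# each unit should be treated in roughly 5 / 20 = 25% of the draws.
draws = np.array([complete_randomization(20, 5, rng) for _ in range(20_000)])
print(draws.mean(axis=0).round(2))

# Bernoulli assignment with p = 1/3: the treated share only fluctuates around p
w_bern = bernoulli_assignment(n, p=1 / 3, rng=rng)
print(f"share treated under Bernoulli(1/3): {w_bern.mean():.3f}")
```

Complete randomization fixes the treated count at \(n_T\), whereas Bernoulli assignment only fixes each unit's probability of treatment, so the realized number of treated units varies from one sample to the next.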
Although randomized controlled trials (RCTs) are often considered the gold standard in causal inference, they cannot always be used, for ethical or financial reasons. Returning to the example from the previous chapter, it would not be ethical to decide who may attend the tutoring sessions and who may not. In real-world settings, tutoring sessions are typically voluntary, and students who regularly attend them may differ in their baseline (pre-treatment) characteristics from those who do not. These differences can bias naive comparisons and complicate causal inference in observational studies.
To proceed in observational settings (without RCTs), we need more knowledge about how treatment is assigned. In other words, we need to understand which variables determine who attends the tutoring sessions. This information is crucial for identifying potential confounders and for applying methods that can estimate causal effects from observational data. In causal inference, confounders are variables that are associated with both the treatment and the outcome. They can bias the estimated effect of the treatment on the outcome because they offer an alternative explanation for any observed relationship between the two. For example, suppose you are evaluating the efficacy of a new drug intended to lower blood pressure. If smokers are more likely to take the drug and also tend to have higher blood pressure to begin with, a naive comparison of treated and untreated patients will understate how much the drug lowers blood pressure.
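To see the direction of this bias, the sketch below simulates a hypothetical version of the blood-pressure example in Python with NumPy. Every number in it is an illustrative assumption: 30% of patients smoke, smokers take the drug with probability 0.7 versus 0.2 for non-smokers, smoking adds 15 mmHg to baseline blood pressure, and the drug lowers blood pressure by 10 mmHg for everyone.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100_000

# Hypothetical data-generating process (all numbers are assumptions):
# smoking raises both the chance of treatment and baseline blood pressure.
smoker = rng.binomial(1, 0.3, size=n)
p_treat = np.where(smoker == 1, 0.7, 0.2)  # P(W = 1 | smoker)
w = rng.binomial(1, p_treat)

true_effect = -10.0  # the drug lowers blood pressure by 10 mmHg for everyone
bp0 = 120 + 15 * smoker + rng.normal(0, 8, size=n)  # untreated blood pressure
bp1 = bp0 + true_effect                              # treated blood pressure
bp = np.where(w == 1, bp1, bp0)                      # observed outcome (SUTVA)

# Naive comparison that ignores smoking status
naive = bp[w == 1].mean() - bp[w == 0].mean()
print(f"true ATE = {true_effect}, naive difference-in-means = {naive:.2f}")
```

In this setup about 60% of treated patients but only about 14% of untreated patients smoke, so the naive comparison is roughly \(-10 + 15 \times (0.60 - 0.14) \approx -3.1\) mmHg: the drug appears far less effective than it actually is.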
This brings us to the unconfoundedness assumption.
Unconfoundedness: Treatment assignment is as good as random once we control for the \(X\)s.
\[\begin{equation} W_i \perp \{Y_i(0), \; Y_i(1)\} \mid X_i = x \quad \text{for all } x \in \mathcal{X}. \tag{5.3} \end{equation}\]
As the tutoring example illustrates, the independence assumption (discussed in the previous chapter) is highly unlikely to hold in observational settings. Consider the following scenarios:
- Of the ten states that have yet to expand Medicaid, eight are in the South. Medicaid expansion is not random.
- Cigarette taxes are higher in states with stronger anti-smoking sentiment.
- Infrastructure development, such as the construction of roads, schools, and hospitals, is demand-driven.
- The list goes on.
However, if we observe all the covariates (\(X\)s) that influence treatment assignment, we can invoke unconfoundedness for causal inference.
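When the covariates are discrete, unconfoundedness suggests a simple estimator: within each value of \(X\), assignment is as good as random, so we can compare treated and control units inside each stratum and then average those comparisons. The Python sketch below is a hypothetical illustration; the function `stratified_ate` and the simulated smoking data are assumptions, not part of the text. In the simulated data, smoking is the only variable driving treatment, so unconfoundedness holds given smoking status.

```python
import numpy as np


def difference_in_means(y, w):
    """Mean outcome of treated units minus mean outcome of control units."""
    return y[w == 1].mean() - y[w == 0].mean()


def stratified_ate(y, w, x):
    """Difference in means within each stratum of a discrete covariate x,
    averaged with weights equal to each stratum's share of the sample."""
    y, w, x = np.asarray(y, dtype=float), np.asarray(w), np.asarray(x)
    estimate = 0.0
    for value in np.unique(x):
        in_stratum = x == value
        w_s = w[in_stratum]
        # A within-stratum comparison needs both treated and control units.
        if w_s.min() == w_s.max():
            raise ValueError(f"stratum x={value} lacks treated or control units")
        tau_s = difference_in_means(y[in_stratum], w_s)
        estimate += in_stratum.mean() * tau_s
    return estimate


# Hypothetical data: smoking affects both treatment and baseline blood pressure,
# and nothing else affects treatment, so assignment is random given smoker.
rng = np.random.default_rng(seed=1)
n = 100_000
smoker = rng.binomial(1, 0.3, size=n)
w = rng.binomial(1, np.where(smoker == 1, 0.7, 0.2))
bp0 = 120 + 15 * smoker + rng.normal(0, 8, size=n)
bp = np.where(w == 1, bp0 - 10.0, bp0)  # true ATE = -10

naive = difference_in_means(bp, w)
adjusted = stratified_ate(bp, w, smoker)
print(f"naive = {naive:.2f}, stratified = {adjusted:.2f}, true ATE = -10")
```

This only works when every stratum contains both treated and control units; with continuous or high-dimensional \(X\), exact stratification quickly becomes infeasible.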
3. Although it is generally recommended to assign half of the sample to the treatment group and the other half to the control group, this is not a strict requirement.
4. Of course, this quickly becomes inefficient as the sample size grows. In practice, treatment assignment is typically carried out by a statistical procedure in software. For example, if a researcher wants about one-third of the sample treated, each unit can be assigned through an independent Bernoulli trial with a success probability of about 0.33.