8 Heterogeneous Treatment Effects

This article summarizes heterogeneous treatment effects and how machine learning (ML) can be used to estimate them.

Simply put, a heterogeneous treatment effect is the variation in the response to a treatment across subgroups. For example, the impact of Medicaid expansion on labor market outcomes can vary with the uninsured rate prior to the expansion; a discussion intervention program that aims to normalize discussion of menstruation can increase demand for menstrual health products more among those with a high psychological cost at baseline; and in personalized medicine, we want to identify the subgroup that responds more strongly to a particular type of treatment.

It differs from the average treatment effect (ATE) in that the ATE concerns the whole population, while a heterogeneous treatment effect pertains to a specific subgroup characterized by its features (\(X\)s). In this sense, one can think of the ATE as a weighted average of subgroup-specific ATEs.

Using the potential outcomes framework, the ATE is given by: \(E[Y_i^{1} - Y_i^{0}]\).

The heterogeneous treatment effect is: \(E[Y_i^{1} - Y_i^{0} \mid X_i = x]\). It is the treatment effect conditional on \(X_i\), which is determined prior to observing the data. Hence, it is also termed the conditional average treatment effect (CATE).
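
The link between the two estimands follows from the law of iterated expectations. As a short sketch, write \(\tau(x)\) for the CATE (a shorthand introduced here for illustration) and assume, for simplicity, that \(X_i\) takes finitely many values:

\(\tau(x) = E[Y_i^{1} - Y_i^{0} \mid X_i = x]\)

\(E[Y_i^{1} - Y_i^{0}] = E\big[E[Y_i^{1} - Y_i^{0} \mid X_i]\big] = \sum_{x} P(X_i = x)\,\tau(x)\)

Under that assumption, the ATE is the average of the subgroup effects, with each subgroup weighted by its share of the population.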

A simple example, borrowed from Wager’s lecture notes, illustrates the concept using smoking in Geneva and Palo Alto. Suppose two randomized controlled trials (RCTs) are conducted, one in Palo Alto and one in Geneva, to evaluate whether cash incentives can reduce the prevalence of smoking among teenagers.

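# Palo Alto
library(magrittr)    # provides the %>% pipe
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

# Build a 2 x 3 character matrix: arm label, non-smoker count, smoker count.
# smoke_vec = c(treated non-smokers, control non-smokers, treated smokers, control smokers)
smoke_mat <- function(smoke_vec) {
    smoke <- matrix("", nrow = 2, ncol = 3)
    smoke[, 1] <- c("Treat", "Control")
    smoke[, 2] <- smoke_vec[1:2]
    smoke[, 3] <- smoke_vec[3:4]
    return(smoke)
}

smoke <- smoke_mat(c(152, 2362, 5, 122))
colnames(smoke) <- c("Palo Alto", "Non-S.", "Smoker")

data.frame(smoke) %>% kable() %>%
    kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "left")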
| Palo.Alto | Non.S. | Smoker |
|:----------|:-------|:-------|
| Treat     | 152    | 5      |
| Control   | 2362   | 122    |

# Geneva
smoke <- smoke_mat(c(581, 2278, 350, 1979))
colnames(smoke) <- c("Geneva", "Non-S.", "Smoker")

data.frame(smoke) %>% kable() %>%
    kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "left")
| Geneva  | Non.S. | Smoker |
|:--------|:-------|:-------|
| Treat   | 581    | 350    |
| Control | 2278   | 1979   |

\(\hat{\tau}_{PA} = \frac{5}{152 + 5} - \frac{122}{2362 + 122} \approx -1.7 \text{ pp}\)

\(\hat{\tau}_{GVA} = \frac{350}{581 + 350} - \frac{1979}{2278 + 1979} \approx -8.9 \text{ pp}\)

\(\hat{\tau} = \frac{2641}{2641 + 5188}\hat{\tau}_{PA} + \frac{5188}{2641 + 5188}\hat{\tau}_{GVA} \approx -6.5 \text{ pp}\)

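Here, \(\hat{\tau}_{PA}\) is an estimate of \(E[\text{smoking prevalence} \mid W = 1, X = PA] - E[\text{smoking prevalence} \mid W = 0, X = PA]\), where \(W\) is the treatment indicator; it is the treatment effect specific to Palo Alto. The average treatment effect \(\hat{\tau}\) is the weighted average of the two city-specific treatment effects, with weights proportional to each city's sample size (2641 participants in Palo Alto and 5188 in Geneva).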
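
The arithmetic above can be reproduced directly from the counts in the two tables. The following is a minimal sketch in base R; the `counts` list and the helper `risk_diff()` are hypothetical names introduced here for illustration and do not come from the original notes.

```r
# Counts from the two tables: non-smokers and smokers in each arm.
counts <- list(
    PA  = list(treat   = c(nonsmoker = 152,  smoker = 5),
               control = c(nonsmoker = 2362, smoker = 122)),
    GVA = list(treat   = c(nonsmoker = 581,  smoker = 350),
               control = c(nonsmoker = 2278, smoker = 1979))
)

# Hypothetical helper: smoking prevalence among treated minus among controls.
risk_diff <- function(city) {
    prev <- function(arm) unname(arm["smoker"] / sum(arm))
    prev(city$treat) - prev(city$control)
}

tau_city <- sapply(counts, risk_diff)                          # city-specific effects
n_city   <- sapply(counts, function(city) sum(unlist(city)))   # participants per city
tau_hat  <- sum(n_city / sum(n_city) * tau_city)               # size-weighted average

round(100 * tau_city, 1)  # percentage points: PA -1.7, GVA -8.9
round(100 * tau_hat, 1)   # percentage points: -6.5
```

Weighting by sample size is what makes the pooled estimate sit much closer to the Geneva effect: Geneva contributes roughly two thirds of the participants.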