5.1 A simple example

Suppose you are interested in evaluating the effect of a tutoring program, initiated after the first exam, on grades in an introductory-level course. For simplicity, the only possible grades are A and B. However, students who received a B on their first exam are more likely to attend the tutoring sessions. In other words, \(P(W_i = 1 | Y_{iFE} = A) < P(W_i = 1 | Y_{iFE} = B)\), where \(Y_{iFE}\) denotes unit \(i\)'s grade on the first exam. In this case, treatment assignment is correlated with the past grade, and the past grade in turn predicts the grade on the second exam: a student who did well on the first exam is likely to do well on the second. Hence, using equation (2) to estimate the effect of the tutoring program will yield a biased estimate.

Since we know that the probability of treatment depends on the grade on the first exam, we can estimate the conditional average treatment effect (CATE) within each first-exam grade group and then average the CATEs, weighting each by its group's share of the sample, to form an estimate of the ATE. Let’s take a look at the data.

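library(magrittr)    # %>% pipe
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), add_header_above()

# Build a 2 x 3 table of second-exam grade counts by treatment status.
# grade_vec = c(treated with A, control with A, treated with B, control with B)
grade_mat <- function(grade_vec) {
    grade <- matrix("", nrow = 2, ncol = 3)
    grade[, 1] <- c("Treat", "Control")
    grade[, 2] <- c(grade_vec[1], grade_vec[2])  # counts with A on the 2nd exam
    grade[, 3] <- c(grade_vec[3], grade_vec[4])  # counts with B on the 2nd exam
    return(grade)
}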

# Y_iFE == A
grade <- grade_mat(c(5, 9, 2, 4))
colnames(grade) <- c(" ", "A (2nd Exam)", "B (2nd Exam)")

# Render the table
grade %>% kable() %>%
    kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "left") %>%
    add_header_above(c("Table 1." = 1, "Grade in the 2nd exam | 1st exam = A" = 2))

Table 1.	Grade in the 2nd exam | 1st exam = A
	A (2nd Exam)	B (2nd Exam)
Treat	5	2
Control	9	4

# Y_iFE == B
grade <- grade_mat(c(15, 1, 5, 4))
colnames(grade) <- c(" ", "A (2nd Exam)", "B (2nd Exam)")

# Render the table
grade %>% kable() %>%
    kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "left") %>%
    add_header_above(c("Table 2." = 1, "Grade in the 2nd exam | 1st exam = B" = 2))

Table 2.	Grade in the 2nd exam | 1st exam = B
	A (2nd Exam)	B (2nd Exam)
Treat	15	5
Control	1	4
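The claim that students with a B on the first exam are more likely to attend tutoring can be checked directly from the counts in Tables 1 and 2. The following is a minimal sketch; the object names `tab_A`, `tab_B`, `p_treat_A`, and `p_treat_B` are illustrative and do not appear in the original code.

```r
# Counts from Tables 1 and 2 (illustrative object names).
# Rows = treatment status, columns = grade on the 2nd exam.
tab_A <- matrix(c(5, 9, 2, 4), nrow = 2,
                dimnames = list(c("Treat", "Control"), c("A", "B")))
tab_B <- matrix(c(15, 1, 5, 4), nrow = 2,
                dimnames = list(c("Treat", "Control"), c("A", "B")))

# Share of each first-exam group that attended tutoring,
# i.e. the sample analogue of P(W = 1 | first-exam grade)
p_treat_A <- sum(tab_A["Treat", ]) / sum(tab_A)  # 7 of 20 students
p_treat_B <- sum(tab_B["Treat", ]) / sum(tab_B)  # 20 of 25 students

c(FE_A = p_treat_A, FE_B = p_treat_B)
```

In this sample, 35% of the students with an A on the first exam attended tutoring, compared with 80% of those with a B, which is consistent with \(P(W_i = 1 | Y_{iFE} = A) < P(W_i = 1 | Y_{iFE} = B)\).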


Estimation

\(\hat{\tau}_{FE=A} = \frac{5}{7} - \frac{9}{13} \approx 2.2 \; pp\)

\(\hat{\tau}_{FE=B} = \frac{15}{20} - \frac{1}{5} = 55 \; pp\)

\(\hat{\tau}_{AGG} = \frac{20}{45} \hat{\tau}_{FE=A} + \frac{25}{45} \hat{\tau}_{FE=B} \approx 31.53 \; pp\).

The first two are the CATEs for the groups that received an A and a B, respectively, on the first exam. The assumption is that, once we condition on the first-exam grade, treatment (who attends tutoring and who does not) is as good as random. This allows valid estimation of the within-group causal effects, which are then averaged on the third line to form the ATE, with each group weighted by its share of the sample (20 and 25 of the 45 students). This simple example with a discrete feature space (the first-exam grade is either A or B) provides the intuition: if the variables influencing treatment assignment are observed, the ATE can be recovered as a weighted average of the CATE estimates (which are themselves group-wise ATEs).⁵
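The arithmetic in the Estimation step can be reproduced with a short R sketch. The helper `cate()` and the object names below are illustrative and not part of the original code. For contrast, the last step also computes a pooled difference in means that ignores the first-exam grade, on the assumption that this is the kind of unadjusted comparison equation (2) refers to.

```r
# Counts from Tables 1 and 2 (illustrative object names).
# Rows = treatment status, columns = grade on the 2nd exam.
tab_A <- matrix(c(5, 9, 2, 4), nrow = 2,
                dimnames = list(c("Treat", "Control"), c("A", "B")))
tab_B <- matrix(c(15, 1, 5, 4), nrow = 2,
                dimnames = list(c("Treat", "Control"), c("A", "B")))

# Hypothetical helper: difference between treated and control units
# in the share that earns an A on the 2nd exam
cate <- function(tab) {
  p_A <- tab[, "A"] / rowSums(tab)
  unname(p_A["Treat"] - p_A["Control"])
}

tau_A <- cate(tab_A)  # 5/7 - 9/13
tau_B <- cate(tab_B)  # 15/20 - 1/5

# Weight each CATE by its group's share of the sample (20/45 and 25/45)
n_A <- sum(tab_A)
n_B <- sum(tab_B)
tau_agg <- (n_A * tau_A + n_B * tau_B) / (n_A + n_B)

# Pooled comparison that ignores the first-exam grade
# (assumed here to correspond to the unadjusted estimator)
tau_naive <- cate(tab_A + tab_B)  # 20/27 - 10/18

# Express all estimates in percentage points
round(100 * c(tau_A = tau_A, tau_B = tau_B,
              tau_agg = tau_agg, tau_naive = tau_naive), 2)
```

With these counts the stratified estimate is about 31.5 pp, while the pooled comparison gives about 18.5 pp. The gap arises because the treated group is dominated by students with a B on the first exam, so the pooled comparison mixes the effect of tutoring with differences in prior performance.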

⁵ In this case, the CATEs differ across the two sub-groups. Sometimes the core interest of an analysis is uncovering such heterogeneous treatment effects, which motivates estimation and inference on CATEs across two or more sub-groups.