5.6 Propensity score stratification

Propensity scores play a central role in causal inference because they can be used in several approaches that enhance the validity of causal estimates in observational settings. These include, among others, inverse probability weighting, matching, weight adjustments in regression (for better covariate balance), trimming, and propensity score stratification. These methods will be discussed in detail as the course progresses. First, let's take a look at propensity score stratification to get a sense of how propensity scores help us compare treated units with control units.

The basic idea is captured by the cliché that we want to compare apples with apples, not apples with oranges. In our context, this simply means that it does little good to compare a treated unit with a very high probability of receiving the treatment to a control unit with a very low probability of receiving it. But what if (yes, what if) we instead compare units with similar treatment probabilities?

Let’s run a quick thought experiment. We fit a logistic regression and estimate the propensity score. Say we have two units, one from the treatment group and one from the control group, each with a propensity score of 0.6. The assumption here is that, conditional on similar propensity scores, the treatment assignment is as good as random. This follows from the unconfoundedness assumption: \(Y_i^{0}, \; Y_i^{1} \; \perp \; W_i \; | \; X_i\).
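As a sketch of this step, the propensity score can be estimated with `glm()`. The data frame `dat`, the covariates `X1` and `X2`, and the data-generating process below are hypothetical stand-ins, not from the text:

```r
set.seed(1)
# hypothetical data: two covariates and a treatment that depends on X1
n <- 1000
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
dat$W <- rbinom(n, 1, plogis(0.5 * dat$X1))

# logistic regression of treatment on covariates
ps_fit <- glm(W ~ X1 + X2, data = dat, family = binomial)
dat$pscore <- predict(ps_fit, type = "response")   # estimated e(X), in (0, 1)
```

With `type = "response"`, `predict()` returns fitted probabilities rather than log-odds, which is what we want for the propensity score.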

Propensity score stratification divides the sample into several segments (strata) based on the estimated propensity scores and estimates the ATE within each segment. Finally, these segment-specific ATE estimates are averaged to obtain the overall estimate of the ATE.

Steps for ATE estimation using propensity score stratification

  1. Order observations according to their estimated propensity score:

\(\hat{e}(X_{i_1}) \leq \hat{e}(X_{i_2}) \leq \; ... \; \leq \hat{e}(X_{i_N})\)

  2. Form \(J\) strata of equal size and take the simple difference in means between the treated and control units within each stratum. These are \(\hat{\tau}_j\) for \(j = 1, \; 2, \; ..., \; J\).

  3. Form the ATE estimate,

\(\hat{\tau}_{Strat} = \frac{1}{J} \sum_{j = 1}^{J} \hat{\tau}_j\)

Here, \(\hat{\tau}_{Strat}\) is consistent for \(\tau\), meaning that \(\hat{\tau}_{Strat} \rightarrow_p \tau\), provided that \(\hat{e}(x)\) is consistent for \(e(x)\) and the number of strata grows appropriately with \(N\). However, one needs to choose the number of strata, which can be somewhat ad hoc.
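The steps above can be sketched end to end on simulated data. Everything below (the data-generating process, names, and the choice of \(J = 20\)) is hypothetical, written in base R so the sketch is self-contained:

```r
set.seed(42)
n <- 5000
x <- rnorm(n)                        # a confounder
e_x <- plogis(x)                     # true propensity score e(X)
w <- rbinom(n, 1, e_x)               # treatment assignment
y <- 2.5 * w + x + rnorm(n)          # outcome; true ATE = 2.5

# estimate the propensity score with logistic regression
pscore <- predict(glm(w ~ x, family = binomial), type = "response")

# Steps 1-2: order by the estimated score and form J equal-size strata
J <- 20
strata <- cut(rank(pscore, ties.method = "first"), breaks = J, labels = FALSE)

# within-stratum differences in means, tau_j
tau_j <- sapply(split(data.frame(y, w), strata),
                function(d) mean(d$y[d$w == 1]) - mean(d$y[d$w == 0]))

# Step 3: average the stratum-specific estimates
tau_hat <- mean(tau_j)
```

Because `x` both raises the treatment probability and raises `y`, the raw difference in means is biased upward, while `tau_hat` lands near the true effect of 2.5.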

Demo of propensity score stratification

# load tidyverse helpers used below
library(dplyr)
library(tidyr)

# order data by the propensity score: low to high
dat <- dat[order(dat$pscore), ]

# cut the propensity score into ventiles (20 equal-probability strata)
strata <- cut(dat$pscore,
              breaks = quantile(dat$pscore, seq(0, 1, 0.05)),
              labels = 1:20, include.lowest = TRUE)

dat <- dat %>%
    mutate(strata = strata)

# compare treated vs control means across strata
dat_sum <- dat %>%
    group_by(W, strata) %>%
    summarize(mean_Y = mean(Y)) %>%
    pivot_wider(names_from = W, values_from = mean_Y)
## `summarise()` has grouped output by 'W'. You can override using the `.groups` argument.
colnames(dat_sum) <- c("strata", "mean_control", "mean_treat")
dat_sum <- dat_sum %>%
    mutate(diff = mean_treat - mean_control)

print(paste("ATE estimate from propensity score stratification is: ", mean(dat_sum$diff), sep = ""))
## [1] "ATE estimate from propensity score stratification is: 2.50135397640408"
print(paste("Raw difference in means is: ", mean(dat$Y[dat$W == 1]) - mean(dat$Y[dat$W == 0]), sep = ""))
## [1] "Raw difference in means is: 3.04098842187519"
print(paste("And the true treatment effect is: ", true_effect, sep = ""))
## [1] "And the true treatment effect is: 2.5"

We see that the stratification estimate is much closer to the true effect than the raw difference in means. It appears that, as long as we observe the variables that determine treatment assignment, the propensity score stratification approach performs well in estimating the ATE.
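One practical caveat: when quantile cuts produce strata of unequal size (for example, because of ties in the estimated scores), the simple average of the \(\hat{\tau}_j\) can be replaced by a stratum-size-weighted average. A sketch with hypothetical stratum estimates `tau_j` and sizes `n_j`:

```r
# hypothetical stratum-specific estimates and stratum sizes
tau_j <- c(2.4, 2.6, 2.5, 2.7)
n_j   <- c(300, 250, 250, 200)

# weight each stratum by its share of the sample
tau_weighted <- sum((n_j / sum(n_j)) * tau_j)  # 2.535 here
```

With equal-size strata, the weights all equal \(1/J\) and this reduces to the simple average used above.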