5.3 Propensity score

Previously we discussed the setting of a discrete feature, in which case we estimate group-wise ATEs and take their weighted average to obtain an overall ATE estimate. When there are many features (covariates), this approach suffers from the curse of dimensionality.6 Moreover, if features are continuous, we cannot estimate the ATE at each value of \(x \in \chi\), because typically at most one observation shares any given value. Instead of estimating group-wise ATEs and averaging them, we therefore want a more indirect approach. This is where the propensity score comes in.
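To see how quickly exact cell-by-cell estimation breaks down, here is a minimal simulation sketch. It assumes a hypothetical design with \(k\) independent binary covariates (so \(2^k\) cells) and a fair-coin treatment, so confounding plays no role and only sample size matters. A cell counts as usable if it contains at least one treated and one control unit. The function name `share_of_usable_cells` is illustrative only.

```python
import numpy as np

def share_of_usable_cells(n, k, seed=0):
    """Fraction of the 2**k covariate cells in which a group-wise ATE can be computed.

    Hypothetical setup: k independent binary covariates and a fair-coin
    treatment. A cell is usable if it has at least one treated and one
    control unit. Intended for k <= 30 so cell ids fit in an integer.
    """
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n, k))   # n units, k binary covariates
    W = rng.integers(0, 2, size=n)        # randomized treatment indicator
    cell = X @ (2 ** np.arange(k))        # encode each row of X as one cell id
    usable = sum(
        1 for c in np.unique(cell)
        if W[cell == c].min() == 0 and W[cell == c].max() == 1
    )
    return usable / 2 ** k

# Hold n fixed and add covariates: the usable share collapses toward zero.
for k in (2, 5, 10, 15, 20):
    print(k, share_of_usable_cells(n=1_000, k=k))
```

With \(n = 1000\) and \(k = 20\) there are more than a million cells but at most 1000 occupied ones, so almost no cell supports a within-cell comparison.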

The implicit assumption is that we have collected enough features (discrete, continuous, interaction terms, higher-degree polynomials) to support unconfoundedness. This again means that the treatment assignment is as good as random after controlling for \(X_i\); more formally, this brings us back to equation (5.3). In practice, however, when covariates are continuous and many characteristics determine the treatment assignment, we do not want to split the sample into groups to estimate group-wise treatment effects.

Propensity score \(e(x)\): the probability that unit \(i\) is treated given its covariates \(X_i = x\).

\[\begin{equation} e(x) = P(W_i = 1 | X_i = x) \tag{5.7} \end{equation}\]
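In observational data \(e(x)\) is unknown and must be estimated. The sketch below assumes a logistic regression with a linear index in \(X\) plus an intercept, which is one common choice rather than the only one, fitted by Newton-Raphson with numpy. The names `fit_propensity_logit` and `e_hat` and the simulated data are hypothetical.

```python
import numpy as np

def fit_propensity_logit(X, W, n_iter=25):
    """Estimate e(x) = P(W = 1 | X = x) with a logistic regression (assumed model).

    Fitted by Newton-Raphson. Returns a function mapping covariates to
    estimated propensity scores, plus the coefficient vector.
    """
    Z = np.column_stack([np.ones(len(X)), X])       # add an intercept column
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))         # current fitted probabilities
        grad = Z.T @ (W - p)                        # gradient of the log-likelihood
        info = Z.T @ (Z * (p * (1 - p))[:, None])   # negative Hessian (information matrix)
        beta += np.linalg.solve(info, grad)         # Newton step
    def e_hat(X_new):
        Z_new = np.column_stack([np.ones(len(X_new)), X_new])
        return 1.0 / (1.0 + np.exp(-Z_new @ beta))
    return e_hat, beta

# Simulated example where the true score is logistic in X by construction.
rng = np.random.default_rng(0)
n = 5_000
X = rng.normal(size=(n, 2))
true_e = 1 / (1 + np.exp(-(0.5 + X[:, 0] - 0.5 * X[:, 1])))
W = rng.binomial(1, true_e)
e_hat, beta = fit_propensity_logit(X, W)
print(beta.round(2))  # should be close to the true coefficients (0.5, 1.0, -0.5)
```

If the logit form is misspecified, the estimated scores will not balance the covariates; flexible classifiers are often used instead.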

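The key property of the propensity score is that it balances the treatment and control groups: among units with the same value of \(e(X_i)\), the distribution of the covariates \(X_i\) is the same for treated and control units. If the unconfoundedness assumption holds, it also holds conditional on the propensity score alone: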

\[\begin{equation} W_i \perp \{Y_i(0), \; Y_i(1)\} | \; e(X_i) \tag{5.8} \end{equation}\]
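A short sketch of why (5.8) follows from unconfoundedness given \(X_i\) (equation (5.3)), using the law of iterated expectations and the fact that \(e(X_i)\) is a function of \(X_i\):

\[\begin{aligned}
P\big(W_i = 1 \mid Y_i(0), Y_i(1), e(X_i)\big)
&= E\big[\, P(W_i = 1 \mid Y_i(0), Y_i(1), X_i) \;\big|\; Y_i(0), Y_i(1), e(X_i) \big] \\
&= E\big[\, P(W_i = 1 \mid X_i) \;\big|\; Y_i(0), Y_i(1), e(X_i) \big] \\
&= E\big[\, e(X_i) \;\big|\; Y_i(0), Y_i(1), e(X_i) \big] = e(X_i).
\end{aligned}\]

The second line uses unconfoundedness. The right-hand side does not depend on the potential outcomes, and since \(W_i\) is binary, this is exactly the conditional independence stated in (5.8).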

Equation (5.8) says that instead of controlling for the full covariate vector \(X\), one can control for the scalar probability of treatment \(e(X)\) and still obtain the desired property that treatment is as good as random. This collapses a possibly high-dimensional conditioning problem into a one-dimensional one. Propensity scores are therefore mainly used for balancing purposes.
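The balancing property can be checked in a simulation. This sketch assumes a hypothetical design with two continuous covariates and a known logistic propensity score; it compares treated-minus-control covariate means in the raw data with the size-weighted average of the same gaps within 20 quantile bins of \(e(X)\). The bin count and helper name `mean_diff` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
X = rng.normal(size=(n, 2))                        # two continuous covariates
e = 1 / (1 + np.exp(-(X[:, 0] + X[:, 1])))         # hypothetical true propensity score
W = rng.binomial(1, e)                             # treatment depends on X

def mean_diff(x, w):
    """Treated-minus-control difference in covariate means."""
    return x[w == 1].mean(axis=0) - x[w == 0].mean(axis=0)

raw = mean_diff(X, W)                              # imbalance ignoring e(X)

# Condition on the score: compare units within narrow bins of e(X).
n_bins = 20
edges = np.quantile(e, np.linspace(0, 1, n_bins + 1))
bins = np.clip(np.searchsorted(edges, e, side="right") - 1, 0, n_bins - 1)
within = np.array([mean_diff(X[bins == b], W[bins == b]) for b in range(n_bins)])
weights = np.bincount(bins, minlength=n_bins) / n
stratified = weights @ within                      # size-weighted within-bin gap

print("raw gap:         ", raw.round(3))
print("within-bin gap:  ", stratified.round(3))
```

The raw gaps are large because treated units tend to have higher \(X_1 + X_2\); within bins of \(e(X)\) the gaps shrink toward zero, as (5.8) and the balancing property suggest.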

One straightforward implication of equation (5.8) is that if we partition observations into groups with similar propensity scores, we can estimate group-wise treatment effects and aggregate them into an estimate of the ATE. This is the propensity score stratification method. The argument is that when units with similar propensity scores are compared, their covariates are approximately balanced, mimicking a randomized experiment.
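A minimal sketch of propensity score stratification, assuming quantile-based strata and stratum-size weights (common choices, not the only ones). The function name `ps_stratification_ate` and the simulated data are hypothetical; for simplicity the usage example plugs in the true score, whereas in practice one would use an estimate.

```python
import numpy as np

def ps_stratification_ate(Y, W, e_hat, n_strata=5):
    """Propensity score stratification estimate of the ATE.

    Cuts the scores into quantile-based strata, takes the treated-minus-control
    difference in mean outcomes within each stratum, and averages those
    differences weighted by stratum size.
    """
    edges = np.quantile(e_hat, np.linspace(0, 1, n_strata + 1))
    strata = np.clip(np.searchsorted(edges, e_hat, side="right") - 1, 0, n_strata - 1)
    ate, n = 0.0, len(Y)
    for s in range(n_strata):
        in_s = strata == s
        y1, y0 = Y[in_s & (W == 1)], Y[in_s & (W == 0)]
        if len(y1) == 0 or len(y0) == 0:
            raise ValueError(f"stratum {s} lacks treated or control units")
        ate += in_s.sum() / n * (y1.mean() - y0.mean())
    return ate

# Simulated confounded data with a true ATE of 2.
rng = np.random.default_rng(2)
n = 20_000
X = rng.normal(size=(n, 2))
e = 1 / (1 + np.exp(-(X[:, 0] + X[:, 1])))          # true score, known by construction
W = rng.binomial(1, e)
Y = 2.0 * W + X[:, 0] + X[:, 1] + rng.normal(size=n)
naive = Y[W == 1].mean() - Y[W == 0].mean()          # biased: ignores confounding
ate = ps_stratification_ate(Y, W, e, n_strata=10)
print(f"naive: {naive:.2f}, stratified: {ate:.2f}")
```

The naive difference in means is pushed well above 2 by confounding, while the stratified estimate lands close to 2; some residual bias remains because scores still vary within each stratum, and it shrinks as strata become finer.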


  6. As the number of covariates increases, the number of cells in the covariate space grows rapidly (exponentially, for discrete covariates), so the data in each cell thin out quickly and estimating the ATE within each cell becomes infeasible.