3.2 Average treatment effect (ATE)

Our target is to estimate the effects of the treatment.

For a brief moment, let’s assume that a parallel universe exists and that it also includes Alia, Ryan, Shrey, Samaira, and Rakshya in the course. In one universe (the actual universe), treatment is randomly allocated to these individuals: \(W_{Alia} = 1\), \(W_{Ryan} = 0\), \(W_{Shrey} = 0\), \(W_{Samaira} = 1\), \(W_{Rakshya} = 0\). In the other (parallel) universe, everything is identical to the actual universe except that each individual's treatment status is exactly the opposite. In this case, the individual-specific treatment effect can be computed by taking the difference in each individual's outcomes across the two universes. For example, the treatment effect for Alia is \(Y_{Alia}(1) - Y_{Alia}(0)\). This is feasible because the parallel universe provides a perfect counterfactual for every unit. The average of these individual treatment effects gives the average treatment effect (ATE).
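The parallel-universe thought experiment can be sketched in a few lines of Python. The outcome values below are made up purely for illustration (they do not come from any data in this text), and the names `potential_outcomes`, `individual_effects`, and `ate` are hypothetical. The point is only that, if both potential outcomes were visible, the ATE would be a simple average of differences.

```python
# Hypothetical potential outcomes (invented numbers, for illustration only).
# "y1" is the outcome with treatment, "y0" the outcome without treatment.
potential_outcomes = {
    "Alia":    {"y1": 85, "y0": 80},
    "Ryan":    {"y1": 70, "y0": 72},
    "Shrey":   {"y1": 90, "y0": 81},
    "Samaira": {"y1": 78, "y0": 75},
    "Rakshya": {"y1": 88, "y0": 82},
}

# Individual treatment effect: Y_i(1) - Y_i(0), computable only because
# the thought experiment lets us see both universes.
individual_effects = {
    name: po["y1"] - po["y0"] for name, po in potential_outcomes.items()
}

# ATE: the average of the individual treatment effects.
ate = sum(individual_effects.values()) / len(individual_effects)

for name, effect in individual_effects.items():
    print(f"{name}: {effect}")
print(f"ATE: {ate}")
```

With these invented numbers the individual effects differ across people (Ryan's is even negative), yet the ATE summarizes them in a single average.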

The target is to estimate the average treatment effect (ATE), which is defined as:

\[\begin{equation} ATE = E(Y_i(1)) - E(Y_i(0)) \end{equation}\]

\(Y_i(1)\) denotes the outcome for a unit \(i\) in the presence of treatment, whereas \(Y_i(0)\) is the outcome for the same unit \(i\) in the absence of treatment. As we know, it is impossible to measure unit \(i\) in two different states (with and without treatment).

A major difficulty is that, in reality, one cannot observe a unit both with and without treatment at the same time. This means that the perfect counterfactual does not exist. This again highlights causal inference as a missing data problem: when estimating the treatment effect for a unit \(i\), \(Y_i(0)\) is not observed if \(Y_i(1)\) is, and vice versa. Unfortunately, this prevents us from estimating individual treatment effects. The best we can do (as of yet) is to use the independence assumption together with the overlap assumption to evaluate the ATE.
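To make the missing data problem concrete, here is a minimal sketch of what the analyst actually sees in the single, actual universe. The treatment assignments are the ones given earlier in this section; the outcome values and the names `y1`, `y0`, and `observed` are hypothetical and chosen only for illustration.

```python
# Treatment assignment from the text (1 = treated, 0 = control).
treatment = {"Alia": 1, "Ryan": 0, "Shrey": 0, "Samaira": 1, "Rakshya": 0}

# Hypothetical potential outcomes (invented numbers; never fully observed).
y1 = {"Alia": 85, "Ryan": 70, "Shrey": 90, "Samaira": 78, "Rakshya": 88}
y0 = {"Alia": 80, "Ryan": 72, "Shrey": 81, "Samaira": 75, "Rakshya": 82}

# Build the observed data: only the potential outcome matching the
# assigned treatment is revealed; the other one is missing (None).
observed = {}
for name, w in treatment.items():
    observed[name] = {
        "W": w,
        "Y": y1[name] if w == 1 else y0[name],
        "Y(1)": y1[name] if w == 1 else None,
        "Y(0)": y0[name] if w == 0 else None,
    }

for name, row in observed.items():
    print(name, row)
```

Every row has exactly one missing potential outcome, so no individual difference \(Y_i(1) - Y_i(0)\) can be formed from the observed data alone.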

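Note that the independence condition, \(W_i \perp (Y_i(0), Y_i(1))\), gives: \(E(Y_i(0)|W_i = 1) = E(Y_i(0)|W_i = 0) = E(Y_i(0))\). Since \(Y_i = Y_i(0)\) for control units, the term \(E(Y_i(0))\) in the ATE equation can be replaced by \(E(Y_i|W_i = 0)\). By the same argument, \(E(Y_i(1))\) can be replaced by \(E(Y_i|W_i = 1)\).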

In the case of a pure randomized experiment, the ATE is given as:

\[\begin{equation} ATE = E(Y_i | W_i = 1) - E(Y_i | W_i = 0) \end{equation}\]

The ATE evaluates the treatment effect for the whole population by comparing the treated units to the control units.
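The following simulation is a rough sketch of why the difference in group means recovers the ATE under pure randomization. Everything in it is an assumption made for illustration: the sample size, the outcome distribution, the average effect of 2, and all variable names. It uses only Python's standard library.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

n = 10_000          # hypothetical sample size
mean_effect = 2.0   # hypothetical average treatment effect

# Simulated potential outcomes; effects vary across units around mean_effect.
y0 = [random.gauss(10, 3) for _ in range(n)]
y1 = [y + mean_effect + random.gauss(0, 1) for y in y0]

# Pure randomization: a fair coin flip, independent of (Y(0), Y(1)).
w = [random.randint(0, 1) for _ in range(n)]

# Observed outcome: Y(1) for treated units, Y(0) for control units.
y_obs = [a if t == 1 else b for a, b, t in zip(y1, y0, w)]

# True ATE, computable here only because the simulation has both outcomes.
true_ate = statistics.mean(y1) - statistics.mean(y0)

# Estimated ATE: E(Y | W = 1) - E(Y | W = 0), using observed data only.
treated = [y for y, t in zip(y_obs, w) if t == 1]
control = [y for y, t in zip(y_obs, w) if t == 0]
ate_hat = statistics.mean(treated) - statistics.mean(control)

print(f"true ATE:      {true_ate:.3f}")
print(f"estimated ATE: {ate_hat:.3f}")
```

Because assignment is independent of the potential outcomes, the two numbers should be close, differing only by sampling noise. If `w` were instead made to depend on `y0` (for example, treating only units with low `y0`), the difference in means would generally no longer match the true ATE.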