6.16 Assumptions governing the TWFEDD estimate

The TWFEDD estimate is a weighted average of all possible \(2 \times 2\) DD estimates of the average treatment effect on the treated. When groups are defined by treatment timing, with an early-treated group \(k\), a late-treated group \(l\), and an untreated group \(U\), the \(2\times 2\) DD estimates can be written as:

\[\begin{equation} \hat{\alpha}^{2\times 2}_{kU} = [\bar{y}_k^{POST(k)} - \bar{y}_k^{PRE(k)}] - [\bar{y}_U^{POST(k)} - \bar{y}_U^{PRE(k)}] \end{equation}\]

\[\begin{equation} \hat{\alpha}^{2\times 2 k}_{kl} = [\bar{y}_k^{MID(k,l)} - \bar{y}_k^{PRE(k)}] - [\bar{y}_l^{MID(k,l)} - \bar{y}_l^{PRE(k)}] \end{equation}\]

\[\begin{equation} \hat{\alpha}^{2\times 2 l}_{kl} = [\bar{y}_l^{POST(l)} - \bar{y}_l^{MID(k,l)}] - [\bar{y}_k^{POST(l)} - \bar{y}_k^{MID(k,l)}] \end{equation}\]
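
The three comparisons above can be sketched numerically. The following is a minimal illustration, not the estimator used by any particular package: the group means, the six-period layout, and the helper names `window_mean` and `dd_2x2` are all hypothetical, chosen so that group \(k\) is treated from period 3 and group \(l\) from period 5.

```python
from statistics import mean

# Hypothetical group-mean outcomes for periods t = 1..6 (illustrative numbers only).
# Group k is treated from t = 3, group l from t = 5, group U is never treated.
y = {
    "k": [10, 11, 14, 15, 16, 17],
    "l": [8, 9, 10, 11, 15, 16],
    "U": [5, 6, 7, 8, 9, 10],
}

# Windows expressed as lists of periods (assumed layout).
PRE_K = [1, 2]             # before anyone is treated
MID_KL = [3, 4]            # k treated, l not yet treated
POST_L = [5, 6]            # both k and l treated
POST_K = MID_KL + POST_L   # every period in which k is treated


def window_mean(group, window):
    """Average outcome of `group` over the periods in `window`."""
    return mean(y[group][t - 1] for t in window)


def dd_2x2(treated, control, post, pre):
    """Change for the treated group minus change for the comparison group."""
    change_treated = window_mean(treated, post) - window_mean(treated, pre)
    change_control = window_mean(control, post) - window_mean(control, pre)
    return change_treated - change_control


alpha_kU = dd_2x2("k", "U", POST_K, PRE_K)     # early treated vs. untreated
alpha_kl_k = dd_2x2("k", "l", MID_KL, PRE_K)   # early treated vs. not-yet-treated
alpha_kl_l = dd_2x2("l", "k", POST_L, MID_KL)  # late treated vs. already treated

print(alpha_kU, alpha_kl_k, alpha_kl_l)
```

With these made-up numbers the three estimates are 2.0, 2.0, and 3.0; the only thing that changes across the three calls is which group serves as the comparison and which windows serve as "pre" and "post".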

Now, let us express these estimates in terms of potential outcomes. First, write the observed outcome as

\[\begin{equation} y_{it} = D_{it}Y_{it}(t_i) + (1-D_{it})Y_{it}(0) \end{equation}\]

where \(y_{it}\) is the observed outcome of unit \(i\) in period \(t\), \(D_{it}\) is the treatment indicator, \(Y_{it}(t_i)\) is the potential outcome when unit \(i\) is first treated in period \(t_i\), and \(Y_{it}(0)\) is the untreated (counterfactual) potential outcome. Following (callaway2022?), define the ATT for group \(k\) at period \(\tau \geq k\) as \(ATT_{k}(\tau) = E[Y_{i\tau}(k) - Y_{i\tau}(0)|t_i = k]\). Now, define \(W\) as a date range (window) containing \(T_W\) periods.

In practice, \(W\) is the post-treatment window of a \(2 \times 2\) DD, and \(T_{W}\) is the number of periods it contains. In the setup above, group \(k\) is treated in two windows, \(MID(k,l)\) and \(POST(l)\), so \(W\) for group \(k\) is \(MID(k,l)\) plus \(POST(l)\); treating each window as a single period, \(T_{W} = 2\). \(ATT_{k}(W)\) is then just the average of the period-specific ATTs over \(W\):

\[\begin{equation} ATT_{k}(W) = \frac{1}{T_{W}} \sum_{t \in W} E[Y_{it}(k)-Y_{it}(0)|t_{i}=k] \end{equation}\]

Next, define the change in the average untreated potential outcome of group \(k\) between a pre-period window \(W_0\) and a post-period window \(W_1\) as:

\[\begin{equation} \Delta Y_{k}^{0}(W_1, W_0) = \frac{1}{T_{W_1}} \sum_{t \in W_1} E[Y_{it}(0)|t_{i}=k] - \frac{1}{T_{W_0}} \sum_{t \in W_0} E[Y_{it}(0)|t_{i}=k] \end{equation}\]
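
As a rough illustration of these two definitions, the sketch below computes \(ATT_k(W)\) and \(\Delta Y_k^0(W_1, W_0)\) from expected potential outcomes. The potential-outcome paths, the window layout, and the names `att` and `delta_y0` are assumptions made for the example; in real data the untreated path of a treated group is never observed.

```python
from statistics import mean

# Hypothetical expected potential outcomes for group k over periods t = 1..6.
# Group k is assumed to be treated from t = 3 onward (illustrative numbers only).
y0_k = {1: 10, 2: 11, 3: 12, 4: 13, 5: 14, 6: 15}  # E[Y_it(0) | t_i = k]
y1_k = {3: 14, 4: 15, 5: 17, 6: 18}                # E[Y_it(k) | t_i = k], treated periods

PRE_K, MID_KL, POST_L = [1, 2], [3, 4], [5, 6]
POST_K = MID_KL + POST_L


def att(window):
    """ATT_k(W): average of the period-specific treatment effects over W."""
    return mean(y1_k[t] - y0_k[t] for t in window)


def delta_y0(w1, w0):
    """Change in the average untreated potential outcome from window w0 to w1."""
    return mean(y0_k[t] for t in w1) - mean(y0_k[t] for t in w0)


att_mid = att(MID_KL)               # effect while only k is treated
att_post_l = att(POST_L)            # effect after l is also treated
att_post_k = att(POST_K)            # average over all of k's treated periods
trend_k = delta_y0(POST_K, PRE_K)   # untreated trend of group k

print(att_mid, att_post_l, att_post_k, trend_k)
```

Here the effect for group \(k\) is 2 in \(MID(k,l)\) and 3 in \(POST(l)\), so \(ATT_k(POST(k))\) averages to 2.5, while the untreated trend between \(PRE(k)\) and \(POST(k)\) is 3.0.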

Using this notation, the \(2 \times 2\) \(\hat{\alpha}\)s can be written as:

\[\begin{equation} \hat{\alpha}_{kU}^{2\times 2} = ATT_{k}(POST(k)) + \overbrace{[ \Delta Y_{k}^{0}(POST(k),PRE(k)) - \Delta Y_{U}^{0}(POST(k),PRE(k))]}^{parallel\;trend} \end{equation}\]

\[\begin{equation} \hat{\alpha}_{kl}^{2\times 2 k} = ATT_{k}(MID(k,l)) + \overbrace{[ \Delta Y_{k}^{0}(MID(k,l),PRE(k)) - \Delta Y_{l}^{0}(MID(k,l),PRE(k))]}^{parallel\;trend} \end{equation}\]

\[\begin{equation} \hat{\alpha}_{kl}^{2\times 2 l} = ATT_{l}(POST(l)) + \overbrace{[ \Delta Y_{l}^{0}(POST(l),MID(k,l)) - \Delta Y_{k}^{0}(POST(l),MID(k,l))]}^{parallel\;trend} + [ATT_k(MID(k,l))-ATT_k(POST(l))] \end{equation}\]
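
The late-versus-early comparison can be checked numerically. The sketch below builds observed group means from hypothetical potential outcomes in which parallel trends hold exactly but the early group's effect grows over time, then compares the \(2 \times 2\) DD with the sum of the late group's own ATT in \(POST(l)\), the trend gap, and the (negative of the) change in the early group's ATT. All numbers and helper names (`observed`, `avg`) are assumptions for illustration.

```python
from statistics import mean

PERIODS = range(1, 7)
MID_KL, POST_L = [3, 4], [5, 6]

# Hypothetical untreated potential-outcome means; parallel by construction.
y0 = {
    "k": {t: 9 + t for t in PERIODS},
    "l": {t: 7 + t for t in PERIODS},
}
# Hypothetical treatment effects by period. Group k (treated from t = 3) has an
# effect that grows from 2 to 3; group l (treated from t = 5) has a constant effect of 3.
effect = {
    "k": {3: 2, 4: 2, 5: 3, 6: 3},
    "l": {5: 3, 6: 3},
}


def observed(group, t):
    """Observed mean: untreated outcome plus the effect in treated periods."""
    return y0[group][t] + effect[group].get(t, 0)


def avg(f, window):
    """Average of f(t) over the periods in `window`."""
    return mean(f(t) for t in window)


# Left-hand side: the 2x2 DD that uses the early group k as the comparison.
dd_late_vs_early = (
    avg(lambda t: observed("l", t), POST_L) - avg(lambda t: observed("l", t), MID_KL)
) - (
    avg(lambda t: observed("k", t), POST_L) - avg(lambda t: observed("k", t), MID_KL)
)

# Right-hand side, term by term.
att_l_post = avg(lambda t: effect["l"][t], POST_L)
trend_gap = (
    avg(lambda t: y0["l"][t], POST_L) - avg(lambda t: y0["l"][t], MID_KL)
) - (
    avg(lambda t: y0["k"][t], POST_L) - avg(lambda t: y0["k"][t], MID_KL)
)
att_k_change = avg(lambda t: effect["k"][t], POST_L) - avg(lambda t: effect["k"][t], MID_KL)

rhs = att_l_post + trend_gap - att_k_change
print(dd_late_vs_early, rhs)
```

In this toy setup the comparison returns 2 even though group \(l\)'s own effect is 3: the trend gap is zero, so the entire shortfall comes from group \(k\)'s effect rising by 1 between the two windows.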

While \(\hat{\alpha}_{kU}^{2\times 2}\) and \(\hat{\alpha}_{kl}^{2 \times 2 k}\) depend only on the parallel trends assumption, \(\hat{\alpha}_{kl}^{2 \times 2 l}\) also depends on the difference between group \(k\)'s \(ATT\) in \(MID(k,l)\) and \(POST(l)\). This is because the late-treated group is also compared with the early-treated group, which is already treated in both windows; if the early-treated group's treatment effect changes over time, that change shows up in \(\hat{\alpha}_{kl}^{2\times 2 l}\). Substituting the expressions for \(\hat{\alpha}_{kU}^{2\times 2}\), \(\hat{\alpha}_{kl}^{2 \times 2 k}\), and \(\hat{\alpha}_{kl}^{2\times 2 l}\) into the TWFE decomposition yields the following:

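\[\begin{equation} \underset{N \to \infty}{\operatorname{plim}}\; \hat{\alpha} = \alpha = VWATT + VWCT - \Delta ATT \end{equation}\]

where VWATT is the variance-weighted average treatment effect on the treated; VWCT is the variance-weighted common trends term; and \(\Delta ATT\) is the weighted change in the early-treated group's average treatment effect on the treated between the \(MID(k,l)\) and \(POST(l)\) windows, \(ATT_k(POST(l)) - ATT_k(MID(k,l))\) (treatment effect dynamics, or heterogeneity over time). It enters with a negative sign because the early-treated group serves as the comparison group, so any change in its treatment effect is differenced out of the late-treated group's estimate.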

The intuition is that the parallel trends assumption justifies comparing treated groups with untreated (or not-yet-treated) groups, so that any deviation in their outcome paths can be attributed to treatment. As such, (callaway2022?) refers to these groups (untreated and not yet treated) as “good comparison” groups. The early-treated group, when it serves as the comparison group for the late-treated group, can be a “bad comparison” if its treatment effect varies over time.