6.15 What is TWFE Estimating when there is Treatment Timing Variation?

Using the decomposition in (goodman2021?), the TWFE DD estimate \(\hat{\alpha}^{DD}\) can be written as:

\[\begin{equation} \hat{\alpha}^{DD} = \sum_{k \neq U} s_{kU} \hat{\alpha}^{2 \times 2}_{kU} + \underbrace{\sum_{k \neq U} \sum_{l>k}\left[s^k_{kl}\hat{\alpha}_{kl}^{2 \times 2,k}+ s^l_{kl}\hat{\alpha}_{kl}^{2 \times 2,l}\right]}_{\text{timing-only estimator}} \tag{TWFE decomposition} \end{equation}\]
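As a concrete reference point for what \(\hat{\alpha}^{DD}\) is, here is a minimal pure-Python sketch that computes the TWFE coefficient in a balanced panel by two-way demeaning the treatment dummy. By the Frisch-Waugh-Lovell theorem this gives the same coefficient as a regression with unit and period dummies. The function name `twfe_dd` and the toy panel are hypothetical illustrations. The function also returns \(\hat{V}^{D}\), the variance of the demeaned dummy that enters the weights.

```python
def twfe_dd(D, Y):
    """Coefficient on D in a regression of Y on D with unit and period fixed
    effects, for a balanced panel given as N rows (units) of T periods.

    By the Frisch-Waugh-Lovell theorem this equals sum(Dt * Y) / sum(Dt**2),
    where Dt is D with unit and period means removed (exact when balanced).
    Also returns V^D = mean(Dt**2), the variance of the demeaned dummy.
    """
    N, T = len(D), len(D[0])
    unit_mean = [sum(row) / T for row in D]
    period_mean = [sum(D[i][t] for i in range(N)) / N for t in range(T)]
    grand_mean = sum(unit_mean) / N
    num = den = 0.0
    for i in range(N):
        for t in range(T):
            d_tilde = D[i][t] - unit_mean[i] - period_mean[t] + grand_mean
            num += d_tilde * Y[i][t]
            den += d_tilde ** 2
    return num / den, den / (N * T)


# Hypothetical toy panel: an early unit treated from period 2 and a late unit
# treated from period 4 of 5, with constant effects 10 and 15 and no noise.
D = [[0, 1, 1, 1, 1],
     [0, 0, 0, 1, 1]]
effects = [10.0, 15.0]
Y = [[effects[i] * d for d in D[i]] for i in range(len(D))]
alpha_dd, v_d = twfe_dd(D, Y)
print(alpha_dd, v_d)  # alpha_dd = (10 + 2*15)/3 = 40/3 here, up to rounding
```

In this toy panel the estimate counts the late unit's effect twice as heavily as the early unit's. That is the kind of timing-driven weighting the decomposition makes explicit.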

Here,

\(\hat{\alpha}^{2 \times 2}_{kU} = [\bar{y}_k^{POST(k)}-\bar{y}_k^{PRE(k)}] - [\bar{y}_U^{POST(k)}-\bar{y}_U^{PRE(k)}]\); this compares timing group \(k\) with the untreated group \(U\) before and after \(k\) is treated.

\(\hat{\alpha}^{2 \times 2, k}_{kl} = [\bar{y}_k^{MID(k,l)}-\bar{y}_k^{PRE(k)}] - [\bar{y}_l^{MID(k,l)}-\bar{y}_l^{PRE(k)}]\); this compares the early group \(k\) with the late group \(l\) over the \(PRE(k)\) and \(MID(k,l)\) windows, before group \(l\) is treated, so the not-yet-treated group \(l\) serves as the control.

\(\hat{\alpha}^{2 \times 2, l}_{kl} = [\bar{y}_l^{POST(l)}-\bar{y}_l^{MID(k,l)}] - [\bar{y}_k^{POST(l)}-\bar{y}_k^{MID(k,l)}]\); this compares the late group \(l\) with the early group \(k\) over the \(MID(k,l)\) and \(POST(l)\) windows, during which group \(k\) is already treated and its treatment status does not change. Here, the early-treated group serves as the control group.
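The comparisons above can be computed directly from group-by-period mean outcomes. The minimal sketch below builds the \(PRE\), \(MID\), and \(POST\) windows and evaluates each \(2\times 2\) DD, including the analogous \(\hat{\alpha}^{2\times 2}_{lU}\) for the late group. The helper names (`window_mean`, `dd_2x2`, `two_by_twos`) and the toy data are hypothetical. Periods are 0-indexed, and each group's first treated period is assumed to satisfy \(0 <\) `start_k` \(<\) `start_l` \(< T\).

```python
def window_mean(series, periods):
    """Average of a group's mean outcome over a window of period indices."""
    periods = list(periods)
    return sum(series[t] for t in periods) / len(periods)


def dd_2x2(y_treat, y_ctrl, before, after):
    """Generic 2x2 DD: change for the treated group minus change for the control."""
    return ((window_mean(y_treat, after) - window_mean(y_treat, before))
            - (window_mean(y_ctrl, after) - window_mean(y_ctrl, before)))


def two_by_twos(y_k, y_l, y_U, start_k, start_l):
    """2x2 comparisons for an early group k, a late group l, and untreated U.

    y_*: group-mean outcome by period; start_*: first treated period index.
    """
    T = len(y_k)
    pre_k = range(0, start_k)        # PRE(k)
    post_k = range(start_k, T)       # POST(k)
    mid = range(start_k, start_l)    # MID(k, l)
    post_l = range(start_l, T)       # POST(l)
    return {
        "kU": dd_2x2(y_k, y_U, pre_k, post_k),
        "lU": dd_2x2(y_l, y_U, range(0, start_l), post_l),
        "kl,k": dd_2x2(y_k, y_l, pre_k, mid),   # not-yet-treated l as control
        "kl,l": dd_2x2(y_l, y_k, mid, post_l),  # already-treated k as control
    }


# Hypothetical data: a shared linear trend plus constant effects 10 (k) and 15 (l).
T, start_k, start_l = 6, 2, 4
trend = [0.5 * t for t in range(T)]
y_U = trend[:]
y_k = [trend[t] + (10.0 if t >= start_k else 0.0) for t in range(T)]
y_l = [trend[t] + (15.0 if t >= start_l else 0.0) for t in range(T)]
estimates = two_by_twos(y_k, y_l, y_U, start_k, start_l)
print(estimates)
```

With a shared trend and effects that are constant over time, every comparison returns the treated group's own effect, up to rounding: 10 for comparisons where \(k\) is treated and 15 where \(l\) is.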

The second sum in the TWFE decomposition therefore uses only variation in treatment timing to identify the effects: each timing group serves as the control for the other during a window in which the control group's treatment status does not change.

\(s_{kU}\), \(s_{kl}^{k}\), and \(s_{kl}^{l}\) are the weights placed on the estimates that compare (i) treated with untreated units (one \(2 \times 2\) DD estimate per timing group, i.e. two with one early and one late group); (ii) early-treated with late-treated units over the \(PRE(k)\) and \(MID(k,l)\) windows; and (iii) late-treated with early-treated units over the \(MID(k,l)\) and \(POST(l)\) windows, respectively. (goodman2021?) expresses the weights as:

\[\begin{equation} s_{kU} = \frac{(n_k + n_U)^2 \overbrace{n_{kU}(1-n_{kU})\bar{D}_k(1-\bar{D}_k)}^{\hat{V}^{D}_{kU}} }{\hat{V}^{D}} \end{equation}\]

\[\begin{equation} s_{kl}^{k} = \frac{((n_k + n_l)(1-\bar{D}_l))^2 \overbrace{n_{kl}(1-n_{kl})\frac{\bar{D}_k-\bar{D}_l}{1-\bar{D}_l}\frac{1-\bar{D}_k}{1-\bar{D}_l}}^{\hat{V}^{D,k}_{kl}} }{\hat{V}^{D}} \end{equation}\]

and

\[\begin{equation} s_{kl}^{l} = \frac{((n_k + n_l)\bar{D}_k)^2 \overbrace{n_{kl}(1-n_{kl})\frac{\bar{D}_l}{\bar{D}_k}\frac{\bar{D}_k-\bar{D}_l}{\bar{D}_k}}^{\hat{V}^{D,l}_{kl}} }{\hat{V}^{D}} \end{equation}\]

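where \(n_k\) is the share of units in group \(k\); \(n_{kU} = n_k/(n_k+n_U)\) and \(n_{kl} = n_k/(n_k+n_l)\) are the relative sizes of group \(k\) within each pair; \(\bar{D}_k\) is the fraction of periods in which group \(k\) is treated; and \(\hat{V}^{D}\) is the variance of the treatment dummy after removing unit and time means. The weights sum to one: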

\[\begin{equation} \sum_{k \neq U} s_{kU} + \sum_{k \neq U} \sum_{l>k}(s_{kl}^{k} + s_{kl}^{l}) = 1 \end{equation}\]

The TWFE estimate therefore depends on the weight assigned to each \(2\times 2\) DD estimate. Each weight depends on the size of the subsample used in that comparison, through \((n_k + n_l)^2\) or \((n_k + n_U)^2\), and on the relative sizes of its two groups, through \(n_{kl}(1-n_{kl})\) or \(n_{kU}(1-n_{kU})\). It also depends on the variance of treatment within the subsample. For instance, \(\hat{V}^{D,k}_{kl}\) denotes the variance of the treatment dummy in the subsample formed by groups \(k\) and \(l\) over the \(PRE(k)\) and \(MID(k,l)\) windows.
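The weight formulas can be checked numerically. The sketch below computes every \(2\times 2\) estimate and its weight from the expressions above, then compares the weighted sum with the TWFE coefficient. It is pure Python; `twfe_dd`, `dd_2x2`, `bacon_terms`, and the random panel are hypothetical illustrations, not code from (goodman2021?). It takes \(n_g\) as each group's share of units and \(\bar{D}_g\) as the fraction of periods treated. It assumes a balanced panel, distinct treatment dates strictly inside the panel, and at most one never-treated group.

```python
import random


def twfe_dd(D, Y):
    """TWFE DD coefficient and V^D = mean(D~^2) for a balanced panel (FWL)."""
    N, T = len(D), len(D[0])
    unit_mean = [sum(row) / T for row in D]
    period_mean = [sum(D[i][t] for i in range(N)) / N for t in range(T)]
    grand_mean = sum(unit_mean) / N
    d_tilde = [[D[i][t] - unit_mean[i] - period_mean[t] + grand_mean
                for t in range(T)] for i in range(N)]
    num = sum(d_tilde[i][t] * Y[i][t] for i in range(N) for t in range(T))
    den = sum(d_tilde[i][t] ** 2 for i in range(N) for t in range(T))
    return num / den, den / (N * T)


def window_mean(y, window):
    return sum(y[t] for t in window) / len(window)


def dd_2x2(y_treat, y_ctrl, before, after):
    """Change in the treated group's mean minus change in the control's."""
    return ((window_mean(y_treat, after) - window_mean(y_treat, before))
            - (window_mean(y_ctrl, after) - window_mean(y_ctrl, before)))


def bacon_terms(starts, sizes, ybar, T, v_d):
    """(label, weight, 2x2 estimate) for every comparison, using the s_kU,
    s^k_kl and s^l_kl formulas. starts: group -> first treated period index
    (None for the never-treated group U); sizes: group -> number of units;
    ybar: group -> mean outcome by period."""
    total = sum(sizes.values())
    n = {g: sizes[g] / total for g in sizes}                     # group shares
    Dbar = {g: 0.0 if s is None else (T - s) / T for g, s in starts.items()}
    timing = sorted((g for g, s in starts.items() if s is not None), key=starts.get)
    never = [g for g, s in starts.items() if s is None]
    terms = []
    for k in timing:                                  # treated vs untreated
        for u in never:
            n_ku = n[k] / (n[k] + n[u])
            w = (n[k] + n[u]) ** 2 * n_ku * (1 - n_ku) * Dbar[k] * (1 - Dbar[k]) / v_d
            est = dd_2x2(ybar[k], ybar[u], range(starts[k]), range(starts[k], T))
            terms.append((f"{k} vs {u}", w, est))
    for i, k in enumerate(timing):                    # timing-only comparisons
        for l in timing[i + 1:]:                      # l is treated after k
            n_kl = n[k] / (n[k] + n[l])
            Dk, Dl = Dbar[k], Dbar[l]
            pre = range(starts[k])
            mid = range(starts[k], starts[l])
            post = range(starts[l], T)
            w_k = (((n[k] + n[l]) * (1 - Dl)) ** 2 * n_kl * (1 - n_kl)
                   * (Dk - Dl) / (1 - Dl) * (1 - Dk) / (1 - Dl) / v_d)
            w_l = (((n[k] + n[l]) * Dk) ** 2 * n_kl * (1 - n_kl)
                   * (Dl / Dk) * (Dk - Dl) / Dk / v_d)
            terms.append((f"{k} vs {l}, MID window", w_k, dd_2x2(ybar[k], ybar[l], pre, mid)))
            terms.append((f"{l} vs {k}, POST window", w_l, dd_2x2(ybar[l], ybar[k], mid, post)))
    return terms


# Hypothetical balanced panel with arbitrary (random) outcomes, so the check
# does not rely on parallel trends or homogeneous effects.
T = 8
starts = {"k": 2, "l": 5, "U": None}
sizes = {"k": 3, "l": 2, "U": 2}
rng = random.Random(0)
D, Y, rows = [], [], {}
for g, start in starts.items():
    for _ in range(sizes[g]):
        rows.setdefault(g, []).append(len(D))
        D.append([int(start is not None and t >= start) for t in range(T)])
        Y.append([rng.gauss(0.0, 1.0) for _ in range(T)])
ybar = {g: [sum(Y[i][t] for i in idx) / len(idx) for t in range(T)]
        for g, idx in rows.items()}

alpha_dd, v_d = twfe_dd(D, Y)
terms = bacon_terms(starts, sizes, ybar, T, v_d)
print(sum(w for _, w, _ in terms))                # weights sum to 1 up to rounding
print(alpha_dd, sum(w * e for _, w, e in terms))  # the two values agree
```

Random outcomes are used deliberately. The decomposition is an algebraic identity for a balanced panel, so the weighted sum should match the TWFE coefficient whether or not the data satisfy parallel trends.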

Let us look more closely at \(\bar{D}_k(1-\bar{D}_k)\), \(\frac{\bar{D}_k-\bar{D}_l}{1-\bar{D}_l}\frac{1-\bar{D}_k}{1-\bar{D}_l}\), and \(\frac{\bar{D}_l}{\bar{D}_k}\frac{\bar{D}_k-\bar{D}_l}{\bar{D}_k}\). Each is maximized when treatment occurs in the middle of the relevant window. For the first term that window is the whole panel, for the second it is the set of periods before group \(l\) is treated, and for the third it is the set of periods after group \(k\) is treated. In other words, the TWFE estimate depends on when treatment occurs within the panel. If treatment effects are heterogeneous across groups, a group's effect is emphasized (or de-emphasized) depending on where in the time window its treatment falls.
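A short derivation makes the "middle of the window" claim explicit. Each of the three expressions is the variance \(x(1-x)\) of a dummy that equals one a fraction \(x\) of the time. We write \(p_{kl}\) for the share of the window before \(l\) is treated (\(PRE(k)\) and \(MID(k,l)\)) during which \(k\) is treated. We write \(q_{kl}\) for the share of the window after \(k\) is treated (\(MID(k,l)\) and \(POST(l)\)) during which \(l\) is treated. These two symbols are notation introduced here, not taken from (goodman2021?):

\[\begin{equation} p_{kl} \equiv \frac{\bar{D}_k-\bar{D}_l}{1-\bar{D}_l}, \quad 1-p_{kl} = \frac{1-\bar{D}_k}{1-\bar{D}_l}, \qquad q_{kl} \equiv \frac{\bar{D}_l}{\bar{D}_k}, \quad 1-q_{kl} = \frac{\bar{D}_k-\bar{D}_l}{\bar{D}_k} \end{equation}\]

The three terms are therefore \(\bar{D}_k(1-\bar{D}_k)\), \(p_{kl}(1-p_{kl})\), and \(q_{kl}(1-q_{kl})\). For any \(x \in [0,1]\),

\[\begin{equation} \frac{d}{dx}\,x(1-x) = 1-2x = 0 \;\Longrightarrow\; x^{*} = \tfrac{1}{2}, \qquad x^{*}(1-x^{*}) = \tfrac{1}{4} \end{equation}\]

For example, in a 20-period panel \(\bar{D}_k(1-\bar{D}_k) = 0.25\) if group \(k\) is treated for the last 10 periods. If it is treated for only the last 2, the term falls to \(0.1 \times 0.9 = 0.09\).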

This can be illustrated with a simulation in which the treatment effects are 10 and 15 for the early and late treated groups, respectively. The panel comprises 20 periods. The early group is treated from period 9, while the late group's treatment date is moved backwards from period 16 to period 10. The figure shows that the larger effect of the late group is suppressed when its treatment occurs towards the end of the panel, and that the estimate increases as the late group's treatment date approaches the middle of the panel. When the late group is treated at the middle of the panel (period 10), the estimate is very close to 12.5, the simple average of the early and late effects.

Note that \(K\) timing groups yield \(K^2 - K\) “timing-only” \(2\times 2\) DD estimates (\(\hat{\alpha}^{2\times 2,k}_{kl}\) or \(\hat{\alpha}^{2\times 2,l}_{kl}\)). Adding one group that is untreated throughout the time window yields \(K^2\) \(2\times 2\) DD estimates in total.
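The simulation described above can be sketched in a few lines. The text does not state the group sizes, whether a never-treated group is included, or whether noise is added. This sketch therefore assumes one early and one late unit (equal group sizes), no never-treated group, no unit or period effects, and no noise. `simulate` and its parameters are hypothetical names. The loop reproduces the qualitative pattern, not necessarily the figure's exact values.

```python
def twfe_dd(D, Y):
    """TWFE DD coefficient in a balanced panel via two-way demeaning of D (FWL)."""
    N, T = len(D), len(D[0])
    unit_mean = [sum(row) / T for row in D]
    period_mean = [sum(D[i][t] for i in range(N)) / N for t in range(T)]
    grand_mean = sum(unit_mean) / N
    num = den = 0.0
    for i in range(N):
        for t in range(T):
            d = D[i][t] - unit_mean[i] - period_mean[t] + grand_mean
            num += d * Y[i][t]
            den += d * d
    return num / den


def simulate(late_start, early_start=9, T=20, effect_early=10.0, effect_late=15.0):
    """Assumed design: periods 1..T; each group is treated from its start
    period onward; outcome = constant group effect times the treatment dummy."""
    D, Y = [], []
    for start, effect in ((early_start, effect_early), (late_start, effect_late)):
        d = [1 if t >= start else 0 for t in range(1, T + 1)]
        D.append(d)
        Y.append([effect * x for x in d])
    return twfe_dd(D, Y)


for late_start in range(16, 9, -1):   # late treatment moves from period 16 back to 10
    print(late_start, round(simulate(late_start), 3))
```

Under these assumptions the estimate always lies between 10 and 15. It rises steadily as the late group's start moves from period 16 toward period 10, but it need not equal 12.5 at period 10. The exact values depend on the unstated design choices above.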