8.1 Some ways to estimate CATE
Robinson’s partially linear model for a homogeneous treatment effect is written as:
\(Y_i = \tau W_i + f(X_i) + \epsilon_i \qquad \text{(equation 1)}\)
Here, \(\tau\) is assumed to be constant across sub-spaces of \(X\). The model can be extended to allow for heterogeneous treatment effects:
\(Y_i = \tau(X_i) W_i + f(X_i) + \epsilon_i \qquad \text{(equation 2)}\)
where \(\tau(\cdot)\) varies with \(x\).
Equation 2 can be rewritten as a residual-on-residual regression:
\(Y_i - m(X_i) = \tau(X_i) (W_i - e(X_i)) + \epsilon_i \qquad \text{(equation 3)}\)
where \(m(x)\) is the conditional expectation of \(Y\) given \(X = x\):
\(m(x) = E[Y_i \mid X_i = x] = \mu_{0}(x) + \tau(x) e(x)\), where \(\mu_{0}(x)\) is the baseline conditional response (in the absence of treatment) and \(e(x) = P(W_i = 1 \mid X_i = x)\) is the propensity score.[11]
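The step from equation 2 to equation 3 can be sketched as follows, assuming that \(E[\epsilon_i \mid X_i, W_i] = 0\) (an assumption made explicit here for the derivation) and that \(f\) plays the role of \(\mu_{0}\). Taking the conditional expectation of equation 2 given \(X_i\), and using \(E[W_i \mid X_i] = e(X_i)\), gives:
\(m(X_i) = \tau(X_i) e(X_i) + f(X_i)\)
Subtracting this from equation 2 cancels \(f(X_i)\):
\(Y_i - m(X_i) = \tau(X_i) W_i - \tau(X_i) e(X_i) + \epsilon_i = \tau(X_i) (W_i - e(X_i)) + \epsilon_i\)
which is equation 3.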
\(\tau(x)\) is parameterized as \(\tau(x) = \psi(x) \cdot \beta\), where \(\psi: \mathcal{X} \rightarrow \mathbb{R}^k\) is a pre-determined set of basis functions and \(\beta \in \mathbb{R}^k\).
A feasible loss function can be constructed from equation 3 by plugging in cross-fitted estimates of \(m(x)\) and \(e(x)\):
\(L(\beta) = \frac{1}{n} \sum_{i = 1}^n \left( (Y_i - \hat{m}^{-k(i)}(X_i)) - (W_i - \hat{e}^{-k(i)}(X_i)) \; \psi(X_i) \cdot \beta \right)^2\), where the superscript \(-k(i)\) indicates that the estimate was fit without the fold containing observation \(i\). Note that the parameter of interest is \(\beta\).
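The cross-fitting step and the loss can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not a definitive implementation: the data are simulated, the nuisance functions \(m\) and \(e\) are fit by plain least squares (a flexible learner would normally be used), and the helper names `cross_fit_linear` and `r_loss` are hypothetical.

```python
import numpy as np

def cross_fit_linear(X, target, n_folds=5, seed=0):
    """Out-of-fold least-squares predictions of `target` from X.

    Unit i is predicted by a model trained on the folds that do not
    contain i, which is the role of the -k(i) superscript in the loss.
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    fold_of = rng.permutation(n) % n_folds       # fold label k(i) for each unit
    design = np.column_stack([np.ones(n), X])    # add an intercept column
    preds = np.empty(n)
    for k in range(n_folds):
        test = fold_of == k
        coef, *_ = np.linalg.lstsq(design[~test], target[~test], rcond=None)
        preds[test] = design[test] @ coef
    return preds

def r_loss(beta, Y, W, m_hat, e_hat, Psi):
    """Feasible squared-error loss L(beta) built from equation 3."""
    resid = (Y - m_hat) - (W - e_hat) * (Psi @ beta)
    return np.mean(resid ** 2)

# Simulated data (assumption): tau(x) = 1 + 2 * x_1, baseline mu_0(x) = x_2.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 2))
e_true = 0.5 + 0.2 * np.tanh(X[:, 1])            # propensity in [0.3, 0.7]
W = rng.binomial(1, e_true)
tau = 1.0 + 2.0 * X[:, 0]
Y = tau * W + X[:, 1] + rng.normal(size=n)

# Cross-fitted nuisance estimates; the linear fit of e is clipped to stay in (0, 1).
m_hat = cross_fit_linear(X, Y)
e_hat = np.clip(cross_fit_linear(X, W.astype(float)), 0.05, 0.95)

# Basis psi(x) = (1, x_1), so the data-generating beta is (1, 2).
Psi = np.column_stack([np.ones(n), X[:, 0]])
loss_true = r_loss(np.array([1.0, 2.0]), Y, W, m_hat, e_hat, Psi)
loss_zero = r_loss(np.zeros(2), Y, W, m_hat, e_hat, Psi)
print(loss_true, loss_zero)
```

In this simulation the loss at the data-generating \(\beta\) should be smaller than the loss at \(\beta = 0\), which is what makes minimizing \(L\) a sensible way to recover \(\beta\).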
LASSO can be used to estimate \(\beta\):
\(\hat{\beta} = \arg\min_{\beta} \{L(\beta) + \lambda \, \|\beta\|_{1}\}\), where \(\lambda\) is the regularization parameter that penalizes the complexity of \(\tau(\cdot)\).[12]
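The LASSO step amounts to an ordinary LASSO regression of the residualized outcome on the residualized treatment multiplied by the basis functions. The sketch below illustrates this with a small coordinate-descent solver written in NumPy. Several things are assumptions made for brevity: the data are simulated, the true \(m\) and \(e\) are used in place of cross-fitted estimates, every coefficient (including the intercept of \(\tau\)) is penalized, \(\lambda\) is fixed by hand rather than chosen by cross-validation, and `lasso_cd` is a hypothetical helper name.

```python
import numpy as np

def soft_threshold(a, t):
    """Soft-thresholding operator: shrink a toward zero by t."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def lasso_cd(Z, r, lam, n_iter=200):
    """Minimize (1/n) * ||r - Z b||^2 + lam * ||b||_1 by coordinate descent."""
    n, k = Z.shape
    beta = np.zeros(k)
    col_ms = (Z ** 2).mean(axis=0)               # (1/n) * ||Z_j||^2 per column
    for _ in range(n_iter):
        for j in range(k):
            if col_ms[j] == 0.0:
                continue
            # Residual with the contribution of column j added back.
            partial = r - Z @ beta + Z[:, j] * beta[j]
            rho = Z[:, j] @ partial / n
            beta[j] = soft_threshold(rho, lam / 2.0) / col_ms[j]
    return beta

# Simulated data (assumption): only the intercept and x_1 enter tau(x).
rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 5))
e = np.full(n, 0.5)                              # randomized treatment, known e(x)
W = rng.binomial(1, e)
tau = 1.0 + 2.0 * X[:, 0]
mu0 = X[:, 1]
Y = mu0 + tau * W + rng.normal(size=n)
m = mu0 + tau * e                                # oracle m(x) = mu_0(x) + tau(x) e(x)

Psi = np.column_stack([np.ones(n), X])           # psi(x) = (1, x_1, ..., x_5)
Z = (W - e)[:, None] * Psi                       # (W_i - e(X_i)) * psi(X_i)
r = Y - m                                        # Y_i - m(X_i)

beta_hat = lasso_cd(Z, r, lam=0.05)
tau_hat = Psi @ beta_hat                         # estimated tau(X_i) for each unit
print(np.round(beta_hat, 2))
```

The first two coefficients should land near the data-generating values of 1 and 2, shrunk slightly toward zero by the penalty, while the remaining coefficients should be at or near zero. If scikit-learn is used instead, note that its `Lasso` objective scales the squared error by \(1/(2n)\), so its `alpha` corresponds to \(\lambda / 2\) in the notation here.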
Note: An alternative approach is to use a random forest to compute the weight of each observation \(i\) relative to the test point \(x\). This is the approach taken by the causal forest in the Generalized Random Forest framework.
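As a sketch of how such weights enter the estimate, write \(\alpha_i(x) \geq 0\) for the forest weight of observation \(i\) at the test point \(x\) (this notation is introduced here for illustration). A weighted version of the residual-on-residual regression in equation 3, fit locally at \(x\), has the closed-form solution:
\(\hat{\tau}(x) = \frac{\sum_{i = 1}^n \alpha_i(x) \, (Y_i - \hat{m}^{-k(i)}(X_i)) \, (W_i - \hat{e}^{-k(i)}(X_i))}{\sum_{i = 1}^n \alpha_i(x) \, (W_i - \hat{e}^{-k(i)}(X_i))^2}\)
Observations that often share a leaf with \(x\) receive a larger \(\alpha_i(x)\) and therefore contribute more to \(\hat{\tau}(x)\). The details of a particular software implementation may differ from this simplified form.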
[11] \(m(x)\) denotes the function evaluated at an arbitrary, possibly new, data point \(x\), whereas \(m(X_i)\) is its value at the observed covariates of unit \(i\).
[12] A highly complex model can improve the in-sample fit but may predict poorly out of sample. The complexity of the model should therefore be penalized during training.