4.4 Error term
Suppose that the model you have specified is correct, i.e., \(Y\) is explained only by \(X\). The error term then consists of the variation in \(Y\) that is left unexplained by \(X\), which in this case is just noise.
One widely used assumption is that the errors \(i)\) are independently and identically distributed; and \(ii)\) come from the normal distribution with mean 0 and variance \(\sigma^2\), i.e., \(\epsilon \sim N(0, \; \sigma^2)\). This is a strong assumption for two reasons. First, the error terms might be correlated. For example, individuals who live close to each other might share something in common. Second, the variance of the error term might vary with the values of the \(Xs\). For example, earnings may have a larger variance for individuals with higher levels of education than for those with lower levels of education. When the variance of the error term is not constant across all levels of the explanatory variables, the condition is known as heteroskedasticity. It violates one of the key assumptions of the Ordinary Least Squares (OLS) regression model, namely homoskedasticity, i.e., that the error term has a constant variance.
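A small simulation can make heteroskedasticity concrete. The sketch below is illustrative only: the variable names, the sample size, and the rule that the error standard deviation grows with years of education are all assumptions chosen for the example, not estimates from real data.

```r
set.seed(123)
n <- 5000

# Hypothetical years of education, drawn uniformly from 8 to 20
educ <- sample(8:20, n, replace = TRUE)

# Assumed DGP: the error standard deviation grows with education
u <- rnorm(n, mean = 0, sd = 0.5 * educ)
earn <- 10 + 2 * educ + u

# Fit OLS and collect the residuals
fit <- lm(earn ~ educ)
res <- resid(fit)

# Compare the residual variance for low and high education groups
var_low <- var(res[educ <= 12])
var_high <- var(res[educ >= 16])
c(low = var_low, high = var_high)
```

Under this assumed DGP the residual variance should be visibly larger in the high-education group, which is the pattern that heteroskedasticity describes.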
Rather than assuming only that \(E(\epsilon) = 0\), we’ll use the exogeneity assumption to make sense of regression. This assumption implicitly states that we’ve closed all of the backdoors, i.e., the “bad pathways,” in the data-generating process (DGP). Let’s say that we’ve got the following DGP.
library(dagitty) # build and analyze DAGs
library(ggdag)   # plot DAGs
library(ggplot2) # ggtitle() and theme_void()
# Define a causal diagram
dag <- dagitty("
dag {
  health -> educa
  educa -> earn
  health -> earn
  race -> educa
  race -> earn
  height -> earn
}
")
# Visualize the DAG
ggdag(dag) +
  ggtitle("Causal Diagram Example A.") +
  theme_void()
In this DGP, the channels from health and race are bad pathways and need to be accounted for in order to evaluate the causal effect of education on earnings. Once we have accounted for all of the bad (and, of course, good) pathways, we can see that height affects only earnings and no other variable. Since height is not correlated with the explanatory variables \((Xs)\), we term height an exogenous variable. The exogeneity assumption states that the error term \((\epsilon)\) is uncorrelated with the explanatory variables \(Xs\) included in the model. This means that after accounting for the \(Xs\), the error term is independent of the \(Xs\). For instance, in the given DGP, once we account for education, health, and race, the unaccounted variation in earnings is not correlated with these variables. Writing the error term as \(u\) from here on, this conditional independence of the error term implies that
\(E(u|Xs) = 0\).
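The idea of closing backdoors can also be checked directly on the diagram. The sketch below assumes the `dagitty` package is installed and relies on its `adjustmentSets()` and `dseparated()` functions; the DAG string repeats the edges of the causal diagram so that the block runs on its own.

```r
library(dagitty)

# Same edges as the causal diagram in this section
dag <- dagitty("
dag {
  health -> educa
  educa -> earn
  health -> earn
  race -> educa
  race -> earn
  height -> earn
}
")

# Which variables must be adjusted for to identify the effect of educa on earn?
sets <- adjustmentSets(dag, exposure = "educa", outcome = "earn")
print(sets)

# Is height d-separated from educa without conditioning on anything?
height_exog <- dseparated(dag, "height", "educa", list())
print(height_exog)
```

For this diagram the minimal adjustment set should contain health and race, and height should be reported as d-separated from education, in line with the discussion of bad pathways and exogeneity.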
Next, using the Law of Iterated Expectations, we can establish the following:
\[\begin{align} E(uX) &= E[E[uX|X]] \\ &= E[X\underbrace{E[u|X]}_{=0}] \\ &= 0 \end{align}\]
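A quick numerical check of this result, as a sketch under assumed distributions (a normal \(X\) and normal errors, both chosen only for illustration). In the first case \(E(u|X) = 0\) holds by construction; in the second the error depends on \(X\), so the product no longer averages to zero even though the error itself still has mean zero.

```r
set.seed(42)
n <- 100000
x <- rnorm(n, mean = 2, sd = 1)

# Case 1: error drawn independently of x, so E(u | x) = 0
u_exog <- rnorm(n)
m_exog <- mean(u_exog * x)

# Case 2: error depends on x through (x - 2), so E(u | x) != 0 although E(u) = 0
u_endog <- 0.5 * (x - 2) + rnorm(n)
m_endog <- mean(u_endog * x)

# Sample analogues of E(uX) in the two cases
c(exogenous = m_exog, endogenous = m_endog)
```

The first average should be close to zero, while the second should be close to \(0.5 \cdot var(X) = 0.5\) under these assumed parameters.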
So we’ve got two population moment conditions:
\(E(u) = 0\)
\(E(uX) = 0\)
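Note that there are two unknowns (\(\alpha\) and \(\beta\)) in the model \(Y = \alpha + \beta X + u\), and the two conditions above give us two equations to solve for them. Let’s first set up the sample counterparts: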
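\(\frac{1}{n} \sum_{i=1}^{n}(Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0\)
\(\frac{1}{n} \sum_{i=1}^{n} X_i(Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0\)
Solve the first equation for \(\hat{\alpha}\), where \(\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i\) and \(\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i\) are the sample means:
\[\begin{align} \frac{1}{n} \sum_{i=1}^{n}(Y_i - \hat{\alpha} - \hat{\beta} X_i) &= 0 \\ \hat{\alpha} &= \bar{Y} - \hat{\beta} \bar{X} \end{align}\]
Substitute this value of \(\hat{\alpha}\) into the second equation to get:
\[\begin{align} \frac{1}{n} \sum_{i=1}^{n} X_i(Y_i - \hat{\alpha} - \hat{\beta} X_i) &= 0 \\ \frac{1}{n} \sum_{i=1}^{n} X_i(Y_i - \bar{Y} + \hat{\beta} \bar{X} - \hat{\beta} X_i) &= 0 \\ \frac{1}{n} \sum_{i=1}^{n} X_i(Y_i - \bar{Y}) &= \hat{\beta} \frac{1}{n} \sum_{i=1}^{n} X_i (X_i - \bar{X}) \end{align}\]
Because \(\sum_{i=1}^{n} (Y_i - \bar{Y}) = 0\) and \(\sum_{i=1}^{n} (X_i - \bar{X}) = 0\), we can subtract \(\bar{X} \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})\) from the left-hand side and \(\hat{\beta} \bar{X} \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})\) from the right-hand side without changing either side. This replaces the leading \(X_i\) with \(X_i - \bar{X}\):
\[\begin{align} \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X}) &= \hat{\beta} \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 \\ \hat{\beta} &= \frac{cov(X,Y)}{var(X)} \end{align}\]
Here \(cov(X,Y)\) and \(var(X)\) denote the sample covariance and the sample variance.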
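The closed-form result can be compared with R's `lm()` on simulated data. This is a minimal sketch: the DGP (with \(\alpha = 1\) and \(\beta = 2\)), the sample size, and the object names are assumptions made for the illustration. Note that R's `cov()` and `var()` divide by \(n - 1\) rather than \(n\), which leaves the ratio unchanged.

```r
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n) # assumed DGP with alpha = 1 and beta = 2

# Method-of-moments formulas from the derivation
beta_hat <- cov(x, y) / var(x)
alpha_hat <- mean(y) - beta_hat * mean(x)

# OLS fit for comparison
ols <- coef(lm(y ~ x))

# The residuals satisfy the two sample moment conditions (up to rounding error)
res <- y - alpha_hat - beta_hat * x
moments <- c(mean(res), mean(x * res))

# Side-by-side comparison of the two sets of estimates
rbind(formula = c(alpha_hat, beta_hat), lm = unname(ols))
```

The two rows should agree up to floating-point error, and both entries of `moments` should be numerically zero, which is exactly what the two sample counterparts require.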