4.8 Standard Errors

Standard Error of the Regression.

Is the just the square root of the variance of the error term.

Using a simple regression the standard error of the regression can be attained by the following.

  1. Write down the variance of the error term.

    • \(\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \epsilon_i\)
  2. Note that the variance is not feasible as we don’t observe \(\epsilon_i\). We’d want to get the feasible variance estimator \(\widehat{\sigma}^2\) by replacing the \(\epsilon_i\) by the residual, i.e. \(\widehat{\epsilon_i}\). This gives us:

    • \(\widehat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \widehat{\epsilon_i}\)
  3. Note that the variance estimate from 2 is biased downwards. This is because we have estimated \(p\) number of parameters before getting the residuals. The remaining degree of freedom is \(n-p\). So instead dividing the sum of the square of residuals by \(n\), we divide it by \(n-p\).

    • \(\widehat{\sigma}^2 = \frac{1}{n-p} \sum_{i=1}^{n} \widehat{\epsilon_i}\)

Here, \(\widehat{\sigma}^2\) is the estimated variance of the error term. The square root of it is the standard error of the regression. It tells us on average how far away is the actual observation (\(Y\)-value) from the regression line (best-fit line or \(E(Y|X)\)). Lower the standard error of the regression, better the fit.

Let’s estimate the standard error of the regression.

set.seed(125)
n  <- 1000
educ  <- sample(seq(1, 16, 1), n, replace = TRUE)
income  <- 20000 + 2000 * educ + rnorm(n, mean = 10000, sd = 5000)

dat  <- data.frame(educ = educ, income = income)

# estimate the model
reg_model  <- lm(income~educ, data = dat) 

# get the residuals
resid  <- residuals(reg_model) 

# sum of the square of residuals 
ssr  <- sum(resid^2)

# divide ssr by n - p 
var_error  <- ssr / (n - length((coef(reg_model))))

# get se 
se = sqrt(var_error)

print(se)
## [1] 4992.835
# compare to the regression output
summary(reg_model)$sigma
## [1] 4992.835

Standard error of the coefficient

Once we’ve estimated the standard error of the regression, we can go ahead estimate the standard error of the coefficient.

  • \(\widehat{var}(\hat{\beta}) = \frac{\widehat{\sigma}^2}{var(X) \times (n - 1)}\)

Let’s estimate the standard error of income coefficient.

# estimate the variance of education coefficient
var_educ  <-  se^2 / (sd(educ)^2 * (n - 1))

# get the standard error
se_educ  <- sqrt(var_educ)

# print
print(se_educ)
## [1] 34.63766
# compare the hand-estimated se with the model estimate
se(reg_model)[[2]]
## [1] 34.63766