4.9 An exercise

# set the working directory to the folder that holds the data (adjust the path for your machine)
setwd("/Users/vshrestha/Dropbox/Machine Learning/book/data")

# arrow to open large data sets
library(arrow)

# birthweight data for 2000
bw_data2000 <- read_feather("NCHS_birthweight2000.feather")
head(bw_data2000)
## # A tibble: 6 × 47
##   datayear pldel birattnd statenat cntynat stoccfip cntocfip stateres cntyres cityres
##      <dbl> <dbl>    <dbl> <chr>    <chr>   <chr>    <chr>    <chr>    <chr>   <chr>  
## 1     2000     1        1 33       33028   36       36059    33       33028   999    
## 2     2000     1        1 47       47040   51       51059    47       47103   999    
## 3     2000     1        1 36       36043   39       39085    36       36999   999    
## 4     2000     1        2 23       23033   26       26065    23       23033   999    
## 5     2000     1        1 05       05001   06       06001    05       05001   103    
## 6     2000     1        1 10       10050   12       12099    10       10050   999    
## # ℹ 37 more variables: dmage <dbl>, mrace <dbl>, dmeduc <dbl>, dmar <dbl>, mplbir <dbl>,
## #   nlbnd <dbl>, totord9 <dbl>, monprec <dbl>, nprevis <dbl>, dfage <dbl>, frace <dbl>,
## #   birmon <dbl>, dgestat <dbl>, gestat10 <dbl>, csex <dbl>, dbirwt <dbl>, fmaps <dbl>,
## #   anemia <dbl>, cardiac <dbl>, lung <dbl>, diabetes <dbl>, herpes <dbl>, hydra <dbl>,
## #   hemo <dbl>, chyper <dbl>, phyper <dbl>, eclamp <dbl>, renal <dbl>, cigar <dbl>,
## #   cigar6 <dbl>, alcohol <dbl>, drink <dbl>, dfeduc <dbl>, tobuse <dbl>, alcuse <dbl>,
## #   congen <dbl>, smsares <chr>
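
The printout shows 47 columns, most of them stored as `<dbl>` even where they hold categorical codes. The sketch below shows one way to narrow such a table to a few named columns and to turn a "not stated" code into `NA`. It runs on a small made-up data frame rather than the real file. The helper `na_if_code()` is hypothetical, and the codes 99 and 9999 are placeholders only; the actual codes should be taken from the natality documentation.

```r
# Toy stand-in for bw_data2000; all values are invented for illustration.
toy <- data.frame(
  datayear = rep(2000, 5),
  stateres = c("33", "47", "36", "23", "05"),
  dmage    = c(24, 31, 19, 28, 35),
  dmeduc   = c(12, 16, 99, 14, 12),            # 99 is a placeholder "not stated" code
  dbirwt   = c(3300, 3540, 2980, 9999, 3120),  # 9999 is a placeholder "not stated" code
  cigar    = c(0, 0, 5, 0, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace a given sentinel code with NA.
na_if_code <- function(x, code) {
  x[!is.na(x) & x == code] <- NA
  x
}

# Keep a subset of columns by name, then recode the sentinels.
keep_vars <- c("datayear", "stateres", "dmage", "dmeduc", "dbirwt")
toy_small <- toy[, keep_vars]
toy_small$dmeduc <- na_if_code(toy_small$dmeduc, 99)
toy_small$dbirwt <- na_if_code(toy_small$dbirwt, 9999)

str(toy_small)
colSums(is.na(toy_small))  # number of missing values per column
```

The same pattern should carry over to `bw_data2000` once the variable names and missing-value codes have been confirmed.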
  1. Keep the following variables: a) datayear (year of birth), b) state of residence, c) dmage (mother’s age), d) mrace (mother’s race), e) mother’s education, f) birth weight, g) child’s gender, and h) marital status. Identify the corresponding variable names using the natality documentation.

  2. Build a regression model to evaluate the relationship between mother’s education and child’s birthweight. Start with a simple regression of child’s birthweight on mother’s education.

  3. Draw a DAG to illustrate what the data-generating process (DGP) might look like.

  4. Add the necessary covariates, following the DAG in 3. Explain why it is important to include each of these covariates in your model specification.

  5. Test the hypothesis that the returns to mother’s education, in terms of child’s birthweight, differ between Black and white mothers.

  6. Include the state of residence in your model specification. Show the results from this specification. What happens to the coefficient on mother’s education?

  7. Is it important to include the state of mother’s residence in the model specification? Why?

  8. Have you estimated the causal effect of mother’s education on child’s birthweight? Explain.
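
The sequence of specifications in items 2 and 4 to 6 can be sketched on simulated data. This is an illustration of the mechanics only, not a solution: the data frame `sim`, its variable names, and every coefficient used to generate it are invented, and the choice of covariates here is arbitrary rather than derived from a DAG.

```r
set.seed(1)
n <- 2000

# Simulated stand-in data; every number here is invented.
state <- factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
black <- rbinom(n, 1, 0.3)             # toy coding: 1 = Black, 0 = white
age   <- round(runif(n, 16, 42))
educ  <- round(10 + 0.1 * age + 0.5 * as.integer(state) + rnorm(n, 0, 2))
educ  <- pmin(17, pmax(8, educ))       # keep years of schooling in a plausible range
bwt   <- 2600 + 20 * educ + 10 * black * educ - 250 * black + 5 * age +
  40 * as.integer(state) + rnorm(n, 0, 450)
sim <- data.frame(bwt, educ, black, age, state)

m1 <- lm(bwt ~ educ, data = sim)                        # simple regression (item 2)
m2 <- lm(bwt ~ educ + age + black, data = sim)          # added covariates (item 4)
m3 <- lm(bwt ~ educ * black + age, data = sim)          # education-by-race interaction (item 5)
m4 <- lm(bwt ~ educ * black + age + state, data = sim)  # state indicators added (item 6)

# How the education coefficient moves across specifications.
# In m3 and m4 this is the slope for the black == 0 group.
educ_coefs <- sapply(
  list(simple = m1, covariates = m2, interaction = m3, state = m4),
  function(m) unname(coef(m)["educ"])
)
print(educ_coefs)

# Test of whether the education slope differs by race:
# t-test on the interaction term, and the equivalent nested-model F-test.
print(summary(m3)$coefficients["educ:black", ])
print(anova(m2, m3))
```

Entering `state` as a factor makes `lm()` estimate one indicator per state (relative to a reference state), so the education coefficient in `m4` is identified from variation within states. Whether any of these coefficients can be read causally depends on the assumed DAG, not on the regression output.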