4.9 An exercise
setwd("/Users/vshrestha/Dropbox/Machine Learning/book/data")
# arrow to open large data sets
library(arrow)
# birthweight data for 2000
bw_data2000 <- read_feather("NCHS_birthweight2000.feather")
head(bw_data2000)
## # A tibble: 6 × 47
## datayear pldel birattnd statenat cntynat stoccfip cntocfip stateres cntyres cityres
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 2000 1 1 33 33028 36 36059 33 33028 999
## 2 2000 1 1 47 47040 51 51059 47 47103 999
## 3 2000 1 1 36 36043 39 39085 36 36999 999
## 4 2000 1 2 23 23033 26 26065 23 23033 999
## 5 2000 1 1 05 05001 06 06001 05 05001 103
## 6 2000 1 1 10 10050 12 12099 10 10050 999
## # ℹ 37 more variables: dmage <dbl>, mrace <dbl>, dmeduc <dbl>, dmar <dbl>, mplbir <dbl>,
## # nlbnd <dbl>, totord9 <dbl>, monprec <dbl>, nprevis <dbl>, dfage <dbl>, frace <dbl>,
## # birmon <dbl>, dgestat <dbl>, gestat10 <dbl>, csex <dbl>, dbirwt <dbl>, fmaps <dbl>,
## # anemia <dbl>, cardiac <dbl>, lung <dbl>, diabetes <dbl>, herpes <dbl>, hydra <dbl>,
## # hemo <dbl>, chyper <dbl>, phyper <dbl>, eclamp <dbl>, renal <dbl>, cigar <dbl>,
## # cigar6 <dbl>, alcohol <dbl>, drink <dbl>, dfeduc <dbl>, tobuse <dbl>, alcuse <dbl>,
## # congen <dbl>, smsares <chr>
1. Keep the following variables: a) datayear (year of birth), b) state (state of residence), c) dmage (mother’s age), d) mrace (mother’s race), e) birth weight, f) child gender, g) marital status, and h) mother’s education (dmeduc), which the later steps require. You should find the corresponding variables using the natality documentation.
2. Build a birthweight-education regression model to evaluate the relationship between mother’s education and child’s birthweight. Start with a simple regression of child’s birthweight on mother’s education.
3. Draw a DAG to illustrate what the data-generating process (DGP) might look like.
4. Add the necessary covariates following the illustration in 3. Explain each addition, i.e., why is it important to include these covariates in your model specification?
5. Test the hypothesis that the returns to mother’s education, in terms of child’s birthweight, differ between Black and white mothers.
6. Include the state of residence in your model specification. Show the results from this specification. What happens to the coefficient on mother’s education?
7. Is it important to include the mother’s state of residence in the model specification? Why?
8. Have you estimated the causal effect of mother’s education on child’s birthweight? Explain.
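The sequence of specifications in the exercise can be sketched on simulated data before turning to the real file. The block below is only an illustration under stated assumptions, not a solution: the data are made up, `black` and `married` are hypothetical indicators that you would have to construct from mrace and dmar after checking their codes in the natality documentation, and the coefficients used to simulate dbirwt are arbitrary.

```r
# Simulated stand-in for the natality data; every value here is invented.
set.seed(123)
n <- 5000
sim <- data.frame(
  stateres = factor(sample(sprintf("%02d", 1:10), n, replace = TRUE)),
  dmage    = sample(18:40, n, replace = TRUE),
  black    = rbinom(n, 1, 0.3),   # hypothetical indicator (would come from mrace)
  married  = rbinom(n, 1, 0.6),   # hypothetical indicator (would come from dmar)
  csex     = sample(1:2, n, replace = TRUE)
)

# Education depends on age and marital status, so the simple model is confounded.
sim$dmeduc <- round(pmin(17, pmax(0,
  8 + 0.15 * sim$dmage + 1.0 * sim$married + rnorm(n, sd = 2))))

# Birthweight in grams, generated with arbitrary illustrative coefficients.
sim$dbirwt <- 2900 + 15 * sim$dmeduc + 5 * sim$dmage - 150 * sim$black +
  60 * sim$married - 100 * (sim$csex == 2) + rnorm(n, sd = 500)

# Step 2: simple regression of birthweight on mother's education.
m1 <- lm(dbirwt ~ dmeduc, data = sim)

# Step 4: add covariates suggested by the DAG.
m2 <- update(m1, . ~ . + dmage + black + factor(csex) + married)

# Step 5: let the education slope differ by race via an interaction term.
m3 <- update(m2, . ~ . + dmeduc:black)
interaction_row <- summary(m3)$coefficients["dmeduc:black", ]

# Step 6: add state-of-residence indicators (state fixed effects).
m4 <- update(m2, . ~ . + stateres)

# Compare the education coefficient across specifications.
educ_coefs <- sapply(
  list(simple = m1, covariates = m2, state_fe = m4),
  function(m) unname(coef(m)["dmeduc"])
)
print(interaction_row)
print(educ_coefs)

# Joint F-test of the state indicators.
print(anova(m2, m4))
```

With the real data, the same calls would be run on the subset built in step 1. The t-statistic on `dmeduc:black` addresses step 5, and comparing `educ_coefs` across columns shows how the education coefficient moves as covariates and state indicators enter.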