Causal Inference
2026-01-27
1 Preface
Simply put, causal inference is about answering the question: *Does X cause Y?* It’s one of the fundamental approaches in modern science. Like every field, it keeps evolving, and it spans multiple disciplines.
For example, say you are interested in evaluating the efficacy of a new vaccine. Or you are trying to analyze the long-run impacts of school shootings on mental health. Or you might be intrigued by how an intervention by U.S. Immigration and Customs Enforcement in Minnesota affects the mental health of Minnesotans. In all these instances, causal inference has a role to play.
The gold standard of causal inference is an experimental setting in which treatment is randomly assigned. However, experimental designs are not always feasible: practical constraints such as cost, time, and effort often rule them out. For instance, it is highly unlikely that you will be using an experimental design for your thesis. In such cases, researchers need to rely on non-experimental identification techniques as an alternative.
In this write-up, I’ll discuss techniques used to understand causal relationships in non-experimental settings using the potential outcomes framework. The general idea is this: if person “A” is treated, the ideal comparison for person “A” would be person “A” herself in the untreated state. However, it is impossible to observe person “A” in both the treated and untreated states at the same time. Hence, we need to find suitable comparisons for person “A” to evaluate the effect of the treatment. In summary, the idea is to construct counterfactual outcomes, which are compared with the treated unit’s observed outcome to estimate the effect of a treatment.
This impossibility of observing both potential outcomes for the same individual is often referred to as the fundamental problem of causal inference. Because we can never directly observe the counterfactual outcome, estimating causal effects necessarily requires assumptions. These assumptions allow us to argue that certain observed units can serve as reasonable stand-ins for the missing counterfactuals. Much of causal inference, therefore, is not about finding perfect comparisons, but about carefully justifying why particular comparisons are credible.
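To make the idea concrete, here is a minimal sketch in Python, assuming a simulated population in which both potential outcomes are known for every unit (something that never happens with real data). Every name and number in it (`TRUE_EFFECT`, `diff_in_means`, the outcome distribution, the selection rule) is a hypothetical choice made for illustration. It compares a simple difference in means under random assignment with the same comparison when units select into treatment.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

N = 10_000
TRUE_EFFECT = 2.0  # hypothetical constant treatment effect

# Each unit has two potential outcomes: y0 (untreated) and y1 (treated).
# Only in a simulation do we get to see both.
units = []
for _ in range(N):
    y0 = random.gauss(10.0, 3.0)
    y1 = y0 + TRUE_EFFECT
    units.append((y0, y1))


def diff_in_means(units, treated_flags):
    """Mean observed outcome of treated units minus that of untreated units.

    For a treated unit we only observe y1; for an untreated unit only y0.
    The other potential outcome is the missing counterfactual.
    """
    treated = [y1 for (y0, y1), d in zip(units, treated_flags) if d]
    control = [y0 for (y0, y1), d in zip(units, treated_flags) if not d]
    return statistics.mean(treated) - statistics.mean(control)


# Case 1: treatment is assigned by a coin flip, independent of outcomes.
random_flags = [random.random() < 0.5 for _ in units]

# Case 2 (assumed selection rule): units with a high untreated outcome
# are the ones that take the treatment.
selected_flags = [y0 > 10.0 for (y0, y1) in units]

ate_true = statistics.mean([y1 - y0 for (y0, y1) in units])
est_random = diff_in_means(units, random_flags)
est_selected = diff_in_means(units, selected_flags)

print(f"True average effect:        {ate_true:.2f}")
print(f"Estimate, random assignment: {est_random:.2f}")
print(f"Estimate, self-selection:    {est_selected:.2f}")
```

Under random assignment the untreated group is a credible stand-in for the treated group's missing counterfactual, so the estimate should land close to the true effect. Under self-selection the estimate is far too large, because the treated units already had higher untreated outcomes before any treatment took place.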
Causal inference has also benefited greatly from modern machine learning (ML) methods, which have opened the door to many new possibilities. Where appropriate, I will bring in ML techniques to aid causal inference throughout this write-up.
Overall, I’ve tried to emphasize intuition rather than mathematical rigor. I hope the write-up gives you solid intuition for the topics discussed. Note, however, that it is by no means comprehensive. This is a work in progress, and I intend to update it bit by bit every semester. So far, I’ve written most of the code in R, while some parts use Python. In the future, I hope to include code examples in both languages.