Add a did pipeline node that runs
difference-in-differences estimators as a single coherent analysis.
Internally, each estimator composes over existing regression primitives
(FE demeaning, OLS, clustered SEs). Externally, it’s one node with typed
params and a unified result containing event-study coefficients, ATT
summaries, and pre-trends tests.
User story: Researcher pastes R code containing
att_gt(), did2s(), or feols()
with event-study syntax → Interlyse recognizes it as a DiD analysis →
executes the estimator(s) → displays an event-study plot overlaying all
estimators with confidence intervals.
Why now (between M4 and M5): DiD is the dominant method in applied econ. 3 of 6 reference papers use it. M5’s replication workflow needs DiD to validate against the JEL-DiD paper. The engine (M3 regression + M4 data pipelines) is ready; this milestone adds the meta-estimator layer on top.
| Estimator | R Package | Internal approach |
|---|---|---|
| TWFE event-study | fixest (i() syntax) | Expand i(event_time, ref=...) into binary indicators, run existing feols() path |
| Gardner two-stage | did2s | Stage 1: OLS on untreated obs → residuals. Stage 2: OLS on residualized outcome with event indicators. Two calls to existing regression.ts |
| Callaway-Sant’Anna | did (att_gt) | Loop over (cohort, time) pairs, run 2×2 DiD on subsets via OLS, aggregate with proper weights. Bootstrap for inference (default 1000 iterations) |
| Sun-Abraham | fixest (sunab()) | Construct cohort × relative-time interaction indicators, run feols, reweight coefficients for aggregation |
| Borusyak imputation | didimputation | OLS on untreated observations → impute Y(0) for treated → ATT = Y - Y_hat(0). Influence-function SEs |
Deferred (Tier 3): Roth-Sant’Anna
(staggered) — specialized variance formula, lower usage.
Slots in later as another run* function.
One did executor with per-estimator error isolation via
try/catch. Failed estimators produce error entries in the result;
succeeded estimators display normally.
did-executor
│
├─ validatePanel(dataset, params) → PanelInfo { units, times, cohorts, neverTreated }
├─ preparePanelData(dataset, params) → adds time_to_treat, ever_treated, treatment columns
│
├─ try { runTWFE() } → success | error
├─ try { runGardner() } → success | error
├─ try { runCS() } → success | error
├─ try { runSunAbraham() } → success | error
├─ try { runBorusyak() } → success | error
│
└─ DiDResult
├─ succeeded: EstimatorResult[]
├─ failed: { estimator, error }[]
├─ eventStudyCoefs: Map<estimator, { time, coef, se, ciLower, ciUpper }[]>
├─ attSummary: Map<estimator, { att, se, pValue }>
└─ preTrendsTest: Map<estimator, { fStat, pValue, df }>
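The isolation step in the diagram above can be sketched as follows. This is a minimal illustration, not the planned executor: `runIsolated`, `IsolatedRun`, and the stand-in runner functions are hypothetical names, and the real `run*` functions would take panel data rather than return constants.

```typescript
// Hypothetical shapes for illustration only; the spec's own result type is DiDResult.
type EstimatorName = 'twfe' | 'gardner' | 'callaway-santanna' | 'sun-abraham' | 'borusyak';

interface IsolatedRun<T> {
  succeeded: { estimator: EstimatorName; value: T }[];
  failed: { estimator: EstimatorName; error: string }[];
}

// Run every requested estimator; a throw in one never aborts the others.
function runIsolated<T>(
  runners: { estimator: EstimatorName; run: () => T }[],
): IsolatedRun<T> {
  const out: IsolatedRun<T> = { succeeded: [], failed: [] };
  for (let i = 0; i < runners.length; i++) {
    const { estimator, run } = runners[i];
    try {
      out.succeeded.push({ estimator, value: run() });
    } catch (e) {
      // Keep only a message so the failure can be shown next to the successes.
      const message = e instanceof Error ? e.message : String(e);
      out.failed.push({ estimator, error: message });
    }
  }
  return out;
}

// Stand-in runners: the second one fails, the other two still report results.
const demoRun = runIsolated<number>([
  { estimator: 'twfe', run: () => 0.045 },
  { estimator: 'callaway-santanna', run: () => { throw new Error('no never-treated units'); } },
  { estimator: 'gardner', run: () => 0.042 },
]);
```

Collecting failures as data rather than rethrowing is what lets the results view show the surviving estimators alongside an error entry.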
Adding the deferred estimator later takes one runStaggered() function + one try/catch block.

- regression.ts OLS path: TWFE, Gardner stages, Borusyak imputation
- demean.ts FE demeaning: TWFE and Sun-Abraham with unit+time FE
- sandwich.ts clustered SEs: all estimators need clustering on unit
- expression.ts data manipulation: constructing time_to_treat, treatment indicators
- Dataset columnar storage: subsetting for (g,t) pair regressions in C-S

- i(time_to_treat, ref=c(-1, -Inf)) → binary columns for each relative time period, dropping reference periods. Reusable for TWFE and Sun-Abraham.
- Cohorts from the gname column (unique non-zero/non-NA values of the first-treatment-period variable).

interface DiDNode extends BaseNode {
type: 'did';
params: DiDParams;
result?: DiDResult;
}
interface DiDParams {
yname: string; // outcome variable
tname: string; // calendar time variable
idname: string; // unit/panel ID variable
gname: string; // cohort variable (first treatment period; 0/Infinity = never-treated)
estimators: DiDEstimator[]; // which estimators to run
xformla?: Formula; // optional covariates (RHS of ~)
weights?: string; // optional weight column
clusterVar?: string; // cluster variable (defaults to idname)
controlGroup: 'nevertreated' | 'notyettreated'; // who is the comparison group
eventHorizon?: [number, number]; // relative time range [min_e, max_e] (default: data-driven)
bootstrapIterations?: number; // for C-S bootstrap (default: 1000)
confidenceLevel?: number; // for CIs (default: 0.95)
}
type DiDEstimator = 'twfe' | 'gardner' | 'callaway-santanna' | 'sun-abraham' | 'borusyak';

interface DiDResult {
type: 'did';
panelInfo: PanelInfo;
estimatorResults: EstimatorOutcome[];
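// Partial: only estimators that were requested and succeeded have an entry.
eventStudyCoefs: Partial<Record<DiDEstimator, EventStudyCoefficient[]>>;
attSummary: Partial<Record<DiDEstimator, ATTSummary>>;
preTrendsTest: Partial<Record<DiDEstimator, PreTrendsTest>>;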
}
type EstimatorOutcome =
| { estimator: DiDEstimator; status: 'success' }
| { estimator: DiDEstimator; status: 'error'; error: string };
interface PanelInfo {
nUnits: number;
nPeriods: number;
cohorts: { treatmentTime: number; nUnits: number }[];
nNeverTreated: number;
balancedPanel: boolean;
}
interface EventStudyCoefficient {
relativeTime: number; // e.g., -3, -2, -1, 0, 1, 2, 3
estimate: number;
se: number;
ciLower: number;
ciUpper: number;
pValue: number;
isReference: boolean; // true for omitted reference period
}
interface ATTSummary {
att: number; // overall average treatment effect on the treated
se: number;
pValue: number;
ciLower: number;
ciUpper: number;
nTreatedObs: number;
}
interface PreTrendsTest {
fStat: number;
pValue: number;
df: [number, number];
preCoefs: EventStudyCoefficient[]; // just the t < 0 coefficients
}

'did': {
inputs: [{ name: 'dataset', type: 'dataset', label: 'Panel Data' }],
outputs: [{ name: 'result', type: 'did-result', label: 'DiD Result' }],
}

did node

# Callaway-Sant'Anna
att_gt(yname = "y", tname = "year", idname = "id", gname = "first_treat",
data = df, control_group = "notyettreated")
# → did node with estimators: ['callaway-santanna']
# Gardner two-stage
did2s(data, yname = "y", first_stage = ~0 | id + year,
second_stage = ~i(time_to_treat), treatment = "treat", cluster_var = "id")
# → did node with estimators: ['gardner']
# Borusyak imputation
did_imputation(data, yname = "y", gname = "g", tname = "t", idname = "i")
# → did node with estimators: ['borusyak']

event_study() wrapper → single did node with multiple estimators

event_study(data = df, yname = "y", idname = "id", tname = "year",
gname = "first_treat", estimator = "all")
# → did node with estimators: ['twfe', 'gardner', 'callaway-santanna', 'sun-abraham', 'borusyak']

feols() with event-study syntax → routed to did instead of linear-model

# i() event-study syntax → did node (estimators: ['twfe'])
feols(y ~ i(time_to_treat, ref = c(-1, -Inf)) | id + year, data = df)
# sunab() syntax → did node (estimators: ['sun-abraham'])
feols(y ~ sunab(first_treat, time_to_treat) | id + year, data = df)

The recognizer inspects the formula: if it contains i(event_var, ref=...) or sunab(...), it routes the call to a did node. Plain feols(y ~ treat | id + year) without event-study syntax stays as linear-model: that is standard TWFE, not necessarily a DiD event study.
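The routing rule above could be prototyped on the formula text as below. This is a sketch under stated assumptions: `routeFeolsFormula` and both regular expressions are hypothetical, the pattern assumes no nested parentheses before `ref`, and it would also match a non-event `i(factor, ref = ...)` term. A real recognizer would more likely walk the parsed R AST.

```typescript
type RoutedNode = 'did' | 'linear-model';

// i(<var>, ..., ref = ...) with no nested parentheses before ref.
const EVENT_STUDY_I = /\bi\(\s*[A-Za-z_.][\w.]*\s*,[^()]*\bref\s*=/;
// Any sunab(...) call.
const SUNAB_CALL = /\bsunab\s*\(/;

// Decide which node type a feols() formula should map to.
function routeFeolsFormula(formula: string): RoutedNode {
  if (EVENT_STUDY_I.test(formula) || SUNAB_CALL.test(formula)) {
    return 'did';
  }
  return 'linear-model';
}

// The three formulas used in the examples of this section.
const routed = [
  'y ~ i(time_to_treat, ref = c(-1, -Inf)) | id + year',
  'y ~ sunab(first_treat, time_to_treat) | id + year',
  'y ~ treat | id + year',
].map(routeFeolsFormula);
```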
did node

When multiple DiD calls in the same script share the same panel structure (same yname, tname, idname, gname, matched by string equality on the column name arguments), the mapper merges them into a single did node with the union of their estimators. This mirrors how multiple feols() calls are grouped in the spec explorer today. If panel structures differ (different outcome variable, different unit ID), they remain separate did nodes.
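The merge rule in the paragraph above amounts to grouping calls by a key built from the four column names. A minimal sketch, assuming each recognized call has already been reduced to a hypothetical `DiDCall` record (how a feols() formula yields those four names is not specified here):

```typescript
// Hypothetical record for one recognized R call.
interface DiDCall {
  yname: string;
  tname: string;
  idname: string;
  gname: string;
  estimator: string;
}

interface MergedDiDNode {
  yname: string;
  tname: string;
  idname: string;
  gname: string;
  estimators: string[];
}

function mergeDiDCalls(calls: DiDCall[]): MergedDiDNode[] {
  const nodes: MergedDiDNode[] = [];
  const indexByKey: { [key: string]: number } = {};
  for (const call of calls) {
    // String equality on the four column-name arguments defines "same panel".
    const key = JSON.stringify([call.yname, call.tname, call.idname, call.gname]);
    if (!Object.prototype.hasOwnProperty.call(indexByKey, key)) {
      indexByKey[key] = nodes.length;
      nodes.push({
        yname: call.yname,
        tname: call.tname,
        idname: call.idname,
        gname: call.gname,
        estimators: [],
      });
    }
    const node = nodes[indexByKey[key]];
    // Union semantics: a repeated estimator is listed once.
    if (node.estimators.indexOf(call.estimator) === -1) {
      node.estimators.push(call.estimator);
    }
  }
  return nodes;
}

const mergedNodes = mergeDiDCalls([
  { yname: 'y', tname: 'year', idname: 'id', gname: 'g', estimator: 'callaway-santanna' },
  { yname: 'y', tname: 'year', idname: 'id', gname: 'g', estimator: 'gardner' },
  { yname: 'y', tname: 'year', idname: 'id', gname: 'g', estimator: 'gardner' },
  { yname: 'wage', tname: 'year', idname: 'id', gname: 'g', estimator: 'twfe' },
]);
```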
# These three calls share panel structure → one did node with 3 estimators
cs_result <- att_gt(yname="y", tname="year", idname="id", gname="g", data=df)
gardner_result <- did2s(df, yname="y", ...)
feols(y ~ i(time_to_treat, ref=-1) | id + year, data = df)
# → did node with estimators: ['callaway-santanna', 'gardner', 'twfe']

aggte() → modifies the parent did node’s aggregation params

result <- att_gt(yname="y", tname="year", idname="id", gname="g", data=df)
aggte(result, type = "dynamic", min_e = -5, max_e = 10)
# → sets eventHorizon: [-5, 10] on the did node; aggte is not a separate node

Observable Plot chart showing:

- X-axis: relative time (periods before/after treatment)
- Y-axis: estimated coefficient (treatment effect)
- One series per estimator, color-coded (reuse COLORS from plot-theme.ts)
- 95% CI bands (shaded or error bars)
- Vertical dashed line at t = 0 (treatment onset)
- Horizontal dashed line at y = 0 (null effect)
- Reference period marked (typically t = -1)
| Estimator | ATT | SE | 95% CI | p-value |
|---|---|---|---|---|
| TWFE | 0.045 | 0.012 | [0.021, 0.069] | 0.000 |
| Gardner | 0.042 | 0.011 | [0.020, 0.064] | 0.000 |
| C-S | 0.038 | 0.013 | [0.013, 0.063] | 0.003 |
| … |
| Estimator | Joint F | p-value | Pre-trend? |
|---|---|---|---|
| TWFE | 1.23 | 0.294 | No |
| Gardner | 0.98 | 0.421 | No |
| … |
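The Joint F column can be read as a Wald-type test that all pre-period coefficients are zero. One standard form is sketched below; the symbols are notation introduced here, not taken from the spec: q is the number of non-reference pre-period coefficients, and the hatted V is their estimated (clustered) covariance block.

```latex
H_0:\ \beta_e = 0 \quad \text{for all } e < 0,
\qquad
F = \frac{1}{q}\,
    \hat{\beta}_{\text{pre}}^{\top}\,
    \hat{V}_{\text{pre}}^{-1}\,
    \hat{\beta}_{\text{pre}}
```

Under this form the first entry of df is q; the second depends on the variance estimator and is left open here.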
Displayed in the property sheet or results header:

- N units, N periods, balanced/unbalanced
- Treatment cohorts with counts
- Never-treated count
- Control group type
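The panel summary fields above can be derived in one pass over the rows. The sketch below is illustrative only: `PanelRow` and `summarizePanel` are hypothetical, the real validatePanel() works on the columnar Dataset, and the balance check assumes there are no duplicate (unit, period) rows.

```typescript
// g is the first treatment period; null, NaN, 0, and Infinity all mean never-treated.
interface PanelRow {
  id: string;
  time: number;
  g: number | null;
}

interface PanelSummary {
  nUnits: number;
  nPeriods: number;
  cohorts: { treatmentTime: number; nUnits: number }[];
  nNeverTreated: number;
  balancedPanel: boolean;
}

function isNeverTreated(g: number | null): boolean {
  return g === null || isNaN(g) || g === 0 || !isFinite(g);
}

function summarizePanel(rows: PanelRow[]): PanelSummary {
  const cohortOfUnit: { [id: string]: number | null } = {};
  const obsPerUnit: { [id: string]: number } = {};
  const periods: { [time: string]: boolean } = {};
  for (const row of rows) {
    cohortOfUnit[row.id] = row.g;
    obsPerUnit[row.id] = (obsPerUnit[row.id] || 0) + 1;
    periods[String(row.time)] = true;
  }
  const ids = Object.keys(cohortOfUnit);
  const nPeriods = Object.keys(periods).length;

  // Count units (not observations) per treatment cohort.
  const cohortCounts: { [g: string]: number } = {};
  let nNeverTreated = 0;
  for (const id of ids) {
    const g = cohortOfUnit[id];
    if (isNeverTreated(g)) {
      nNeverTreated += 1;
    } else {
      cohortCounts[String(g)] = (cohortCounts[String(g)] || 0) + 1;
    }
  }
  const cohorts = Object.keys(cohortCounts)
    .map((g) => ({ treatmentTime: Number(g), nUnits: cohortCounts[g] }))
    .sort((a, b) => a.treatmentTime - b.treatmentTime);

  // Balanced: every unit is observed in every period.
  const balancedPanel = ids.every((id) => obsPerUnit[id] === nPeriods);
  return { nUnits: ids.length, nPeriods, cohorts, nNeverTreated, balancedPanel };
}

const demoPanelRows: PanelRow[] = [
  { id: 'A', time: 1, g: 2 }, { id: 'A', time: 2, g: 2 },
  { id: 'B', time: 1, g: 0 }, { id: 'B', time: 2, g: 0 },
  { id: 'C', time: 1, g: Infinity }, { id: 'C', time: 2, g: Infinity },
];
const demoPanelSummary = summarizePanel(demoPanelRows);
```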
Expand i(time_to_treat, ref=c(-1, -Inf)) into binary
indicator columns (one per relative time period, omitting reference
periods). Construct design matrix, run existing feols()
regression path with unit + time FE. Clustered SEs on
idname. Straightforward — mostly indicator generation +
existing infrastructure.
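The indicator expansion described above can be sketched as a pure function over the relative-time column. `expandEventIndicators` and the `var::value` column naming are assumptions made for illustration; never-treated rows are assumed to carry `-Infinity`, mirroring `ref = c(-1, -Inf)`.

```typescript
// Build one 0/1 column per relative-time level, skipping the reference levels.
function expandEventIndicators(
  varName: string,
  relTime: number[],
  refs: number[],
): { [column: string]: number[] } {
  // Distinct non-reference levels, in ascending order.
  const levels: number[] = [];
  for (const e of relTime) {
    if (refs.indexOf(e) === -1 && levels.indexOf(e) === -1) {
      levels.push(e);
    }
  }
  levels.sort((a, b) => a - b);

  const columns: { [column: string]: number[] } = {};
  for (const e of levels) {
    columns[varName + '::' + String(e)] = relTime.map((v) => (v === e ? 1 : 0));
  }
  return columns;
}

// Six observations; the fifth belongs to a never-treated unit.
const indicatorColumns = expandEventIndicators(
  'time_to_treat',
  [-2, -1, 0, 1, -Infinity, -2],
  [-1, -Infinity],
);
```

Rows at a reference level get zeros in every column, which is what makes those periods the omitted baseline in the regression.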
1. Identify untreated observations (time_to_treat < 0 or ever_treated == FALSE)
2. Stage 1: y ~ 0 | unit + time on untreated subset → get unit/time FE estimates
3. Residualize: y_resid = y - unit_FE - time_FE
4. Stage 2: y_resid ~ event_indicators on full dataset
5. Adjust SEs for the first-stage estimation (clustered on idname)

Two calls to existing regression.ts. The SE adjustment is the main new piece: a standard two-stage correction analogous to the 2SLS SE adjustment we already have.
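For intuition on the second stage: when the event indicators are mutually exclusive and the stage-2 regression has no intercept or covariates (an assumption of this sketch), each OLS coefficient equals the mean residualized outcome in its relative-time cell. The helper below takes stage-1 fixed effects as given; `gardnerSecondStage` and `GardnerObs` are hypothetical names, and the SE correction is not covered.

```typescript
interface GardnerObs {
  unit: string;
  time: number;
  y: number;
  relTime: number | null; // time_to_treat; null for never-treated units
}

// Point estimates only: residualize with stage-1 FEs, then average by relative time.
function gardnerSecondStage(
  obs: GardnerObs[],
  unitFE: { [unit: string]: number },
  timeFE: { [time: string]: number },
  refs: number[],
): { [relTime: string]: number } {
  const sum: { [relTime: string]: number } = {};
  const count: { [relTime: string]: number } = {};
  for (const o of obs) {
    // Never-treated and reference-period rows have all-zero indicators.
    if (o.relTime === null || refs.indexOf(o.relTime) !== -1) continue;
    const yResid = o.y - unitFE[o.unit] - timeFE[String(o.time)];
    const key = String(o.relTime);
    sum[key] = (sum[key] || 0) + yResid;
    count[key] = (count[key] || 0) + 1;
  }
  const coefs: { [relTime: string]: number } = {};
  for (const key of Object.keys(sum)) {
    coefs[key] = sum[key] / count[key];
  }
  return coefs;
}

// Unit A is treated at t = 3 with a true effect of 2; unit B is never treated.
const gardnerCoefs = gardnerSecondStage(
  [
    { unit: 'A', time: 1, y: 1.0, relTime: -2 },
    { unit: 'A', time: 2, y: 1.5, relTime: -1 },
    { unit: 'A', time: 3, y: 4.5, relTime: 0 },
    { unit: 'B', time: 1, y: 2.0, relTime: null },
    { unit: 'B', time: 2, y: 2.5, relTime: null },
    { unit: 'B', time: 3, y: 3.5, relTime: null },
  ],
  { A: 1, B: 2 },
  { '1': 0, '2': 0.5, '3': 1.5 },
  [-1],
);
```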
For each (cohort g, time period t) where t >= g:

1. Subset data to: units in cohort g + control units (never-treated or not-yet-treated at t)
2. Take two time periods: t and a base period (typically g - 1 for universal base)
3. Compute ATT(g,t) via outcome regression (difference-in-means or with covariates)
4. Store influence function values for inference
Aggregate ATT(g,t) → overall ATT (simple weighted average) and event-study (average across cohorts at each relative time).
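A single ATT(g,t) cell in its simplest form (no covariates, never-treated controls, base period g - 1) is a difference of mean outcome changes. The sketch below assumes `g = 0` marks never-treated units and weights the overall ATT by cohort size; both choices, and the names `attGT` and `overallATT`, are illustrative rather than taken from the spec.

```typescript
interface CsObs {
  id: string;
  time: number;
  g: number; // first treatment period; 0 = never-treated (assumption)
  y: number;
}

function mean(xs: number[]): number {
  return xs.reduce((s, x) => s + x, 0) / xs.length;
}

// Per-unit outcome change from the base period to t, for units selected by inGroup.
function outcomeChanges(
  data: CsObs[],
  inGroup: (g: number) => boolean,
  base: number,
  t: number,
): number[] {
  const yBase: { [id: string]: number } = {};
  const yT: { [id: string]: number } = {};
  for (const o of data) {
    if (!inGroup(o.g)) continue;
    if (o.time === base) yBase[o.id] = o.y;
    if (o.time === t) yT[o.id] = o.y;
  }
  return Object.keys(yT)
    .filter((id) => id in yBase)
    .map((id) => yT[id] - yBase[id]);
}

// 2x2 difference-in-differences for cohort g at period t.
function attGT(data: CsObs[], g: number, t: number): number {
  const base = g - 1;
  const treated = outcomeChanges(data, (c) => c === g, base, t);
  const control = outcomeChanges(data, (c) => c === 0, base, t);
  return mean(treated) - mean(control);
}

interface AttCell { g: number; t: number; att: number; nTreated: number }

// Overall ATT as a cohort-size-weighted average of post-treatment cells.
function overallATT(cells: AttCell[]): number {
  const post = cells.filter((c) => c.t >= c.g);
  const totalWeight = post.reduce((s, c) => s + c.nTreated, 0);
  return post.reduce((s, c) => s + c.nTreated * c.att, 0) / totalWeight;
}

const csDemo: CsObs[] = [
  { id: 'A', time: 1, g: 2, y: 1.0 }, { id: 'A', time: 2, g: 2, y: 3.0 },
  { id: 'B', time: 1, g: 2, y: 2.0 }, { id: 'B', time: 2, g: 2, y: 4.5 },
  { id: 'C', time: 1, g: 0, y: 1.0 }, { id: 'C', time: 2, g: 0, y: 1.5 },
  { id: 'D', time: 1, g: 0, y: 0.0 }, { id: 'D', time: 2, g: 0, y: 0.5 },
];
const att22 = attGT(csDemo, 2, 2);
```

In the demo data the treated units change by 2.0 and 2.5 while both controls change by 0.5, so the cell estimate is 2.25 - 0.5.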
Bootstrap inference: resample unit-level blocks (all periods for a unit stay together), re-compute all ATT(g,t) + aggregation, derive SEs and CIs from bootstrap distribution. Default 1000 iterations.
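Unit-block resampling can be sketched as drawing unit IDs with replacement and copying each drawn unit's full time series under a fresh ID. The seeded generator (`lcg`), `resampleUnits`, and `bootstrapSE` are hypothetical helpers introduced for this illustration; a real implementation would plug in the full ATT(g,t) pipeline as the statistic.

```typescript
interface BootRow { id: string; time: number; y: number }

// Small seeded linear congruential generator so draws are reproducible (illustrative only).
function lcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Draw as many units as the panel has, with replacement, keeping each unit's periods together.
function resampleUnits(rows: BootRow[], rand: () => number): BootRow[] {
  const byUnit: { [id: string]: BootRow[] } = {};
  for (const r of rows) {
    if (!byUnit[r.id]) byUnit[r.id] = [];
    byUnit[r.id].push(r);
  }
  const ids = Object.keys(byUnit);
  const out: BootRow[] = [];
  for (let draw = 0; draw < ids.length; draw++) {
    const picked = ids[Math.floor(rand() * ids.length)];
    // Relabel so a unit drawn twice counts as two distinct clusters.
    for (const r of byUnit[picked]) {
      out.push({ id: 'b' + draw, time: r.time, y: r.y });
    }
  }
  return out;
}

// Standard deviation of the statistic across bootstrap samples.
function bootstrapSE(
  rows: BootRow[],
  statistic: (sample: BootRow[]) => number,
  iterations: number,
  seed: number,
): number {
  const rand = lcg(seed);
  const draws: number[] = [];
  for (let b = 0; b < iterations; b++) {
    draws.push(statistic(resampleUnits(rows, rand)));
  }
  const m = draws.reduce((s, d) => s + d, 0) / draws.length;
  const ss = draws.reduce((s, d) => s + (d - m) * (d - m), 0);
  return Math.sqrt(ss / (draws.length - 1));
}

// Demo panel: 4 units x 3 periods, y = 10 * unitIndex + time.
const bootPanel: BootRow[] = [];
for (let u = 0; u < 4; u++) {
  for (let t = 1; t <= 3; t++) {
    bootPanel.push({ id: 'u' + u, time: t, y: 10 * u + t });
  }
}
const bootSample = resampleUnits(bootPanel, lcg(42));
```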
Performance note: For typical applied econ panels (500-5000 units, 10-30 periods, 3-8 cohorts), each (g,t) regression is small. The bootstrap is the bottleneck — 1000 iterations × ~50 (g,t) pairs × small OLS = ~50K small regressions. At ~0.1ms each on modern hardware, this is ~5 seconds. Acceptable for a one-shot analysis.
Construct interaction indicators: for each cohort g and relative time
e, create I(cohort == g) * I(time_to_treat == e). Run feols
with these interactions + unit/time FE. Then reweight: the TWFE
coefficient on I(time_to_treat == e) is a weighted average
of cohort-specific effects; Sun-Abraham recovers the properly-weighted
average by summing cohort-specific coefficients weighted by cohort
shares.
Uses existing feols path for the regression. New logic: indicator construction + coefficient reweighting.
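The reweighting step can be sketched as a cohort-share weighted average at each relative time. Here the shares are computed from unit counts among the cohorts that have a coefficient at that relative time; `aggregateSunAbraham` and its input shapes are hypothetical, and the variance of the aggregate is not covered.

```typescript
interface CohortCoef {
  cohort: number;  // first treatment period
  relTime: number; // relative time e
  coef: number;    // coefficient on I(cohort == g) * I(time_to_treat == e)
}

function aggregateSunAbraham(
  coefs: CohortCoef[],
  cohortSize: { [cohort: string]: number },
): { [relTime: string]: number } {
  const weighted: { [relTime: string]: number } = {};
  const totalSize: { [relTime: string]: number } = {};
  for (const c of coefs) {
    const key = String(c.relTime);
    const n = cohortSize[String(c.cohort)];
    weighted[key] = (weighted[key] || 0) + n * c.coef;
    totalSize[key] = (totalSize[key] || 0) + n;
  }
  const out: { [relTime: string]: number } = {};
  for (const key of Object.keys(weighted)) {
    // Weights are shares among the cohorts observed at this relative time.
    out[key] = weighted[key] / totalSize[key];
  }
  return out;
}

const sunAbrahamDemo = aggregateSunAbraham(
  [
    { cohort: 2004, relTime: 0, coef: 1.0 },
    { cohort: 2006, relTime: 0, coef: 2.0 },
    { cohort: 2004, relTime: 2, coef: 3.0 },
  ],
  { '2004': 30, '2006': 10 },
);
```

At relative time 0 the shares are 0.75 and 0.25, giving 1.25; at relative time 2 only the 2004 cohort is observed, so its coefficient passes through unchanged.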
1. Fit y ~ unit_FE + time_FE (+ covariates) on untreated subset
2. Per-observation effects for treated observations: tau_hat(i,t) = Y(i,t) - Y_hat(0)(i,t)

One regression + prediction + aggregation. The influence-function SE derivation is the complex part.
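The point-estimate half of the imputation estimator can be sketched without covariates as follows. The fixed effects are fitted here by alternating group means on the untreated rows, a stand-in for the existing OLS path rather than the planned implementation; `fitUntreatedFE` and `imputationATT` are hypothetical names, every treated unit is assumed to have at least one untreated observation, and SEs are out of scope.

```typescript
interface ImpObs { unit: string; time: number; y: number; treated: boolean }
type Effects = { [key: string]: number };

function groupMean(
  obs: ImpObs[],
  keyOf: (o: ImpObs) => string,
  valueOf: (o: ImpObs) => number,
): Effects {
  const sum: Effects = {};
  const count: Effects = {};
  for (const o of obs) {
    const k = keyOf(o);
    sum[k] = (sum[k] || 0) + valueOf(o);
    count[k] = (count[k] || 0) + 1;
  }
  const out: Effects = {};
  for (const k of Object.keys(sum)) out[k] = sum[k] / count[k];
  return out;
}

// Two-way FE fit on untreated rows via alternating projections (backfitting).
function fitUntreatedFE(obs: ImpObs[], iterations: number): { unitFE: Effects; timeFE: Effects } {
  const untreated = obs.filter((o) => !o.treated);
  let unitFE: Effects = {};
  let timeFE: Effects = {};
  for (const o of untreated) timeFE[String(o.time)] = 0;
  for (let k = 0; k < iterations; k++) {
    unitFE = groupMean(untreated, (o) => o.unit, (o) => o.y - timeFE[String(o.time)]);
    timeFE = groupMean(untreated, (o) => String(o.time), (o) => o.y - unitFE[o.unit]);
  }
  return { unitFE, timeFE };
}

// Impute Y(0) for treated rows, then average the differences.
function imputationATT(obs: ImpObs[], iterations: number = 200): { tau: number[]; att: number } {
  const { unitFE, timeFE } = fitUntreatedFE(obs, iterations);
  const tau = obs
    .filter((o) => o.treated)
    .map((o) => o.y - (unitFE[o.unit] + timeFE[String(o.time)]));
  const att = tau.reduce((s, v) => s + v, 0) / tau.length;
  return { tau, att };
}

// Noise-free demo: unit A is treated at t = 3 with a true effect of 2.
const imputationDemo = imputationATT([
  { unit: 'A', time: 1, y: 1.0, treated: false },
  { unit: 'A', time: 2, y: 1.5, treated: false },
  { unit: 'A', time: 3, y: 4.5, treated: true },
  { unit: 'B', time: 1, y: 2.0, treated: false },
  { unit: 'B', time: 2, y: 2.5, treated: false },
  { unit: 'B', time: 3, y: 3.5, treated: false },
  { unit: 'C', time: 1, y: 0.0, treated: false },
  { unit: 'C', time: 2, y: 0.5, treated: false },
  { unit: 'C', time: 3, y: 1.5, treated: false },
]);
```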
Run each estimator in R on a known dataset, capture coefficients/SEs/p-values, assert Interlyse matches within tolerance. Use the same tolerance standards as existing regression tests: <0.00005 for statistics, <0.00001 for p-values.
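A comparison helper for these checks might look like the sketch below. It treats both thresholds as absolute tolerances, which is an assumption about how the existing regression tests apply them; `mismatches` and `ReferenceValues` are hypothetical names.

```typescript
const STAT_TOLERANCE = 0.00005;   // coefficients, SEs, test statistics
const P_VALUE_TOLERANCE = 0.00001;

interface ReferenceValues { estimate: number; se: number; pValue: number }

function withinTolerance(actual: number, expected: number, tol: number): boolean {
  return Math.abs(actual - expected) < tol;
}

// Names of the fields where the engine output disagrees with the R reference.
function mismatches(actual: ReferenceValues, expected: ReferenceValues): string[] {
  const failed: string[] = [];
  if (!withinTolerance(actual.estimate, expected.estimate, STAT_TOLERANCE)) failed.push('estimate');
  if (!withinTolerance(actual.se, expected.se, STAT_TOLERANCE)) failed.push('se');
  if (!withinTolerance(actual.pValue, expected.pValue, P_VALUE_TOLERANCE)) failed.push('pValue');
  return failed;
}

// Illustrative numbers only, not real R output.
const rReference: ReferenceValues = { estimate: 0.045, se: 0.012, pValue: 0.0003 };
const closeEnough = mismatches({ estimate: 0.04502, se: 0.012, pValue: 0.0003 }, rReference);
const tooFar = mismatches({ estimate: 0.0451, se: 0.012, pValue: 0.00032 }, rReference);
```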
Test datasets:

- Simulated staggered adoption panel: small (100 units × 10 periods × 3 cohorts) with known DGP so we can verify both point estimates and SEs
- Castle doctrine dataset (from did2s package): real data with published R output to validate against
- did node
- feols() with i() routes to did, plain feols() stays as linear-model
- att_gt() + aggte() with CDC/Medicaid panel data
- event_study() wrapper: multi-estimator horse race on Castle doctrine data

src/core/stats/
  did/
    panel.ts - PanelInfo, validatePanel(), preparePanelData()
    indicators.ts - event-study indicator expansion, i() and sunab() column generation
    twfe.ts - runTWFE()
    gardner.ts - runGardner()
    callaway.ts - runCS(), bootstrap logic
    sun-abraham.ts - runSunAbraham()
    borusyak.ts - runBorusyak()
    aggregate.ts - ATT aggregation, pre-trends test
    bootstrap.ts - panel bootstrap (unit-block resampling)
    types.ts - DiDParams, DiDResult, EventStudyCoefficient, etc.
    index.ts - did executor entry point (orchestrates all of the above)
    *.test.ts - colocated tests per file
src/core/pipeline/
  types.ts - add DiDNode to PipelineNode union
  executor.ts - register 'did' executor
src/core/parsers/r/
  recognizer.ts - add att_gt, did2s, event_study, did_imputation patterns; route feols+i()/sunab() to did
src/ui/components/
  results/
    event-study-plot.tsx - Observable Plot event-study chart
    did-results.tsx - ATT table + pre-trends table + panel info