Date: 2026-03-26
Milestone: 3 (Core Econometrics)
Scope: Heteroskedasticity-consistent (sandwich) standard errors for OLS and 2SLS
Of the benchmark applied econ papers, 8 of 10 use robust or clustered standard errors. Without them, even correct coefficient estimates lead to wrong inference. Robust SEs are also the foundation for clustered SEs (same sandwich machinery, different meat matrix), so building this correctly now makes clustered SEs a small extension later.
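src/core/stats/sandwich.ts
Pure function computing the heteroskedasticity-consistent variance-covariance matrix.
type HCType = 'HC0' | 'HC1' | 'HC2' | 'HC3';
function computeRobustVcov(
  X: number[][],        // n×k design matrix
  residuals: number[],  // length-n residuals from the fitted model
  XtXinv: number[][],   // k×k (X'X)⁻¹ reused from the OLS pass
  type: HCType
): number[][];          // full k×k variance-covariance matrix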
Math:
All HC types use the sandwich form: V = (X'X)⁻¹ · M · (X'X)⁻¹
The meat matrix M differs by type:
HC0: M = X' diag(eᵢ²) X (no finite-sample correction)
HC1: M = (n/(n-k)) · X' diag(eᵢ²) X (degrees-of-freedom correction; fixest default for vcov='hetero')
HC2: M = X' diag(eᵢ²/(1-hᵢ)) X (leverage-adjusted)
HC3: M = X' diag(eᵢ²/(1-hᵢ)²) X (jackknife-like, most conservative; R sandwich::vcovHC default)
Where eᵢ are the OLS residuals and hᵢ = Xᵢ (X'X)⁻¹ Xᵢ' are the hat matrix diagonal elements (leverage values), with Xᵢ the i-th row of X. HC0 and HC1 don't need leverage, so skip the hat computation for those types.
Inputs: The function reuses the already-computed (X'X)⁻¹ from the OLS pass — no extra QR decomposition. The design matrix X and residuals are already available inside computeRegression.
Returns: Full k×k variance-covariance matrix (not just diagonal), since downstream uses (clustered SEs, Wald tests) need the full matrix.
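The formulas above can be sketched as follows. This is a minimal illustration under stated assumptions, not the final implementation: the name computeRobustVcovSketch and the matMul helper are hypothetical, matrices are plain number[][] arrays, and no guard is included for leverage values at or near 1.

```typescript
type HCType = 'HC0' | 'HC1' | 'HC2' | 'HC3';

// Hypothetical helper: matrix product of an r×m and an m×c matrix.
function matMul(A: number[][], B: number[][]): number[][] {
  return A.map(row =>
    B[0].map((_, j) => row.reduce((s, a, t) => s + a * B[t][j], 0))
  );
}

// Sketch of the sandwich estimator V = (X'X)^-1 · M · (X'X)^-1.
function computeRobustVcovSketch(
  X: number[][],
  residuals: number[],
  XtXinv: number[][],
  type: HCType
): number[][] {
  const n = X.length;
  const k = XtXinv.length;
  const meat: number[][] = Array.from({ length: k }, () => new Array<number>(k).fill(0));

  for (let i = 0; i < n; i++) {
    const xi = X[i];
    let w = residuals[i] * residuals[i]; // e_i^2

    if (type === 'HC2' || type === 'HC3') {
      // Leverage h_i = x_i (X'X)^-1 x_i'; skipped entirely for HC0/HC1.
      let h = 0;
      for (let a = 0; a < k; a++)
        for (let b = 0; b < k; b++) h += xi[a] * XtXinv[a][b] * xi[b];
      w /= type === 'HC2' ? 1 - h : (1 - h) * (1 - h);
    }

    // Accumulate w_i · x_i' x_i into the meat matrix.
    for (let a = 0; a < k; a++)
      for (let b = 0; b < k; b++) meat[a][b] += w * xi[a] * xi[b];
  }

  if (type === 'HC1') {
    const scale = n / (n - k); // degrees-of-freedom correction
    for (const row of meat) for (let b = 0; b < k; b++) row[b] *= scale;
  }

  return matMul(matMul(XtXinv, meat), XtXinv);
}

// Intercept-only example: X is a column of ones, so (X'X)^-1 = [[1/n]].
const demoX = [[1], [1], [1], [1]];
const demoResiduals = [1, -1, 2, -2];
const demoVcov = computeRobustVcovSketch(demoX, demoResiduals, [[0.25]], 'HC0');
console.log(demoVcov); // [[0.625]]
```

In the intercept-only example the meat is the sum of squared residuals (10), so HC0 gives 10/16 = 0.625. Accumulating the meat row by row avoids materializing the n×n diagonal matrix.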
src/core/stats/regression.ts
computeRegression() gains an optional vcovType parameter:
function computeRegression(
  formula: Formula,
  dataset: Dataset,
  vcovType?: HCType
): RegressionResult;
When vcovType is specified:
1. Compute (X'X)⁻¹ as usual (unchanged)
2. Call computeRobustVcov(X, residuals, XtXinv, vcovType) to get robust vcov
3. Take standard errors as √(diag(robustVcov))
4. Set result.vcovType on the result
Classical model statistics (R², adj-R², residual SE) are unchanged; they don't depend on the vcov.
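The F-statistic should use the robust vcov when available (Wald F-test: β' V⁻¹ β / q, with β and V restricted to the q slope coefficients being tested, i.e. excluding the intercept). This is a refinement: the initial implementation can keep the classical F and note that in the result metadata.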
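As a rough illustration of the standard-error step, the sketch below derives SEs and t-statistics from the diagonal of a vcov matrix. The CoefSketch shape and the withRobustSEs name are assumptions made for this example; the real CoefficientRow also carries a p-value, which is omitted here because it needs a t-distribution CDF.

```typescript
// Hypothetical row shape for illustration only.
interface CoefSketch {
  estimate: number;
  se: number;
  t: number;
}

// Pair each coefficient with the square root of its vcov diagonal entry.
function withRobustSEs(estimates: number[], robustVcov: number[][]): CoefSketch[] {
  return estimates.map((estimate, j) => {
    const se = Math.sqrt(robustVcov[j][j]);
    return { estimate, se, t: estimate / se };
  });
}

const demoRows = withRobustSEs([2, 3], [[0.25, 0.1], [0.1, 4]]);
console.log(demoRows.map(r => r.se)); // [0.5, 2]
```

Only the diagonal feeds the per-coefficient SEs; the off-diagonal entries matter for joint tests such as the Wald F.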
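src/core/stats/regression-2sls.ts
compute2SLS() gains the same optional vcovType parameter. The sandwich form applies the same way, with the projected regressors X̂ (first-stage fitted values) in place of X in both the bread and the meat, and with the 2SLS residuals y − Xβ̂ computed from the original X rather than from X̂. The meat matrix formulas are unchanged; only the regressors and residuals differ.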
src/core/stats/types.ts
Add vcovType to RegressionResult:
export type VcovType = 'classical' | 'HC0' | 'HC1' | 'HC2' | 'HC3';
export interface RegressionResult {
  type: 'regression';
  coefficients: CoefficientRow[];
  rSquared: number;
  adjustedRSquared: number;
  fStatistic: number;
  fPValue: number;
  dfModel: number;
  dfResidual: number;
  residualStandardError: number;
  residuals: number[];
  fittedValues: number[];
  ivDiagnostics?: IVDiagnostics;
  vcovType: VcovType; // NEW: always set, defaults to 'classical'
}
Export VcovType and the HCType subset (excludes 'classical') for use in params.
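One way to keep the two types in sync is to derive HCType from VcovType rather than listing the members twice. The sketch below assumes that approach; the toHCType helper is hypothetical and shows how a 'classical' or missing value could be normalized to undefined before calling the estimators.

```typescript
type VcovType = 'classical' | 'HC0' | 'HC1' | 'HC2' | 'HC3';

// HCType is every VcovType member except 'classical'.
type HCType = Exclude<VcovType, 'classical'>;

// Hypothetical normalizer: 'classical' and undefined both mean "no robust vcov".
function toHCType(v: VcovType | undefined): HCType | undefined {
  return v === undefined || v === 'classical' ? undefined : v;
}

console.log(toHCType('HC3'));       // HC3
console.log(toHCType('classical')); // undefined
```

With this derivation, adding a new robust type to VcovType automatically extends HCType.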
src/core/pipeline/types.ts
Add vcovType to LinearModelParams:
export interface LinearModelParams {
  formula: Formula;
  data: string;
  estimator: 'ols' | '2sls';
  endogenous?: string[];
  instruments?: string[];
  fixedEffects?: string[];
  vcovType?: 'HC0' | 'HC1' | 'HC2' | 'HC3'; // NEW: undefined = classical
}
src/core/pipeline/param-schema.ts
Add vcovType ParamDef to the linear-model schema:
{
  key: 'vcovType',
  label: 'Std. Errors',
  kind: 'select',
  multivaluable: true, // can vary across specifications
  defaultValue: 'classical',
  options: [
    { value: 'classical', label: 'Classical (iid)' },
    { value: 'HC0', label: 'Robust (HC0)' },
    { value: 'HC1', label: 'Robust (HC1)' },
    { value: 'HC2', label: 'Robust (HC2)' },
    { value: 'HC3', label: 'Robust (HC3)' },
  ],
}
This makes vcovType editable in the property sheet and variable across specifications in the spec explorer. Comparing classical vs robust SEs for the same model is a natural spec curve use case.
src/core/pipeline/executor.ts
Pass vcovType from LinearModelParams through to computeRegression() / compute2SLS(). One extra argument.
src/core/parsers/r/recognizer.ts
Recognize vcovType from inline arguments on feols() and felm():
feols patterns:
feols(y ~ x, data=d, vcov='hetero') → HC1
feols(y ~ x, data=d, se='hetero') → HC1 (older alias)
feols(y ~ x, data=d, vcov='HC0') → HC0 (similarly HC1–HC3)
felm pattern:
felm(y ~ x | 0 | 0 | cluster, data=d): the 4th pipe-separated part is the cluster variable. This is clustered SEs (future work), not HC, so skip it for now.
lm pattern:
lm() itself has no robust SE argument. Robust SEs for lm() come from coeftest(mod, vcov=vcovHC), which is deferred to a later session (roadmap "Important", not "Required"; architecturally different, as a post-estimation node modification).
Implementation: in recognizeFeolsFromSource(), extract the vcov or se argument value after parsing the formula parts. Map string values to HCType.
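The extraction and mapping step might look like the sketch below. It is an assumption-laden illustration: extractFeolsVcovType and FEOLS_VCOV_MAP are hypothetical names, the mapping table simply mirrors the feols patterns listed above, and a plain regex stands in for whatever argument parsing recognizeFeolsFromSource() actually uses (a regex would also match a vcov= argument nested inside another call).

```typescript
type HCType = 'HC0' | 'HC1' | 'HC2' | 'HC3';

// String values from the feols patterns, mapped to HCType.
const FEOLS_VCOV_MAP = new Map<string, HCType>([
  ['hetero', 'HC1'],
  ['HC0', 'HC0'],
  ['HC1', 'HC1'],
  ['HC2', 'HC2'],
  ['HC3', 'HC3'],
]);

// Find a quoted vcov= or se= argument and map it; anything else is undefined.
function extractFeolsVcovType(callSource: string): HCType | undefined {
  const match = /\b(?:vcov|se)\s*=\s*(["'])(.*?)\1/.exec(callSource);
  return match ? FEOLS_VCOV_MAP.get(match[2]) : undefined;
}

console.log(extractFeolsVcovType("feols(y ~ x, data=d, vcov='hetero')")); // HC1
```

Unrecognized values such as a cluster specification fall through to undefined, which keeps classical SEs until clustered support exists.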
vcovType select dropdown appears for linear-model nodes (automatic from ParamDef)
CoefficientRow. The comparison table's standardErrors array gets robust values when vcovType is set.
RegressionResult shape (coefficients still have estimate, se, t, p)
library(sandwich); library(lmtest)
# Generate test data
set.seed(42)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
e <- rnorm(n) * (1 + abs(x1)) # heteroskedastic errors
y <- 2 + 3*x1 - 1.5*x2 + e
d <- data.frame(y, x1, x2)
mod <- lm(y ~ x1 + x2, data=d)
# Extract for each HC type
for (hc in c("HC0", "HC1", "HC2", "HC3")) {
  ct <- coeftest(mod, vcov=vcovHC(mod, type=hc))
  cat(hc, ":\n")
  print(ct)
}
ivreg with vcov.=vcovHC
feols(y ~ x, data=d, vcov='hetero') parses → maps → executes with HC1 SEs
coeftest(mod, vcov=vcovHC) post-estimation pattern (Important, not Required; different architecture)
etable() recognition with SE type display