Date: 2026-04-01
Status: Design
Depends on: Robust SEs (HC0–HC3), FE computation (demeaning + encodeFEColumn)
Nearly every panel/DiD paper clusters its standard errors. Without
clustering, even correctly estimated coefficients lead to wrong
inference. The HC sandwich machinery (sandwich.ts) is already built, and
clustering uses the same sandwich form with a different meat matrix.
Multi-way clustering (Cameron-Gelbach-Miller 2011) is a thin combinator
on top of one-way, so we implement both at once.
Unlocks:
feols(y ~ x | fe, vcov = ~state),
felm(y ~ x | fe | 0 | state + year), and all combinations
of one-way and multi-way clustering for OLS and 2SLS.
core/stats/types.ts
Extend VcovType to include clustering:
export type VcovType = 'classical' | HCType | 'cluster';
Add cluster metadata type and field on RegressionResult:
export interface ClusterInfo {
  name: string;
  nClusters: number;
}
export interface RegressionResult {
  // ... existing fields unchanged
  vcovType: VcovType;
  clusterInfo?: ClusterInfo[]; // one entry per cluster dimension
}
vcovType: 'cluster' and clusterInfo always appear together.
clusterInfo is absent when vcovType !== 'cluster'.
core/pipeline/types.ts
Add clusterVars to LinearModelParams:
export interface LinearModelParams {
  formula: Formula;
  data: string;
  estimator: 'ols' | '2sls';
  endogenous?: string[];
  instruments?: string[];
  fixedEffects?: string[];
  vcovType?: HCType; // undefined = classical; mutually exclusive with clusterVars
  clusterVars?: string[]; // NEW: cluster variable names
}
vcovType (HC type) and clusterVars are mutually exclusive on
LinearModelParams. When clusterVars is present, the executor ignores
vcovType and computes clustered SEs instead.
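The precedence rule above can be made explicit in one place. The following is a minimal sketch, not the executor's actual code: VcovSpec and resolveVcovSpec are hypothetical names, HCType is assumed to be the four values HC0 to HC3, and treating an empty clusterVars array as absent is an assumption.

```typescript
// Assumed shape of HCType (HC0-HC3, per the "Depends on" line).
type HCType = 'HC0' | 'HC1' | 'HC2' | 'HC3';

// Hypothetical normalized form: exactly one SE strategy per model.
type VcovSpec =
  | { kind: 'classical' }
  | { kind: 'hc'; hcType: HCType }
  | { kind: 'cluster'; clusterVars: string[] };

// clusterVars takes precedence over vcovType; neither present means classical.
function resolveVcovSpec(params: {
  vcovType?: HCType;
  clusterVars?: string[];
}): VcovSpec {
  // Assumption: an empty clusterVars array is treated the same as undefined.
  if (params.clusterVars && params.clusterVars.length > 0) {
    return { kind: 'cluster', clusterVars: params.clusterVars };
  }
  if (params.vcovType) {
    return { kind: 'hc', hcType: params.vcovType };
  }
  return { kind: 'classical' };
}
```

Resolving once keeps the OLS and 2SLS paths from each re-implementing the "clusterVars wins" check.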
core/pipeline/types.ts: NODE_PORTS
No changes needed. LinearModelNode already has data input and model
output ports.
computeClusteredVcov
core/stats/sandwich.ts
export function computeClusteredVcov(
  X: Matrix,
  residuals: number[],
  XtXinv: Matrix,
  clusterDims: { groupIds: number[]; nGroups: number }[],
): Matrix
computeOneWayClusteredMeat
Private function. For a cluster dimension with G groups:
1. Initialize scores[g] as p-length zero vectors, one per group.
2. For each observation i in group g: accumulate
   scores[g][j] += X[i][j] * e[i] for all j.
3. Compute the meat: M = Σ_g scores[g] ⊗ scores[g]' (outer product sum).
4. Compute the small-sample correction: c = G/(G-1) · (n-1)/(n-k), where
   k = X[0].length (the same column count as p).
5. Return c · (X'X)⁻¹ · M · (X'X)⁻¹.
This matches fixest with dof(adj = TRUE, fixef.K = "none") (note that
fixest's own default is fixef.K = "nested"), Stata’s vce(cluster) for
regress, and the default HC1 adjustment in R’s sandwich::vcovCL.
Complexity: O(n·p + G·p²), with one pass over observations to accumulate scores and one pass over groups for the outer products. This is dominated by the existing OLS computation for any realistic dataset.
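The one-way computation can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: Matrix is assumed to be number[][], matMul is a throwaway helper, and oneWayClusteredMeat / oneWayClusteredVcov are hypothetical names standing in for the private function and its caller.

```typescript
// Assumption: Matrix is a dense row-major number[][].
type Matrix = number[][];

// Throwaway dense multiply (a: r x m, b: m x c).
function matMul(a: Matrix, b: Matrix): Matrix {
  return a.map((row) =>
    b[0].map((_, j) => row.reduce((sum, v, t) => sum + v * b[t][j], 0)),
  );
}

// Meat M = sum over clusters of s_g s_g', with s_g = sum_{i in g} x_i * e_i.
function oneWayClusteredMeat(
  X: Matrix,
  residuals: number[],
  groupIds: number[],
  nGroups: number,
): Matrix {
  const p = X[0].length;
  const scores: Matrix = Array.from({ length: nGroups }, () =>
    new Array<number>(p).fill(0),
  );
  // One pass over observations: accumulate per-cluster score vectors.
  for (let i = 0; i < X.length; i++) {
    const s = scores[groupIds[i]];
    for (let j = 0; j < p; j++) s[j] += X[i][j] * residuals[i];
  }
  // One pass over clusters: sum of outer products.
  const M: Matrix = Array.from({ length: p }, () => new Array<number>(p).fill(0));
  for (const s of scores) {
    for (let a = 0; a < p; a++) {
      for (let b = 0; b < p; b++) M[a][b] += s[a] * s[b];
    }
  }
  return M;
}

// V = c * (X'X)^-1 * M * (X'X)^-1 with c = G/(G-1) * (n-1)/(n-k).
function oneWayClusteredVcov(
  X: Matrix,
  residuals: number[],
  XtXinv: Matrix,
  groupIds: number[],
  nGroups: number,
): Matrix {
  const n = X.length;
  const k = X[0].length;
  const c = (nGroups / (nGroups - 1)) * ((n - 1) / (n - k));
  const M = oneWayClusteredMeat(X, residuals, groupIds, nGroups);
  return matMul(matMul(XtXinv, M), XtXinv).map((row) => row.map((v) => c * v));
}
```

For an intercept-only model with residuals [1, 2, -1, -2] and clusters [0, 0, 1, 1], the cluster scores are 3 and -3, so M = 18, c = 2, and the variance is 2 · (1/4) · 18 · (1/4) = 2.25.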
For D cluster dimensions, there are 2^D - 1 non-empty subsets. For each
subset S:
1. Build the intersection cluster IDs for the dimensions in S. Encode by
   concatenating groupId values into a composite key, then re-encode to
   sequential IDs using a Map (same approach as encodeFEColumn).
2. Compute the one-way clustered vcov V_S on the intersection IDs.
3. Apply the sign (-1)^(|S|+1). Single dimensions are added, pairwise
   intersections are subtracted, triple intersections are added, etc.
Final result: V = Σ_S sign(S) · V_S
For two-way (D=2): V = V₁ + V₂ - V₁₂ (3 one-way computations).
For three-way (D=3): V = V₁ + V₂ + V₃ - V₁₂ - V₁₃ - V₂₃ + V₁₂₃ (7 one-way
computations).
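The subset enumeration and the intersection encoding can be sketched as below. The names signedSubsets, intersectDims, and ClusterDim are hypothetical, and the '|' separator for the composite key is an assumption; the sketch only covers the bookkeeping, not the vcov computation itself.

```typescript
// Assumed shape of one cluster dimension (mirrors the clusterDims element type).
interface ClusterDim {
  groupIds: number[];
  nGroups: number;
}

// Enumerate the 2^D - 1 non-empty subsets of {0..D-1} via bitmasks,
// each with its inclusion-exclusion sign (-1)^(|S|+1).
function signedSubsets(D: number): { dims: number[]; sign: number }[] {
  const out: { dims: number[]; sign: number }[] = [];
  for (let mask = 1; mask < 1 << D; mask++) {
    const dims: number[] = [];
    for (let d = 0; d < D; d++) {
      if (mask & (1 << d)) dims.push(d);
    }
    out.push({ dims, sign: dims.length % 2 === 1 ? 1 : -1 });
  }
  return out;
}

// Intersect several dimensions: composite key per row, re-encoded
// to sequential IDs in order of first appearance.
function intersectDims(dims: ClusterDim[]): ClusterDim {
  const n = dims[0].groupIds.length;
  const seen = new Map<string, number>();
  const groupIds = new Array<number>(n);
  for (let i = 0; i < n; i++) {
    const key = dims.map((d) => d.groupIds[i]).join('|');
    let id = seen.get(key);
    if (id === undefined) {
      id = seen.size;
      seen.set(key, id);
    }
    groupIds[i] = id;
  }
  return { groupIds, nGroups: seen.size };
}
```

For D = 2 this yields the subsets {0}, {1}, {0, 1} with signs +1, +1, -1, matching V₁ + V₂ - V₁₂.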
Eigenvalue check: The resulting matrix should be positive semi-definite, but the Cameron-Gelbach-Miller combination can produce a non-PSD matrix when there are few clusters. We don’t apply a general correction for this (consistent with fixest default behavior). As a minimal guard, if any diagonal element is negative, clamp it to a small positive value and log a warning. This is expected to be rare with real data.
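A minimal sketch of the diagonal guard, assuming a plain number[][] matrix. The function name clampNegativeDiagonal, the eps default of 1e-12, and the injectable warn callback are all assumptions made for illustration; the design only says "a small positive value" and "log a warning".

```typescript
// Returns a copy of V with negative diagonal entries clamped to eps.
// Off-diagonal entries are left as-is (no eigenvalue correction).
function clampNegativeDiagonal(
  V: number[][],
  eps: number = 1e-12, // assumed value for "small positive"
  warn: (msg: string) => void = (msg) => console.warn(msg),
): number[][] {
  const out = V.map((row) => row.slice());
  for (let j = 0; j < out.length; j++) {
    if (out[j][j] < 0) {
      warn(`Clustered vcov: negative variance ${out[j][j]} at index ${j}; clamped to ${eps}`);
      out[j][j] = eps;
    }
  }
  return out;
}
```

Returning a copy keeps the unclamped matrix available to callers that want to report it.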
No changes to computeRobustVcov or computeRobustF
The existing HC functions remain unchanged. computeRobustF works with
any vcov matrix and will be called with the clustered vcov output.
feols: recognizeFeolsFromSource
Extend the existing vcov=/se=/cluster= argument scanning (lines 338–353)
to detect formula-style cluster specifications.
Current behavior: Regex matches vcov='hetero', se='HC1', etc. as string
literals.
New behavior: Before checking for string HC types, check for formula-style cluster specs:
- vcov = ~state → clusterVars: ['state']
- vcov = ~state + year → clusterVars: ['state', 'year']
- cluster = ~state → clusterVars: ['state']
- cluster = ~state + year → clusterVars: ['state', 'year']
- se = 'cluster' → look for a separate cluster = ~var argument
Detection: Match ^(?:vcov|cluster)\s*=\s*~\s*(.+) on each arg part. If
matched, parse the RHS with parseRHSTerms() and extract main-effect
variable names. This reuses the same formula parsing used for FE
extraction.
When clusterVars is detected, do not set vcovType; they are mutually
exclusive.
Emit: args.clusterVars = ['state'] (or ['state', 'year'] for multi-way).
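The detection step can be sketched as below. extractClusterVars is a hypothetical helper, and splitting the right-hand side on '+' is a simplified stand-in for parseRHSTerms(), whose real signature is not shown in this document.

```typescript
// Regex from the design: formula-style vcov= or cluster= argument.
const CLUSTER_FORMULA_RE = /^(?:vcov|cluster)\s*=\s*~\s*(.+)/;

// Returns cluster variable names, or undefined when no arg part
// carries a formula-style cluster spec.
function extractClusterVars(argParts: string[]): string[] | undefined {
  for (const part of argParts) {
    const m = CLUSTER_FORMULA_RE.exec(part.trim());
    if (m) {
      // Stand-in for parseRHSTerms(): main effects only, split on '+'.
      return m[1]
        .split('+')
        .map((term) => term.trim())
        .filter((term) => term.length > 0);
    }
  }
  return undefined;
}
```

String-literal specs such as vcov = 'hetero' fall through untouched, so the existing HC scanning still sees them.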
felm: recognizeFelmFromSource
Replace // Part 4: cluster — ignored (line 478) with actual extraction:
felm(y ~ x | fe | 0 | state) → clusterVars: ['state']
felm(y ~ x | fe | 0 | state + year) → clusterVars: ['state', 'year']
When pipeParts.length >= 4 and part 4 is not '0':
1. Parse with parseRHSTerms(pipeParts[3].trim())
2. Extract main-effect variable names (same pattern as FE extraction)
3. Set args.clusterVars = varNames
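The three steps above can be sketched as follows. extractFelmClusterVars is a hypothetical name, the formula is split naively on '|' to produce pipeParts, and the '+' split is again a simplified stand-in for parseRHSTerms().

```typescript
// Pulls cluster variables out of part 4 of a felm formula:
//   y ~ x | fe | iv | cluster
function extractFelmClusterVars(formula: string): string[] | undefined {
  const pipeParts = formula.split('|');
  if (pipeParts.length < 4) return undefined;
  const part4 = pipeParts[3].trim();
  // '0' means "no clustering" in felm's formula syntax.
  if (part4 === '0' || part4 === '') return undefined;
  // Stand-in for parseRHSTerms(): main effects only, split on '+'.
  return part4
    .split('+')
    .map((term) => term.trim())
    .filter((term) => term.length > 0);
}
```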
ivreg
ivreg() has no native cluster argument. Clustered SEs for ivreg come
from vcovCL() post-estimation, which is a separate recognizer pattern
(backlog item).
core/pipeline/mapper.ts: createNode linear-model case
Add clusterVars threading, same pattern as vcovType:
...(call.args['clusterVars'] ? { clusterVars: call.args['clusterVars'] as string[] } : {}),
core/pipeline/executor.ts: realLinearModel
Thread clusterVars to both computeRegression and compute2SLS:
// 2SLS path
return compute2SLS(
  lmNode.params.formula, inputDataset,
  lmNode.params.endogenous, lmNode.params.instruments,
  lmNode.params.vcovType, fe,
  lmNode.params.clusterVars, // NEW
);
// OLS path
return computeRegression(
  lmNode.params.formula, inputDataset,
  lmNode.params.vcovType, fe,
  lmNode.params.clusterVars, // NEW
);
core/stats/regression.ts: computeRegression
New optional parameter: clusterVars?: string[].
When clusterVars is present:
1. For each variable in clusterVars, get the column from dataset, filter
   to validRows, call encodeFEColumn(values, validRows) →
   { groupIds, nGroups }. Collect as clusterDims.
2. Call computeClusteredVcov(X, residuals, XtXinv, clusterDims).
3. Compute the F statistic with
   computeRobustF(beta, clusteredVcov, hasIntercept), same as the HC path.
4. Build one clusterInfo entry per dimension:
   { name: varName, nClusters: dim.nGroups }.
5. Return the result with vcovType: 'cluster', clusterInfo: [...].
The cluster path replaces the HC path; they are mutually exclusive. If
both vcovType and clusterVars are somehow present, clusterVars wins.
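Steps 1 and 4 can be sketched as below. The real encodeFEColumn is not shown in this document, so encodeColumn here is a hypothetical stand-in with an assumed signature; the dataset is simplified to a Record of column arrays, and buildClusterDims is likewise a hypothetical name.

```typescript
interface ClusterDim {
  groupIds: number[];
  nGroups: number;
}
interface ClusterInfo {
  name: string;
  nClusters: number;
}

// Stand-in for encodeFEColumn: map the values on validRows to
// sequential group IDs in order of first appearance.
function encodeColumn(values: unknown[], validRows: number[]): ClusterDim {
  const ids = new Map<string, number>();
  const groupIds = validRows.map((r) => {
    const key = String(values[r]);
    let id = ids.get(key);
    if (id === undefined) {
      id = ids.size;
      ids.set(key, id);
    }
    return id;
  });
  return { groupIds, nGroups: ids.size };
}

// Build clusterDims (for the sandwich) and clusterInfo (for the result)
// in one pass, so the two always describe the same dimensions.
function buildClusterDims(
  columns: Record<string, unknown[]>, // simplified dataset shape (assumption)
  clusterVars: string[],
  validRows: number[],
): { clusterDims: ClusterDim[]; clusterInfo: ClusterInfo[] } {
  const clusterDims: ClusterDim[] = [];
  const clusterInfo: ClusterInfo[] = [];
  for (const varName of clusterVars) {
    const values = columns[varName];
    if (values === undefined) {
      throw new Error(`Unknown cluster variable: ${varName}`);
    }
    const dim = encodeColumn(values, validRows);
    clusterDims.push(dim);
    clusterInfo.push({ name: varName, nClusters: dim.nGroups });
  }
  return { clusterDims, clusterInfo };
}
```

Encoding only the rows in validRows keeps groupIds aligned with the rows of X and the residual vector.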
core/stats/regression-2sls.ts: compute2SLS
Same pattern. New optional parameter clusterVars?: string[].
When present, use XProj (the projected design matrix from the 2SLS
correction) instead of raw X in the clustered sandwich. This is the same
substitution already done for HC robust SEs on line 340:
// Current (HC):
robustVcov = computeRobustVcov(XProj, residuals, XProjTXProjInv, vcovType);
// Clustered:
clusteredVcov = computeClusteredVcov(XProj, residuals, XProjTXProjInv, clusterDims);
Cluster encoding uses the same validRows filtering as the OLS path.
core/pipeline/param-schema.ts
Add clusterVars to PARAM_SCHEMAS['linear-model']:
{
  key: 'clusterVars',
  label: 'Cluster',
  kind: 'identifier',
  multivaluable: true,
},
This displays cluster variable names in the property sheet. Read-only for now (editable params are M6).
No changes to the vcovType select options; HCType values remain as-is.
The “Clustered” display in the UI comes from detecting clusterVars
presence on the node params, not from a vcovType select value. When
clusterVars is set, the property sheet shows
Std. Errors: Clustered (state) by reading both fields.
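A sketch of how the label could be derived from both fields. stdErrorLabel is a hypothetical helper; only the "Std. Errors: Clustered (state)" form comes from this document, while the ", " separator for multi-way and the HC and classical label texts are assumptions.

```typescript
// Derives the property-sheet label from node params.
// clusterVars takes precedence over vcovType.
function stdErrorLabel(params: {
  vcovType?: string;
  clusterVars?: string[];
}): string {
  if (params.clusterVars && params.clusterVars.length > 0) {
    // Assumption: multi-way clusters are joined with ", ".
    return `Std. Errors: Clustered (${params.clusterVars.join(', ')})`;
  }
  if (params.vcovType) {
    return `Std. Errors: ${params.vcovType}`; // assumed wording for HC types
  }
  return 'Std. Errors: Classical'; // assumed wording for the default
}
```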
The results panel already shows vcovType. When vcovType === 'cluster':
- Show the cluster variable names from result.clusterInfo[].name.
The SE type indicator in comparison tables should display the cluster variable names. When comparing models with different SE types (e.g., classical vs clustered), this creates a natural axis for the spec explorer.
No changes needed. The spec curve already handles
vcovType as a parameter axis. 'cluster'
becomes a new value on that axis.
sandwich.test.ts
One-way cluster test: Small dataset (~20 rows, ~4 clusters). Compute clustered SEs and validate against R:
library(fixest)
d <- data.frame(y = ..., x = ..., cl = ...)
m <- feols(y ~ x, data = d, vcov = ~cl)
summary(m)$se # expected SEs
Two-way cluster test: Same dataset with a second cluster variable. Validate against:
m2 <- feols(y ~ x, data = d, vcov = ~cl1 + cl2)
summary(m2)$se
Tolerance: < 0.00005 for SEs (same as existing stats tests).
recognizer.test.ts
- feols(y ~ x | fe, vcov = ~state) → clusterVars: ['state']
- feols(y ~ x | fe, vcov = ~state + year) → clusterVars: ['state', 'year']
- feols(y ~ x | fe, cluster = ~state) → clusterVars: ['state']
- felm(y ~ x | fe | 0 | state) → clusterVars: ['state']
- felm(y ~ x | fe | 0 | state + year) → clusterVars: ['state', 'year']
- feols(y ~ x, vcov = 'hetero') → vcovType: 'HC1', no clusterVars (unchanged)
regression.test.ts
- Clustered SEs validated against R (feols(y ~ x | fe, vcov = ~state))
regression-2sls.test.ts
- Clustered SEs validated against R (feols(y ~ 1 | fe | x ~ z, vcov = ~state))
pipeline/integration.test.ts
End-to-end: R code string → parse → recognize → map → execute → verify clustered SEs match R.
feols(y ~ x | state, vcov = ~state, data = d)
The small-sample correction for clustered SEs follows the fixest convention:
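c = G/(G-1) · (n-1)/(n-k)
Where:
- G = number of clusters in the dimension
- n = number of observations
- k = number of estimated parameters (columns of X, after FE absorption
  if applicable)
For multi-way, each one-way computation in the inclusion-exclusion
uses the correction for its own cluster count. The intersection
cluster’s G is the number of unique combinations.
For one-way clustering this matches fixest::feols with adj = TRUE and
fixef.K = "none" in dof() (named ssc() in recent fixest releases). Two
caveats apply when generating R reference values: fixest's default is
fixef.K = "nested", and for multi-way clustering its default
cluster.df = "min" applies the smallest G to every term, so the
per-dimension G used here corresponds to cluster.df = "conventional".
Stata's regress with vce(cluster) applies the same two factors,
G/(G-1) · (n-1)/(n-k).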
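A worked instance of the correction factor, as a small sketch. clusterCorrection is a hypothetical helper name, and the guard conditions (G ≥ 2, n > k) are assumptions about how degenerate inputs should be rejected.

```typescript
// c = G/(G-1) * (n-1)/(n-k)
function clusterCorrection(G: number, n: number, k: number): number {
  if (G < 2) throw new Error('Need at least 2 clusters');
  if (n <= k) throw new Error('Need more observations than parameters');
  return (G / (G - 1)) * ((n - 1) / (n - k));
}
```

With the test-sized dataset (G = 4, n = 20, k = 2), c = (4/3) · (19/18) ≈ 1.4074, so the few-cluster adjustment inflates variances by about 41%. With G = 50, n = 1000, k = 5 it is about 1.0245.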
- coeftest(mod, vcov = vcovCL(mod, ~state)) post-estimation pattern:
  different architecture (node modification), backlog item
- ivreg() cluster support: needs vcovCL wrapper pattern
- conley() spatial clustering: exotic, no current demand