Motivation
feols() with | fe1 + fe2 is already parsed, mapped, and displayed in the DAG — but execution falls back to pooled OLS with a warning. This is the #1 execution gap: coefficients are wrong, SEs are wrong, inference is wrong. Fixing this turns existing parser coverage into real, trustworthy value.
Approach
Algorithm: Alternating Projections
Cycle through FE dimensions, subtracting group means for each dimension, repeating until convergence. For 1-way FE this is a single exact pass. For 2-way+, iterate until the maximum absolute change across all columns is below tolerance (typically 5–20 iterations for applied econ panels).
Where Demeaning Happens
[Diagram: pipeline stages labeled Matrix, FE Groups, Intercept, Demeaned, + FE Info]
buildDesignMatrix returns X, y as today. A new demean() function transforms these arrays. Then OLS runs on the demeaned data. This means demeaning correctly handles categorical expansions, interactions, and missing-value filtering — all of which happen during design matrix construction.
Upgrade Path
The demean() interface is swappable. Acceleration tricks (Bergé's method from fixest, Gaure's method from lfe) can replace the internals later without changing any callers. For target paper sizes (<50K rows, <100 FE levels), vanilla alternating projections is fast enough.
1. New Module: demean.ts
Pure demeaning function with no dependencies on Dataset, Formula, or pipeline types.
```typescript
export interface FEDimension {
  name: string;
  groupIds: number[]; // integer group ID per observation, length n
  nGroups: number;
}

export interface DemeanResult {
  columns: number[][]; // demeaned columns (same length as input)
  feInfo: { name: string; nGroups: number }[];
  absorbedDf: number; // exact, via union-find
  iterations: number; // actual iterations used (1 for 1-way)
}

export function demean(
  columns: number[][],
  feDimensions: FEDimension[],
  tolerance?: number, // default 1e-8
  maxIterations?: number, // default 1000
): DemeanResult;
```
Algorithm Detail
For each iteration:
1. For each FE dimension d: for each column c, compute group means by d's group IDs and subtract.
2. Compute max absolute change across all columns since last iteration.
3. If change < tolerance, stop.
For 1-way FE, step 1 runs once — exact in one pass. Group mean computation: accumulate sum and count in a single O(n) pass per column per dimension.
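The loop described above can be sketched as follows. This is a minimal illustration, not the final module: the name demeanSketch and the helper sweep are hypothetical, and the return value carries only the fields needed to show the iteration (no feInfo or absorbedDf).

```typescript
// Mirrors the FEDimension interface from the spec.
interface FEDimension {
  name: string;
  groupIds: number[];
  nGroups: number;
}

// Subtract the group mean of one FE dimension from a column, in place.
// Sums and counts are accumulated in a single O(n) pass.
function sweep(col: number[], dim: FEDimension): void {
  const sums = new Float64Array(dim.nGroups);
  const counts = new Float64Array(dim.nGroups);
  for (let i = 0; i < col.length; i++) {
    sums[dim.groupIds[i]] += col[i];
    counts[dim.groupIds[i]] += 1;
  }
  for (let i = 0; i < col.length; i++) {
    const g = dim.groupIds[i];
    col[i] -= sums[g] / counts[g];
  }
}

// Hypothetical name; illustrates the alternating-projections loop only.
function demeanSketch(
  columns: number[][],
  feDimensions: FEDimension[],
  tolerance = 1e-8,
  maxIterations = 1000,
): { columns: number[][]; iterations: number } {
  const out = columns.map((c) => c.slice()); // never mutate the caller's data
  if (feDimensions.length === 0) return { columns: out, iterations: 0 };
  let iterations = 0;
  while (iterations < maxIterations) {
    iterations++;
    const prev = out.map((c) => c.slice());
    for (const dim of feDimensions) {
      for (const col of out) sweep(col, dim);
    }
    if (feDimensions.length === 1) break; // one pass is exact for 1-way FE
    // Max absolute change across all columns since the last iteration.
    let maxChange = 0;
    for (let j = 0; j < out.length; j++) {
      for (let i = 0; i < out[j].length; i++) {
        maxChange = Math.max(maxChange, Math.abs(out[j][i] - prev[j][i]));
      }
    }
    if (maxChange < tolerance) break;
  }
  return { columns: out, iterations };
}
```

On a balanced 2 × 2 panel, a purely additive column is removed entirely by the first pass; the second pass only confirms that nothing changed, so the sketch reports two iterations.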
2. Union-Find for Exact Absorbed df
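Also in demean.ts. Computes the rank of the combined FE dummy matrix from connected components. The count is exact for one or two FE dimensions; with three or more, component counting is not exact in general and can overstate the rank.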
```typescript
export function computeAbsorbedDf(
  feDimensions: FEDimension[],
  n: number,
): { absorbedDf: number; nComponents: number };
```
Algorithm
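- Create a node for each unique (dimension, group) pair. Tag each with its FE dimension index.
- For each observation, union together all its FE nodes across dimensions.
- For each connected component, count the distinct FE dimensions it spans (dc).
- Absorbed df = total number of FE groups − the sum of (dc − 1) over all components. Within a component, the dummies of each dimension sum to the same component indicator, so every dimension beyond the first contributes one redundant column.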
Examples
absorbed = 5 − 0 = 5
absorbed = 20 − 1 = 19
absorbed = 12 − 2 = 10
absorbed = 20 − 2 = 18
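A minimal union-find sketch of this count, assuming the rule the examples follow (total observed groups minus the sum of dc − 1 over components). The name absorbedDfSketch and its internals are hypothetical; only the FEDimension shape comes from the spec. Groups that never appear in the data are ignored, which is an assumption of this sketch.

```typescript
interface FEDimension {
  name: string;
  groupIds: number[];
  nGroups: number;
}

function absorbedDfSketch(
  feDimensions: FEDimension[],
  n: number,
): { absorbedDf: number; nComponents: number } {
  // Node index = offset of the dimension + group ID within that dimension.
  const offsets: number[] = [];
  let totalNodes = 0;
  for (const dim of feDimensions) {
    offsets.push(totalNodes);
    totalNodes += dim.nGroups;
  }
  const parentOf = Array.from({ length: totalNodes }, (_, i) => i);
  const findRoot = (x: number): number => {
    while (parentOf[x] !== x) {
      parentOf[x] = parentOf[parentOf[x]]; // path halving
      x = parentOf[x];
    }
    return x;
  };

  // Union all FE nodes that share an observation.
  const observed = new Array<boolean>(totalNodes).fill(false);
  for (let i = 0; i < n; i++) {
    let anchor = -1;
    for (let d = 0; d < feDimensions.length; d++) {
      const node = offsets[d] + feDimensions[d].groupIds[i];
      observed[node] = true;
      const root = findRoot(node);
      if (anchor < 0) anchor = root;
      else if (root !== anchor) parentOf[root] = anchor;
    }
  }

  // For each component root, record which dimensions it spans.
  const dimsByRoot = new Map<number, Set<number>>();
  let observedGroups = 0;
  for (let d = 0; d < feDimensions.length; d++) {
    for (let g = 0; g < feDimensions[d].nGroups; g++) {
      const node = offsets[d] + g;
      if (!observed[node]) continue;
      observedGroups++;
      const root = findRoot(node);
      const dims = dimsByRoot.get(root) ?? new Set<number>();
      dims.add(d);
      dimsByRoot.set(root, dims);
    }
  }

  // Each component loses (dc - 1) degrees of freedom.
  let redundant = 0;
  dimsByRoot.forEach((dims) => {
    redundant += dims.size - 1;
  });
  return { absorbedDf: observedGroups - redundant, nComponents: dimsByRoot.size };
}
```

Under this rule a single FE dimension with 5 groups absorbs 5 df, and a connected two-way panel with 10 + 10 groups absorbs 20 − 1 = 19, in line with the first two examples.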
3. Modify: regression.ts
computeRegression() gains an optional fixedEffects parameter:
```typescript
export function computeRegression(
  formula: Formula,
  dataset: Dataset,
  vcovType?: HCType,
  fixedEffects?: string[], // NEW: FE column names
): RegressionResult;
```
When fixedEffects is provided and non-empty:
- buildDesignMatrix(formula, dataset) → X, y, columnNames, validRows (unchanged)
- Extract FE columns from dataset, filter to validRows, encode as integer group IDs
- Build FEDimension[] from the extracted columns
- Call demean([...X_columns, y], feDimensions) → demeaned columns
- Reconstruct X from demeaned columns, dropping the intercept column
- OLS on demeaned data with corrected df: dfResidual = n − k_slopes − absorbedDf
- R² from demeaned SST/SSR (within R²); F-statistic with corrected df
- If vcovType is set, robust SEs use demeaned X and corrected dfResidual
- Attach fixedEffects: FEInfo[] to result
hasIntercept is effectively false for the OLS step, even if the formula says hasIntercept: true. Demeaning zeroes out the intercept column, and we drop it.
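Two of the steps above are small enough to sketch directly: encoding an FE column as integer group IDs, and the corrected residual df. Both helper names (encodeGroupIds, residualDf) are hypothetical, and the sketch assumes the FE values have already been filtered to validRows. Levels are keyed by their string form, so the number 1 and the string "1" would share a group; that choice is an assumption, not something the spec states.

```typescript
interface FEDimension {
  name: string;
  groupIds: number[];
  nGroups: number;
}

// Map each distinct level to a dense integer ID in order of first appearance.
function encodeGroupIds(
  feName: string,
  values: Array<string | number>,
): FEDimension {
  const idByLevel = new Map<string, number>();
  const groupIds = values.map((v) => {
    const key = String(v);
    let id = idByLevel.get(key);
    if (id === undefined) {
      id = idByLevel.size;
      idByLevel.set(key, id);
    }
    return id;
  });
  return { name: feName, groupIds, nGroups: idByLevel.size };
}

// dfResidual = n - k_slopes - absorbedDf; the intercept is not counted
// because demeaning absorbs it.
function residualDf(n: number, kSlopes: number, absorbedDf: number): number {
  const df = n - kSlopes - absorbedDf;
  if (df <= 0) {
    throw new Error("No residual degrees of freedom left after absorbing fixed effects");
  }
  return df;
}
```

For the 1-way validation data in the Testing section (30 rows, two slopes, five states), residualDf(30, 2, 5) gives 23.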
4. Modify: regression-2sls.ts
compute2SLS() gains the same optional fixedEffects parameter. When present:
- Build X (exogenous + endogenous) and Z (exogenous + instruments) as today
- Extract FE group IDs, demean X, Z, and y
- Drop intercept columns from both X and Z
- Run stages 1 and 2 on demeaned data with adjusted dfResidual
- First-stage F, Wu-Hausman, and Sargan use demeaned matrices and corrected df
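One way to arrange the demeaning step for 2SLS is to push X, Z, and y through a single demean call so they share the same group structure, then split the result and drop the intercept by name. The sketch below assumes columns are stored column-major with a parallel names array and that the intercept is named "(Intercept)"; the types NamedColumns and DemeanFn and the function demeanForIV are hypothetical. Taking the demeaner as an argument also shows how the internals could be swapped without touching callers.

```typescript
// Hypothetical: any function that maps columns to demeaned columns.
type DemeanFn = (columns: number[][]) => number[][];

interface NamedColumns {
  names: string[];
  columns: number[][];
}

// Remove the intercept column, which demeaning has reduced to zeros.
function dropIntercept(block: NamedColumns): NamedColumns {
  const keep = block.names.map((colName) => colName !== "(Intercept)");
  return {
    names: block.names.filter((_, j) => keep[j]),
    columns: block.columns.filter((_, j) => keep[j]),
  };
}

function demeanForIV(
  x: NamedColumns, // exogenous + endogenous regressors
  z: NamedColumns, // exogenous regressors + instruments
  y: number[],
  demeanFn: DemeanFn,
): { x: NamedColumns; z: NamedColumns; y: number[] } {
  // One call, so every column is demeaned against the same FE groups.
  const all = demeanFn([...x.columns, ...z.columns, y]);
  const kx = x.columns.length;
  const kz = z.columns.length;
  return {
    x: dropIntercept({ names: x.names, columns: all.slice(0, kx) }),
    z: dropIntercept({ names: z.names, columns: all.slice(kx, kx + kz) }),
    y: all[kx + kz],
  };
}
```

Exogenous columns appear in both X and Z, so they are demeaned twice here; that costs a little time but keeps the split trivial.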
5. Modify: types.ts
```typescript
export interface FEInfo {
  name: string;
  nGroups: number;
}

export interface RegressionResult {
  // ... all existing fields unchanged ...
  fixedEffects?: FEInfo[]; // NEW
}
```
Purely additive — no existing fields change meaning. FEInfo is the canonical definition, imported by demean.ts.
6. Modify: executor.ts
Pass lmNode.params.fixedEffects through to computeRegression and compute2SLS. One extra argument each.
7. Modify: param-schema.ts
Remove the disabled flag from the fixedEffects ParamDef.
8. UI Changes
results-panel.tsx: Remove the yellow “FE not yet computed” warning. Show FE group counts:
year FE — 10 groups
spec-comparison-view.tsx: Already shows FE checkmarks — no changes needed.
What Doesn't Change
| Component | Status |
|---|---|
| Parser / Recognizer | FE already extracted from feols(), felm(), fepois(), feglm() |
| Mapper | FE already threaded from AnalysisCall.args → LinearModelNode.params |
| Pipeline types | LinearModelParams.fixedEffects already defined |
| Group detector | FE already in partition keys |
| Comparison table | Consumes coefficients unchanged; FE checkmarks already rendered |
| Spec curve | No changes |
| Export (LaTeX/CSV) | No changes |
Edge Cases
Testing
R Validation Vectors
```r
library(fixest)

# Test 1: 1-way FE
set.seed(42)
d <- data.frame(
  y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30),
  state = rep(c("A","B","C","D","E"), each=6)
)
m1 <- feols(y ~ x1 + x2 | state, data=d)

# Test 2: 2-way FE, balanced
set.seed(123)
d2 <- data.frame(
  y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
  state = rep(letters[1:10], 10),
  year = rep(2010:2019, each=10)
)
m2 <- feols(y ~ x1 + x2 | state + year, data=d2)

# Test 3: unbalanced (connected components)
d3 <- d2[sample(nrow(d2), 70), ]
m3 <- feols(y ~ x1 + x2 | state + year, data=d3)

# Test 4: FE + robust SEs
m4 <- feols(y ~ x1 + x2 | state, data=d, vcov='hetero')

# Test 5: FE + 2SLS
set.seed(99)
z <- rnorm(200)
x_endog <- 0.5*z + rnorm(200, sd=0.5)
fe_group <- rep(1:20, each=10)
fe_effect <- rep(rnorm(20, sd=2), each=10)
y5 <- 1 + 2*x_endog + fe_effect + rnorm(200)
d5 <- data.frame(y=y5, x=x_endog, z=z,
                 g=as.factor(fe_group))
m5 <- feols(y ~ 1 | g | x ~ z, data=d5)
```
Test Cases
1. 1-way FE coefficients: match feols within tolerance (<0.00005)
2. 1-way FE SEs and df: dfResidual = n − k_slopes − nGroups
3. 2-way FE balanced: coefficients, within R², df match feols
4. 2-way FE unbalanced: connected components gives correct df
5. FE + robust SEs: HC1 with FE-corrected df matches feols(vcov='hetero')
6. FE + 2SLS: demeaned IV matches feols with IV + FE
7. Demean convergence: 2-way converges within tolerance; iterations < 100
8. Intercept absorbed: no (Intercept) in coefficient names
9. Single FE level: handled gracefully (skip or warn)
10. Union-find unit tests: balanced, disconnected, 3-way → correct df
11. Integration test: feols(y ~ x | state + year) end-to-end parse → execute
12. Collinear regressor: variable constant within FE groups → clear error
Deferred
- Acceleration tricks (Bergé/Gaure methods) — same interface, swap internals later
- etable() FE row display
- FE coefficient recovery (estimating the fixed effects themselves)
- Singleton detection and dropping
File Summary
| Action | File | Responsibility |
|---|---|---|
| NEW | stats/demean.ts | demean() + computeAbsorbedDf() |
| MOD | stats/regression.ts | Accept fixedEffects, orchestrate demean → OLS |
| MOD | stats/regression-2sls.ts | Accept fixedEffects, orchestrate demean → 2SLS |
| MOD | stats/types.ts | Add FEInfo, fixedEffects on result |
| MOD | pipeline/executor.ts | Pass fixedEffects through |
| MOD | pipeline/param-schema.ts | Remove disabled flag |
| MOD | ui/.../results-panel.tsx | Remove warning, show FE info |