Design Specification

Fixed Effects Computation
Within-Transformation

1-way and N-way FE via Frisch-Waugh-Lovell demeaning with alternating projections. Exact degrees of freedom for any number of FE dimensions via union-find with per-component dimension counting.

Motivation

feols() with | fe1 + fe2 is already parsed, mapped, and displayed in the DAG — but execution falls back to pooled OLS with a warning. This is the #1 execution gap: coefficients are wrong, SEs are wrong, inference is wrong. Fixing this turns existing parser coverage into real, trustworthy value.

Approach

Frisch-Waugh-Lovell Theorem: Demeaning y and X by FE group means, then running OLS on the demeaned data, gives slope coefficients identical to those of the LSDV (least-squares dummy variable) approach. The intercept is absorbed.
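A small numeric illustration of why the pooled fallback is misleading. This is a sketch on made-up toy data (y = 2x plus a group effect, no noise); the helper names demeanByGroup and slopeWithIntercept are hypothetical and not part of the planned API.

```typescript
// One-way within transformation: subtract each observation's group mean.
function demeanByGroup(values: number[], groupIds: number[], nGroups: number): number[] {
  const sums = new Array<number>(nGroups).fill(0);
  const counts = new Array<number>(nGroups).fill(0);
  for (let i = 0; i < values.length; i++) {
    sums[groupIds[i]] += values[i];
    counts[groupIds[i]] += 1;
  }
  return values.map((v, i) => v - sums[groupIds[i]] / counts[groupIds[i]]);
}

// OLS slope of y on a single regressor x, with an intercept.
function slopeWithIntercept(x: number[], y: number[]): number {
  const mean = (a: number[]) => a.reduce((s, v) => s + v, 0) / a.length;
  const mx = mean(x);
  const my = mean(y);
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < x.length; i++) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) * (x[i] - mx);
  }
  return sxy / sxx;
}

// Toy data (assumed for illustration): y = 2 * x + groupEffect, no noise.
const toyGroups = [0, 0, 0, 1, 1, 1];
const toyX = [1, 2, 3, 4, 5, 6];
const toyEffects = [10, -5];
const toyY = toyX.map((x, i) => 2 * x + toyEffects[toyGroups[i]]);

// Pooled OLS ignores the group effects; the within estimator removes them.
// Demeaned columns have mean zero, so the intercept in the second fit is zero.
const pooledSlope = slopeWithIntercept(toyX, toyY);
const withinSlope = slopeWithIntercept(
  demeanByGroup(toyX, toyGroups, 2),
  demeanByGroup(toyY, toyGroups, 2),
);
console.log(pooledSlope.toFixed(3), withinSlope.toFixed(3)); // -1.857 2.000
```

On this data the pooled slope even has the wrong sign, while the demeaned regression recovers the true slope of 2.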

Algorithm: Alternating Projections

Cycle through FE dimensions, subtracting group means for each dimension, repeating until convergence. For 1-way FE this is a single exact pass. For 2-way+, iterate until the maximum absolute change across all columns is below tolerance (typically 5–20 iterations for applied econ panels).

Where Demeaning Happens

Pipeline: Build Design Matrix (X, y, validRows) → Demean by FE Groups (alternating proj.) → Drop Intercept (absorbed by FE) → OLS on Demeaned (corrected df) → Within R² + FE Info (RegressionResult)

buildDesignMatrix returns X, y as today. A new demean() function transforms these arrays. Then OLS runs on the demeaned data. This means demeaning correctly handles categorical expansions, interactions, and missing-value filtering — all of which happen during design matrix construction.

Upgrade Path

The demean() interface is swappable. Acceleration tricks (Bergé's method from fixest, Gaure's method from lfe) can replace the internals later without changing any callers. For target paper sizes (<50K rows, <100 FE levels), vanilla alternating projections is fast enough.

1. New Module: demean.ts

Pure demeaning function with no dependencies on Dataset, Formula, or pipeline types.

TypeScript:
export interface FEDimension {
  name: string;
  groupIds: number[];  // integer group ID per observation, length n
  nGroups: number;
}

export interface DemeanResult {
  columns: number[][];   // demeaned columns (same length as input)
  feInfo: { name: string; nGroups: number }[];
  absorbedDf: number;    // exact, via union-find
  iterations: number;    // actual iterations used (1 for 1-way)
}

export function demean(
  columns: number[][],
  feDimensions: FEDimension[],
  tolerance?: number,         // default 1e-8
  maxIterations?: number,     // default 1000
): DemeanResult;

Algorithm Detail

For each iteration:

  1. For each FE dimension d: for each column c, compute group means by d's group IDs and subtract.
  2. Compute max absolute change across all columns since last iteration.
  3. If change < tolerance, stop.

For 1-way FE, step 1 runs once — exact in one pass. Group mean computation: accumulate sum and count in a single O(n) pass per column per dimension.
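The loop above could be sketched as follows. This is a minimal sketch under assumptions, not the final demean.ts: the names demeanSketch and sweepDimension are hypothetical, the result omits feInfo and absorbedDf, and convergence is measured as the largest element-wise change over one full cycle through all dimensions.

```typescript
interface FEDimension {
  name: string;
  groupIds: number[]; // integer group ID per observation, length n
  nGroups: number;
}

// Subtract one dimension's group means from a column, in place.
// Sums and counts are accumulated in a single O(n) pass.
function sweepDimension(col: number[], dim: FEDimension): void {
  const sums = new Array<number>(dim.nGroups).fill(0);
  const counts = new Array<number>(dim.nGroups).fill(0);
  for (let i = 0; i < col.length; i++) {
    sums[dim.groupIds[i]] += col[i];
    counts[dim.groupIds[i]] += 1;
  }
  for (let i = 0; i < col.length; i++) {
    col[i] -= sums[dim.groupIds[i]] / counts[dim.groupIds[i]];
  }
}

function demeanSketch(
  columns: number[][],
  feDimensions: FEDimension[],
  tolerance: number = 1e-8,
  maxIterations: number = 1000,
): { columns: number[][]; iterations: number } {
  const out = columns.map((c) => c.slice()); // never mutate the caller's arrays
  if (feDimensions.length === 0) return { columns: out, iterations: 0 };

  let iterations = 0;
  while (iterations < maxIterations) {
    iterations += 1;
    let maxChange = 0;
    for (const col of out) {
      const before = col.slice();
      for (const dim of feDimensions) sweepDimension(col, dim);
      for (let i = 0; i < col.length; i++) {
        maxChange = Math.max(maxChange, Math.abs(col[i] - before[i]));
      }
    }
    // 1-way FE is exact after one pass; otherwise stop once a full cycle changes nothing.
    if (feDimensions.length === 1 || maxChange < tolerance) break;
  }
  return { columns: out, iterations };
}

// Balanced 2x2 panel: the first cycle is already exact, the second confirms it.
const panel = demeanSketch(
  [[1, 2, 3, 5]],
  [
    { name: "unit", groupIds: [0, 0, 1, 1], nGroups: 2 },
    { name: "period", groupIds: [0, 1, 0, 1], nGroups: 2 },
  ],
);
console.log(panel.columns[0], panel.iterations); // [0.25, -0.25, -0.25, 0.25] after 2 iterations
```

Copying each column once per cycle keeps the convergence check simple at the cost of one extra O(n) allocation per column; a production version might track the change inside the sweep instead.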

2. Union-Find for Exact Absorbed df

Also in demean.ts. Computes the exact rank of the combined FE dummy matrix for any number of FE dimensions.

TypeScript:
export function computeAbsorbedDf(
  feDimensions: FEDimension[],
  n: number,
): { absorbedDf: number; nComponents: number };

Algorithm

  1. Create a node for each unique (dimension, group) pair. Tag each with its FE dimension index.
  2. For each observation, union together all its FE nodes across dimensions.
  3. For each connected component c, count the distinct FE dimensions it spans (d_c).

$$\text{absorbedDf} = \sum_{i=1}^{D} \text{nGroups}_i \;-\; \sum_{c} (d_c - 1)$$

Why Exact: The null space of the combined FE dummy matrix [D_1 | D_2 | … | D_D] has a clean structure: within each connected component c spanning d_c FE dimensions, there are exactly d_c − 1 independent null vectors (corresponding to “constant shift” degrees of freedom that cancel across dimensions). This generalizes the Abowd-Creecy-Kramarz (2002) result for 2-way FE to arbitrary D.

Examples

| Example | Nodes | Components | Absorbed df |
| --- | --- | --- | --- |
| 1-way · 5 states | A, B, C, D, E | 5 components, each d=1 | 5 − 0 = 5 |
| 2-way balanced · 10×10 | s1 … s10, y1 … y10 | 1 component, d=2 | 20 − 1 = 19 |
| 3-way balanced · 5×4×3 | s1 … s5, y1 … y4, i1 … i3 | 1 component, d=3 | 12 − 2 = 10 |
| 2-way disconnected · C=2 | s1–5 with y1–5, s6–10 with y6–10 | 2 components, each d=2 | 20 − 2 = 18 |
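The counting rule can be sketched with a small array-based union-find. This is a minimal sketch, not the final demean.ts code: the name computeAbsorbedDfSketch and the flat node numbering via per-dimension offsets are assumptions, and it assumes every group ID in 0..nGroups−1 appears in at least one observation.

```typescript
interface FEDimension {
  name: string;
  groupIds: number[]; // integer group ID per observation, length n
  nGroups: number;
}

function computeAbsorbedDfSketch(
  feDimensions: FEDimension[],
  n: number,
): { absorbedDf: number; nComponents: number } {
  const D = feDimensions.length;

  // Step 1: one node per (dimension, group) pair; offsets[d] is dimension d's first node.
  const offsets: number[] = [];
  let totalNodes = 0;
  for (const dim of feDimensions) {
    offsets.push(totalNodes);
    totalNodes += dim.nGroups;
  }
  const parent: number[] = [];
  for (let v = 0; v < totalNodes; v++) parent.push(v);

  const findRoot = (v: number): number => {
    while (parent[v] !== v) {
      parent[v] = parent[parent[v]]; // path halving
      v = parent[v];
    }
    return v;
  };

  // Step 2: union all FE nodes that appear together in an observation.
  for (let i = 0; i < n; i++) {
    for (let d = 1; d < D; d++) {
      const a = findRoot(offsets[0] + feDimensions[0].groupIds[i]);
      const b = findRoot(offsets[d] + feDimensions[d].groupIds[i]);
      if (a !== b) parent[a] = b;
    }
  }

  // Step 3: count the distinct dimensions spanned by each component (d_c).
  const seen = new Array<boolean>(totalNodes * D).fill(false);
  const dimCount = new Array<number>(totalNodes).fill(0);
  for (let d = 0; d < D; d++) {
    for (let g = 0; g < feDimensions[d].nGroups; g++) {
      const root = findRoot(offsets[d] + g);
      if (!seen[root * D + d]) {
        seen[root * D + d] = true;
        dimCount[root] += 1;
      }
    }
  }

  // absorbedDf = (total number of FE levels) - sum over components of (d_c - 1).
  let nComponents = 0;
  let redundant = 0;
  for (let v = 0; v < totalNodes; v++) {
    if (dimCount[v] > 0) {
      nComponents += 1;
      redundant += dimCount[v] - 1;
    }
  }
  return { absorbedDf: totalNodes - redundant, nComponents };
}

// 2-way balanced 10x10 panel from the examples: 20 nodes, 1 component, d = 2.
const stateIds: number[] = [];
const yearIds: number[] = [];
for (let i = 0; i < 100; i++) {
  stateIds.push(i % 10);
  yearIds.push(Math.floor(i / 10));
}
const balanced = computeAbsorbedDfSketch(
  [
    { name: "state", groupIds: stateIds, nGroups: 10 },
    { name: "year", groupIds: yearIds, nGroups: 10 },
  ],
  100,
);
console.log(balanced); // { absorbedDf: 19, nComponents: 1 }
```

For D ≥ 3, comparing this count against a direct rank computation of the dummy matrix on a few small, deliberately nested designs would be a cheap cross-check before relying on it.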

3. Modify: regression.ts

computeRegression() gains an optional fixedEffects parameter:

TypeScript:
export function computeRegression(
  formula: Formula,
  dataset: Dataset,
  vcovType?: HCType,
  fixedEffects?: string[],   // NEW — FE column names
): RegressionResult;

When fixedEffects is provided and non-empty:

  1. buildDesignMatrix(formula, dataset) → X, y, columnNames, validRows (unchanged)
  2. Extract FE columns from dataset, filter to validRows, encode as integer group IDs
  3. Build FEDimension[] from the extracted columns
  4. Call demean([...X_columns, y], feDimensions) → demeaned columns
  5. Reconstruct X from demeaned columns, dropping the intercept column
  6. OLS on demeaned data with corrected df: dfResidual = n − k_slopes − absorbedDf
  7. R² from demeaned SST/SSR (within R²); F-statistic with corrected df
  8. If vcovType set, robust SEs use demeaned X and corrected dfResidual
  9. Attach fixedEffects: FEInfo[] to result
Intercept absorbed: When FE are present, the intercept is not estimated. hasIntercept is effectively false for the OLS step, even if the formula says hasIntercept: true. Demeaning zeroes out the intercept column, and we drop it.
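Steps 2, 3, and 6 could look roughly like the sketch below. The helper names encodeGroupIds and residualDf are hypothetical, and the sketch assumes the FE column is available as a plain array of values and that validRows is an array of kept row indices; the real Dataset and validRows shapes may differ.

```typescript
interface FEDimension {
  name: string;
  groupIds: number[]; // integer group ID per observation, length n
  nGroups: number;
}

// Filter an FE column to the valid rows and map each distinct value to 0, 1, 2, ...
// in order of first appearance. Map keys compare by value, so strings and numbers
// both work, and 1 and "1" stay distinct groups.
function encodeGroupIds(
  feName: string,
  rawColumn: Array<string | number>,
  validRows: number[],
): FEDimension {
  const idByValue = new Map<string | number, number>();
  const groupIds: number[] = [];
  for (const row of validRows) {
    const value = rawColumn[row];
    let id = idByValue.get(value);
    if (id === undefined) {
      id = idByValue.size;
      idByValue.set(value, id);
    }
    groupIds.push(id);
  }
  return { name: feName, groupIds, nGroups: idByValue.size };
}

// Step 6: residual degrees of freedom after absorbing the fixed effects.
function residualDf(n: number, kSlopes: number, absorbedDf: number): number {
  return n - kSlopes - absorbedDf;
}

// Row 3 ("C") was dropped by design-matrix construction, so only two groups remain.
const stateDim = encodeGroupIds("state", ["A", "B", "A", "C", "B"], [0, 1, 2, 4]);
console.log(stateDim.groupIds, stateDim.nGroups); // [0, 1, 0, 1] 2

// Test 1 in the R validation vectors: 30 rows, 2 slopes, 5 state effects.
console.log(residualDf(30, 2, 5)); // 23
```

Encoding after the validRows filter matters: a level that only occurs in dropped rows must not count toward nGroups, or absorbedDf would be overstated.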

4. Modify: regression-2sls.ts

compute2SLS() gains the same optional fixedEffects parameter. When present:

  1. Build X (exogenous + endogenous) and Z (exogenous + instruments) as today
  2. Extract FE group IDs, demean X, Z, and y
  3. Drop intercept columns from both X and Z
  4. Run stages 1 and 2 on demeaned data with adjusted dfResidual
  5. First-stage F, Wu-Hausman, and Sargan use demeaned matrices and corrected df
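For the simplest case (one endogenous regressor, one instrument, one FE dimension, no other exogenous regressors) the two stages on demeaned data reduce to the sketch below. The function names and the toy data are assumptions for illustration; the real compute2SLS works on full X and Z matrices.

```typescript
// One-way within transformation: subtract each observation's group mean.
function demeanByGroup(values: number[], groupIds: number[], nGroups: number): number[] {
  const sums = new Array<number>(nGroups).fill(0);
  const counts = new Array<number>(nGroups).fill(0);
  for (let i = 0; i < values.length; i++) {
    sums[groupIds[i]] += values[i];
    counts[groupIds[i]] += 1;
  }
  return values.map((v, i) => v - sums[groupIds[i]] / counts[groupIds[i]]);
}

// OLS slope without an intercept; the intercept is dropped because it is absorbed by the FE.
function slopeNoIntercept(x: number[], y: number[]): number {
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < x.length; i++) {
    sxy += x[i] * y[i];
    sxx += x[i] * x[i];
  }
  return sxy / sxx;
}

function fe2slsSlope(
  y: number[],
  x: number[],
  z: number[],
  groupIds: number[],
  nGroups: number,
): number {
  // Demean y, the endogenous regressor x, and the instrument z by the same FE.
  const yT = demeanByGroup(y, groupIds, nGroups);
  const xT = demeanByGroup(x, groupIds, nGroups);
  const zT = demeanByGroup(z, groupIds, nGroups);
  // Stage 1: regress demeaned x on demeaned z and form fitted values.
  const firstStage = slopeNoIntercept(zT, xT);
  const xHat = zT.map((v) => firstStage * v);
  // Stage 2: regress demeaned y on the fitted values.
  return slopeNoIntercept(xHat, yT);
}

// Toy data (assumed): y = 2 * x + groupEffect with no noise, so the estimate should be 2.
const ivGroups = [0, 0, 0, 1, 1, 1];
const ivZ = [1, 2, 4, 0, 3, 5];
const ivX = [1, 0, 2.5, 1, 1.5, 1.5];
const ivEffects = [3, -7];
const ivY = ivX.map((x, i) => 2 * x + ivEffects[ivGroups[i]]);
const ivSlope = fe2slsSlope(ivY, ivX, ivZ, ivGroups, 2);
console.log(ivSlope.toFixed(6)); // 2.000000
```

The key point the sketch illustrates is that the instrument is demeaned with the same group IDs as X and y; demeaning only X and y would leave the group effects in Z.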

5. Modify: types.ts

TypeScript:
export interface FEInfo {
  name: string;
  nGroups: number;
}

export interface RegressionResult {
  // ... all existing fields unchanged ...
  fixedEffects?: FEInfo[];   // NEW
}

Purely additive — no existing fields change meaning. FEInfo is the canonical definition, imported by demean.ts.

6. Modify: executor.ts

Pass lmNode.params.fixedEffects through to computeRegression and compute2SLS. One extra argument each.

7. Modify: param-schema.ts

Remove the disabled flag from the fixedEffects ParamDef:

Before: disabled: true, disabledReason: 'FE computation not yet supported'
After: disabled: removed. FE becomes a live, editable parameter in the property sheet.

8. UI Changes

results-panel.tsx: Remove the yellow “FE not yet computed” warning. Show FE group counts:

state FE — 50 groups
year FE — 10 groups

spec-comparison-view.tsx: Already shows FE checkmarks — no changes needed.

What Doesn't Change

| Component | Status |
| --- | --- |
| Parser / Recognizer | FE already extracted from feols(), felm(), fepois(), feglm() |
| Mapper | FE already threaded from AnalysisCall.args → LinearModelNode.params |
| Pipeline types | LinearModelParams.fixedEffects already defined |
| Group detector | FE already in partition keys |
| Comparison table | Consumes coefficients unchanged; FE checkmarks already rendered |
| Spec curve | No changes |
| Export (LaTeX/CSV) | No changes |

Edge Cases

  1. Single FE level: An FE variable with only 1 unique value is effectively no FE. Detect and skip (or warn).
  2. FE variable not in dataset: Throw a clear error: “Fixed effect column 'state' not found in dataset”.
  3. FE variable is numeric: Treat each unique value as a group. Group encoding works on any value type.
  4. Collinear after demeaning: A variable that is constant within groups demeans to an all-zero column. OLS detects the near-singularity.
  5. Very large number of FE levels: Demeaning costs O(n × D × iterations) per column. For n=50K, D=2, iter=20 that is ~2M operations per column. Trivially fast.
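The first two edge cases could be handled by a small pre-check before demeaning. This is a sketch under assumptions: checkFixedEffects is a hypothetical helper, the dataset is represented as a plain record of column arrays, and single-level FE are skipped with a warning (the spec leaves skip versus warn open).

```typescript
// Hypothetical stand-in for the dataset: column name -> values.
type ColumnTable = Record<string, Array<string | number>>;

function checkFixedEffects(
  table: ColumnTable,
  fixedEffects: string[],
): { usable: string[]; warnings: string[] } {
  const usable: string[] = [];
  const warnings: string[] = [];
  for (const fe of fixedEffects) {
    // Edge case 2: the FE column must exist in the dataset.
    if (!Object.prototype.hasOwnProperty.call(table, fe)) {
      throw new Error(`Fixed effect column '${fe}' not found in dataset`);
    }
    // Edge case 1: a single level carries no information beyond the intercept.
    const levels = new Set(table[fe]).size;
    if (levels <= 1) {
      warnings.push(`Fixed effect '${fe}' has a single level and was skipped`);
      continue;
    }
    usable.push(fe);
  }
  return { usable, warnings };
}

const checked = checkFixedEffects(
  { state: ["A", "B", "A"], country: ["US", "US", "US"] },
  ["state", "country"],
);
console.log(checked.usable, checked.warnings.length); // ["state"] 1
```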

Testing

R Validation Vectors

R:
library(fixest)

# Test 1: 1-way FE
set.seed(42)
d <- data.frame(
  y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30),
  state = rep(c("A","B","C","D","E"), each=6)
)
m1 <- feols(y ~ x1 + x2 | state, data=d)

# Test 2: 2-way FE, balanced
set.seed(123)
d2 <- data.frame(
  y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
  state = rep(letters[1:10], 10),
  year  = rep(2010:2019, each=10)
)
m2 <- feols(y ~ x1 + x2 | state + year, data=d2)

# Test 3: unbalanced (connected components)
d3 <- d2[sample(nrow(d2), 70), ]
m3 <- feols(y ~ x1 + x2 | state + year, data=d3)

# Test 4: FE + robust SEs
m4 <- feols(y ~ x1 + x2 | state, data=d, vcov='hetero')

# Test 5: FE + 2SLS
set.seed(99)
z <- rnorm(200)
x_endog <- 0.5*z + rnorm(200, sd=0.5)
fe_group <- rep(1:20, each=10)
fe_effect <- rep(rnorm(20, sd=2), each=10)
y5 <- 1 + 2*x_endog + fe_effect + rnorm(200)
d5 <- data.frame(y=y5, x=x_endog, z=z,
                   g=as.factor(fe_group))
m5 <- feols(y ~ 1 | g | x ~ z, data=d5)

Test Cases

Deferred

File Summary

| Action | File | Responsibility |
| --- | --- | --- |
| NEW | stats/demean.ts | demean() + computeAbsorbedDf() |
| MOD | stats/regression.ts | Accept fixedEffects, orchestrate demean → OLS |
| MOD | stats/regression-2sls.ts | Accept fixedEffects, orchestrate demean → 2SLS |
| MOD | stats/types.ts | Add FEInfo, fixedEffects on result |
| MOD | pipeline/executor.ts | Pass fixedEffects through |
| MOD | pipeline/param-schema.ts | Remove disabled flag |
| MOD | ui/.../results-panel.tsx | Remove warning, show FE info |