M3: Weighted Regression — Design Spec

Date: 2026-04-04
Milestone: 3 (Core Econometrics), Important tier
Scope: WLS support for all four estimators (OLS, FE-OLS, GLM, 2SLS) with robust and clustered SEs

Motivation

Weighted regression (lm(..., weights=pop), feols(..., weights=~pop)) is common in applied econ, especially in DiD papers where observations represent different population sizes. We already parse these functions but silently ignore the weights argument. This spec adds full WLS execution across every estimator we support.

R Syntax Patterns

# Base R OLS
lm(y ~ x1 + x2, data = df, weights = pop)

# fixest FE-OLS (formula-style ~col or bare col)
feols(y ~ x1 + x2 | state + year, data = df, weights = ~pop)
feols(y ~ x1 + x2 | state + year, data = df, weights = pop)

# fixest IV with weights
feols(y ~ x1 | state | x_endog ~ z_inst, data = df, weights = ~pop)

# lfe
felm(y ~ x1 | fe | 0 | cluster, data = df, weights = pop)

# Base R GLM
glm(y ~ x1 + x2, data = df, family = binomial, weights = n_trials)

# IV
ivreg(y ~ x1 + x2 | z1 + z2, data = df, weights = pop)

All forms produce call.args['weights'] = 'columnName' in the AnalysisCall.

Math

WLS Normal Equations

Unweighted OLS: beta = (X'X)^-1 X'y
Weighted (WLS): beta = (X'WX)^-1 X'Wy, where W = diag(w_1, ..., w_n)

The existing wlsStep(X, Xt, w, z, n, p) in glm.ts computes exactly (X'WX)^-1 X'Wz. This function is extracted to a shared module and reused by all WLS paths.
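As an illustration of the weighted normal equations only, the following is a minimal sketch for small dense matrices. The names solveLinear and wlsSolveSketch are hypothetical and are not the existing wlsStep, whose internals this spec does not show.

```typescript
// Hypothetical helper: solve A x = b by Gauss-Jordan elimination with partial pivoting.
function solveLinear(A: number[][], b: number[]): number[] {
  const p = b.length;
  // Augmented matrix [A | b], copied so the inputs are not mutated.
  const M = A.map((row, i) => [...row, b[i]]);
  for (let col = 0; col < p; col++) {
    let pivot = col;
    for (let r = col + 1; r < p; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    if (Math.abs(M[pivot][col]) < 1e-12) throw new Error("X'WX is singular");
    [M[col], M[pivot]] = [M[pivot], M[col]];
    for (let r = 0; r < p; r++) {
      if (r === col) continue;
      const f = M[r][col] / M[col][col];
      for (let c = col; c <= p; c++) M[r][c] -= f * M[col][c];
    }
  }
  return M.map((row, i) => row[p] / row[i]);
}

// beta = (X'WX)^-1 X'Wy, accumulating X'WX and X'Wy row by row.
function wlsSolveSketch(X: number[][], w: number[], y: number[]): number[] {
  const p = X[0].length;
  const XtWX = Array.from({ length: p }, () => new Array<number>(p).fill(0));
  const XtWy = new Array<number>(p).fill(0);
  for (let i = 0; i < X.length; i++) {
    for (let j = 0; j < p; j++) {
      XtWy[j] += w[i] * X[i][j] * y[i];
      for (let k = 0; k < p; k++) XtWX[j][k] += w[i] * X[i][j] * X[i][k];
    }
  }
  return solveLinear(XtWX, XtWy);
}

// Intercept-only model: the WLS estimate is the weighted mean of y.
const betaMean = wlsSolveSketch([[1], [1], [1]], [1, 2, 1], [2, 4, 6]);
// Exact linear data y = 1 + 2x: any positive weights recover the same line.
const betaLine = wlsSolveSketch([[1, 0], [1, 1], [1, 2]], [1, 3, 2], [1, 3, 5]);
```

The intercept-only case is a handy sanity check because the estimate must equal sum(w_i * y_i) / sum(w_i).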

Weighted SS Decomposition

ssRes = sum( w_i * e_i^2 )
ssTot = sum( w_i * (y_i - ybar_w)^2 )    when hasIntercept
ssTot = sum( w_i * y_i^2 )                when no intercept

where ybar_w = sum(w_i * y_i) / sum(w_i) is the weighted mean.

R-squared: 1 - ssRes / ssTot (same form, weighted components).
sigma-squared: ssRes / dfResidual (same form, weighted ssRes).
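The decomposition above can be sketched as a single helper. The function name weightedFitStats and its return shape are assumptions for illustration, and the fitted values in the example are arbitrary numbers rather than the output of a real WLS fit.

```typescript
// Hypothetical helper illustrating the weighted SS decomposition; not the project's API.
function weightedFitStats(
  y: number[],
  fitted: number[],
  w: number[],
  hasIntercept: boolean,
  dfResidual: number
): { ssRes: number; ssTot: number; rSquared: number; sigma2: number } {
  let sumW = 0;
  let sumWy = 0;
  for (let i = 0; i < y.length; i++) {
    sumW += w[i];
    sumWy += w[i] * y[i];
  }
  // With an intercept, center on the weighted mean; otherwise center on zero.
  const center = hasIntercept ? sumWy / sumW : 0;
  let ssRes = 0;
  let ssTot = 0;
  for (let i = 0; i < y.length; i++) {
    const e = y[i] - fitted[i];
    ssRes += w[i] * e * e;
    ssTot += w[i] * (y[i] - center) ** 2;
  }
  return { ssRes, ssTot, rSquared: 1 - ssRes / ssTot, sigma2: ssRes / dfResidual };
}

const withIntercept = weightedFitStats([1, 2, 4], [1.5, 2, 3.5], [2, 1, 2], true, 1);
const noIntercept = weightedFitStats([1, 2, 4], [1.5, 2, 3.5], [2, 1, 2], false, 2);
```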

Weighted FE Demeaning (FWL)

Alternating projections with weighted group means:

group_mean_g = sum_{i in g}(w_i * x_i) / sum_{i in g}(w_i)

Replaces the current arithmetic mean (sum / count) in demean(). The computeAbsorbedDf() union-find is purely structural and does not change.
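A minimal sketch of weighted alternating projections follows. The names sweepWeighted and demeanWeightedSketch, the FE dimension shape, and the convergence rule (largest change per full sweep below a tolerance) are assumptions for illustration, not the existing demean() signature. It assumes all weights are strictly positive, so every group has a non-zero weight sum.

```typescript
// One sweep: subtract the weighted group mean for a single FE dimension, in place.
function sweepWeighted(
  col: Float64Array,
  groupIds: Int32Array,
  nGroups: number,
  w: Float64Array
): void {
  const sums = new Float64Array(nGroups);
  const weightSums = new Float64Array(nGroups);
  for (let i = 0; i < col.length; i++) {
    sums[groupIds[i]] += w[i] * col[i];
    weightSums[groupIds[i]] += w[i];
  }
  for (let i = 0; i < col.length; i++) {
    col[i] -= sums[groupIds[i]] / weightSums[groupIds[i]];
  }
}

// Hypothetical driver: cycle over FE dimensions until the column stops changing.
function demeanWeightedSketch(
  x: number[],
  dims: { ids: Int32Array; nGroups: number }[],
  w: number[],
  tol = 1e-10,
  maxIter = 1000
): Float64Array {
  const col = Float64Array.from(x);
  const wArr = Float64Array.from(w);
  for (let iter = 0; iter < maxIter; iter++) {
    const before = Float64Array.from(col);
    for (const d of dims) sweepWeighted(col, d.ids, d.nGroups, wArr);
    let maxChange = 0;
    for (let i = 0; i < col.length; i++) {
      maxChange = Math.max(maxChange, Math.abs(col[i] - before[i]));
    }
    if (maxChange < tol) break;
  }
  return col;
}

// Single FE with two groups: weighted means are 2.5 and 15.
const demeaned = demeanWeightedSketch(
  [1, 3, 10, 20],
  [{ ids: Int32Array.from([0, 0, 1, 1]), nGroups: 2 }],
  [1, 3, 1, 1]
);
```

After demeaning, the weighted sum of the column within each group is zero, which is the property the weighted FWL step relies on.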

Weighted Sandwich SEs

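HC robust: The “bread” (X'WX)^-1 is already the WLS inverse (computed during estimation and passed to the sandwich functions as XtXinv). The meat residual weights and the leverage change. Each observation's score is w_i * X_i * e_i, so the meat weight becomes w_i^2 * e_i^2 (times the HC-type adjustment), consistent with the clustered scores below, and HC2/HC3 use the weighted leverage: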

Leverage under WLS: h_i = w_i * X_i (X'WX)^-1 X_i'

Clustered: Score vectors incorporate weights:

s_g[j] = sum_{i in g}( w_i * X[i][j] * e[i] )

The small-sample correction formula G/(G-1) * (n-1)/(n-k) remains the same (n = number of observations with w > 0).
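The weighted score accumulation and the small-sample factor can be sketched as follows. The function names and the integer cluster-index encoding are assumptions for illustration; the existing computeClusteredVcov may organize this differently.

```typescript
// Hypothetical sketch of the weighted clustered "meat": sum over clusters of s_g s_g'.
function clusteredMeatWeighted(
  X: number[][],
  e: number[],
  w: number[],
  cluster: number[], // cluster index in [0, nClusters) for each observation
  nClusters: number
): number[][] {
  const p = X[0].length;
  const scores = Array.from({ length: nClusters }, () => new Array<number>(p).fill(0));
  for (let i = 0; i < X.length; i++) {
    for (let j = 0; j < p; j++) scores[cluster[i]][j] += w[i] * X[i][j] * e[i];
  }
  const meat = Array.from({ length: p }, () => new Array<number>(p).fill(0));
  for (const s of scores) {
    for (let j = 0; j < p; j++) {
      for (let k = 0; k < p; k++) meat[j][k] += s[j] * s[k];
    }
  }
  return meat;
}

// G/(G-1) * (n-1)/(n-k), where n counts observations with w > 0.
function smallSampleFactor(G: number, n: number, k: number): number {
  return (G / (G - 1)) * ((n - 1) / (n - k));
}

// Intercept-only example: cluster scores are -1 and 0, so the meat is [[1]].
const meatExample = clusteredMeatWeighted(
  [[1], [1], [1], [1]],
  [1, -1, 2, -2],
  [1, 2, 1, 1],
  [0, 0, 1, 1],
  2
);
const factorExample = smallSampleFactor(2, 4, 1);
```

The clustered variance is then the factor times bread * meat * bread, with the WLS inverse as the bread.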

Weighted 2SLS

Weighted GLM (Prior Weights)

The IRLS loop already computes per-iteration working weights irls_w[i]. Prior (user-supplied) weights multiply in:

w_total[i] = prior_w[i] * irls_w[i]

Deviance contributions: prior_w[i] * devResid(y[i], mu[i])
Null deviance: uses the weighted mean ybar_w = sum(prior_w[i] * y[i]) / sum(prior_w[i])
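For a concrete case, here is a sketch of how prior weights could enter a binomial model with a logit link, where the IRLS working weight reduces to mu * (1 - mu). The helper names below are hypothetical and binomialDevResid stands in for whatever devResid does in glm.ts, which this spec does not show.

```typescript
// a * log(a / b), with the usual convention that the term is 0 when a is 0.
function xLogRatio(a: number, b: number): number {
  return a === 0 ? 0 : a * Math.log(a / b);
}

// Hypothetical binomial unit deviance for a response y in [0, 1].
function binomialDevResid(y: number, mu: number): number {
  return 2 * (xLogRatio(y, mu) + xLogRatio(1 - y, 1 - mu));
}

// w_total[i] = prior_w[i] * irls_w[i]; for the logit link irls_w[i] = mu * (1 - mu).
function totalWeightsLogit(priorW: number[], mu: number[]): number[] {
  return priorW.map((pw, i) => pw * mu[i] * (1 - mu[i]));
}

// Deviance: sum of prior-weighted unit deviances.
function weightedDeviance(y: number[], mu: number[], priorW: number[]): number {
  let dev = 0;
  for (let i = 0; i < y.length; i++) dev += priorW[i] * binomialDevResid(y[i], mu[i]);
  return dev;
}

// Weighted mean used as the fitted value of the null (intercept-only) model.
function weightedMean(y: number[], priorW: number[]): number {
  let sw = 0;
  let swy = 0;
  for (let i = 0; i < y.length; i++) {
    sw += priorW[i];
    swy += priorW[i] * y[i];
  }
  return swy / sw;
}

const yObs = [0, 1];
const prior = [2, 3];
const wTotal = totalWeightsLogit(prior, [0.5, 0.5]);
const dev = weightedDeviance(yObs, [0.5, 0.5], prior);
const ybarW = weightedMean(yObs, prior);
```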

Weight Validation

Layer-by-Layer Changes

1. Recognizer (parsers/r/recognizer.ts)

Extract weights named argument from all recognized functions:

| Function | Extraction method |
|---|---|
| lm() | getNamedArg(node.args, 'weights') → extractRefName() |
| glm() | getNamedArg(node.args, 'weights') → extractRefName() |
| feols() | Parse from raw arg text: weights\s*=\s*~?\s*(\w+) (strip ~ prefix) |
| felm() | Parse from raw arg text: same pattern |
| ivreg() | getNamedArg(node.args, 'weights') → extractRefName() |

All produce call.args['weights'] = 'columnName'.
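The raw-text pattern used for feols() and felm() can be exercised in isolation. The function name extractWeightsFromRawArgs is hypothetical; only the regular expression comes from this spec.

```typescript
// Hypothetical helper applying the spec's raw-text pattern to an argument string.
// The capture group excludes the optional ~ prefix, so both forms yield the bare column name.
function extractWeightsFromRawArgs(rawArgs: string): string | undefined {
  const m = /weights\s*=\s*~?\s*(\w+)/.exec(rawArgs);
  return m ? m[1] : undefined;
}

const fromFormula = extractWeightsFromRawArgs("y ~ x1 | state + year, data = df, weights = ~pop");
const fromBare = extractWeightsFromRawArgs("y ~ x1 | fe | 0 | cluster, data = df, weights = pop");
const missing = extractWeightsFromRawArgs("y ~ x1 | state, data = df");
```

One limitation worth noting: \w does not match a dot, so an R column name such as pop.total would be captured only up to the dot.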

2. Pipeline Types (pipeline/types.ts)

Add to existing param types:

// LinearModelParams — add:
weights?: string;  // column name

// IVModelParams — add:
weights?: string;  // column name

// GLMParams — add:
weights?: string;  // column name

3. Mapper (pipeline/mapper.ts)

Forward call.args['weights'] into node params for linear-model, iv-model, and glm node creation.

4. Executor (pipeline/executor.ts)

Pass params.weights to computeRegression(), compute2SLS(), and computeGLM().

5. Shared WLS Step

Extract wlsStep from glm.ts into matrix.ts (linear algebra routine, no new file). Signature:

export function wlsStep(
  X: number[][], Xt: number[][], w: number[], z: number[], n: number, p: number
): { beta: number[]; XtWXinv: number[][] }

GLM imports it back. OLS and 2SLS call it when weights are present.

6. Design Matrix (stats/regression.ts)

buildDesignMatrix gains optional weightsCol?: string parameter. When provided:

  1. Extract weight column from dataset
  2. Filter zero-weight and NA-weight rows out of validRows
  3. Return weights?: number[] (aligned with validRows) in DesignMatrixResult

export interface DesignMatrixResult {
  X: number[][];
  y: number[];
  columnNames: string[];
  validRows: number[];
  weights?: number[];  // new — present when weightsCol provided
}
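The filtering step can be sketched as below. The function name filterByWeights and the representation of NA as null or NaN are assumptions; the error message is the one listed for negative weights in the testing strategy.

```typescript
// Hypothetical sketch: keep only rows with a usable, strictly positive weight.
function filterByWeights(
  validRows: number[],
  weightCol: (number | null)[]
): { validRows: number[]; weights: number[] } {
  const keptRows: number[] = [];
  const weights: number[] = [];
  for (const row of validRows) {
    const w = weightCol[row];
    if (w == null || Number.isNaN(w)) continue; // NA weight: drop the row
    if (w < 0) throw new Error("Weights must be non-negative");
    if (w === 0) continue; // zero weight: drop the row
    keptRows.push(row);
    weights.push(w); // stays aligned with keptRows
  }
  return { validRows: keptRows, weights };
}

// Row 2 was already invalid upstream; row 1 has zero weight and row 3 has an NA weight.
const filtered = filterByWeights([0, 1, 3, 4], [2, 0, 5, null, 1.5]);
```

Dropping zero-weight rows here, rather than carrying them through with w = 0, keeps n and the residual degrees of freedom equal to those of a fit on the subsetted data.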

7. OLS Computation (stats/regression.ts)

computeRegression gains optional weightsCol?: string.

When weights present:

- Call wlsStep(X, Xt, w, y, n, p) instead of solveAndInverse(XtX, Xty)
- Compute residuals from the WLS fit: e[i] = y[i] - fitted[i] (same formula as OLS, but beta differs; the weights enter through the sums of squares)
- Weighted SS: ssRes = sum(w[i] * e[i]^2), ssTot = sum(w[i] * (y[i] - ybar_w)^2), or ssTot = sum(w[i] * y[i]^2) when there is no intercept
- Pass weights to computeRobustVcov and computeClusteredVcov
- Include weights: colName in result

8. FE Demeaning (stats/demean.ts)

demean gains optional weights?: number[].

When weights present, replace arithmetic group means with weighted group means:

- Pre-allocated counts: Float64Array becomes weightSums: Float64Array
- Inner loop: sums[g] += w[i] * col[i], weightSums[g] += w[i]
- Subtraction: col[i] -= sums[g] / weightSums[g]

Unweighted path (no weights argument) remains unchanged.

9. 2SLS (stats/regression-2sls.ts)

compute2SLS gains optional weightsCol?: string.

When weights present:

- Stage 1: wlsStep(Z, Zt, w, y_endog, ...) for each endogenous variable
- Stage 2: wlsStep(X_hat, X_hat_t, w, y, ...)
- SE correction: (X_proj' W X_proj)^-1 where X_proj uses original X columns
- Wu-Hausman: weighted OLS of augmented regression
- Sargan J: weighted regression of residuals on instruments
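To make the two stages concrete, here is a deliberately tiny sketch: one endogenous regressor, one instrument, and no intercept, so each stage is a ratio of weighted sums. The function names are hypothetical. Computing the residuals from the original regressor rather than the first-stage fitted values is the standard 2SLS convention and is assumed here.

```typescript
// Weighted dot product: sum_i w_i * a_i * b_i.
function wdot(w: number[], a: number[], b: number[]): number {
  let s = 0;
  for (let i = 0; i < w.length; i++) s += w[i] * a[i] * b[i];
  return s;
}

// Hypothetical scalar case: endogenous x, instrument z, no intercept.
function weighted2slsScalar(
  y: number[],
  x: number[],
  z: number[],
  w: number[]
): { beta: number; residuals: number[] } {
  // Stage 1: weighted regression of x on z, then fitted values xHat.
  const pi = wdot(w, z, x) / wdot(w, z, z);
  const xHat = z.map((zi) => pi * zi);
  // Stage 2: weighted regression of y on xHat.
  const beta = wdot(w, xHat, y) / wdot(w, xHat, xHat);
  // Residuals use the original x, not xHat.
  const residuals = y.map((yi, i) => yi - beta * x[i]);
  return { beta, residuals };
}

const fit2sls = weighted2slsScalar([1, 2, 3], [2, 4, 7], [1, 2, 3], [1, 1, 2]);
// In this just-identified case the result equals the weighted IV ratio.
const ivRatio = wdot([1, 1, 2], [1, 2, 3], [1, 2, 3]) / wdot([1, 1, 2], [1, 2, 3], [2, 4, 7]);
```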

10. GLM (stats/glm.ts)

computeGLM gains optional weightsCol?: string.

When prior weights present:

- IRLS working weights: w[i] = prior_w[i] * (dmu^2 / variance)
- Null deviance: ybar = sum(prior_w[i] * y[i]) / sum(prior_w[i])
- Deviance: sum(prior_w[i] * devResid(y[i], mu[i]))
- AIC adjustment: uses sum of prior weights

11. Sandwich SEs (stats/sandwich.ts)

computeLeverage gains optional weights?: number[]:

h_i = w_i * X_i (X'WX)^-1 X_i'
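The weighted leverage formula can be sketched directly. The name leverageWeighted is hypothetical, and the inverse is passed in precomputed, as the spec describes for the sandwich functions.

```typescript
// h_i = w_i * X_i (X'WX)^-1 X_i', with the WLS inverse supplied by the caller.
function leverageWeighted(X: number[][], XtWXinv: number[][], w: number[]): number[] {
  return X.map((row, i) => {
    let q = 0; // quadratic form X_i (X'WX)^-1 X_i'
    for (let j = 0; j < row.length; j++) {
      for (let k = 0; k < row.length; k++) q += row[j] * XtWXinv[j][k] * row[k];
    }
    return w[i] * q;
  });
}

// Intercept-only model with weights 1, 2, 1: X'WX = 4, so the inverse is [[0.25]].
const h = leverageWeighted([[1], [1], [1]], [[0.25]], [1, 2, 1]);
const hSum = h.reduce((a, b) => a + b, 0);
```

A useful invariant for tests is that the weighted leverages sum to the number of columns p, just as in the unweighted case.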

computeRobustVcov gains optional weights?: number[]:

- HC meat weights: w_i^2 * e_i^2 (times HC-type adjustment), because each observation's score is w_i * X_i * e_i, matching the clustered score accumulation
- Delegates to updated computeLeverage for HC2/HC3

computeClusteredVcov gains optional weights?: number[]:

- Score accumulation: scores[g][j] += w_i * X[i][j] * e[i]

12. Result Types (stats/types.ts)

Add to RegressionResult and GLMResult:

weights?: string;  // column name used, for display

13. Param Schema (pipeline/param-schema.ts)

Already has a weights ParamDef for linear-model. Extend to iv-model and glm node types.

14. UI

No new components needed. The results panel displays the existing coefficient table — we add a “Weights: colname” line to the result metadata section (alongside “Robust SEs: HC1”, “Fixed Effects: state”, etc.). The param-schema weights entry already enables spec explorer grid display.

Testing Strategy

Unit Tests (R-validated)

| Test case | R code for expected values |
|---|---|
| Weighted OLS | lm(y ~ x, weights=w): coefficients, SEs, R-squared, F |
| Weighted FE-OLS | feols(y ~ x \| fe, weights=~w): coefficients, SEs |
| Weighted + HC1 | coeftest(lm(y ~ x, weights=w), vcov=vcovHC(., type="HC1")) |
| Weighted + clustered | feols(y ~ x \| fe, weights=~w, vcov=~cluster) |
| Weighted 2SLS | ivreg(y ~ x \| z, weights=w): coefficients, SEs |
| Weighted GLM | glm(y ~ x, family=binomial, weights=n): coefficients, deviance |
| Zero weights | lm(y ~ x, weights=w) where some w=0: matches subsetted lm() |
| Negative weights | Error: “Weights must be non-negative” |
| Weighted demeaning | feols(y ~ x \| fe, weights=~w): verify demeaned values match R |

Tolerances: <0.00005 for statistics, <0.00001 for p-values.
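A small comparison helper for these tolerances might look like the following. The constant and function names are hypothetical, and the sketch assumes the tolerances are absolute differences, which the spec does not state.

```typescript
// Tolerances from this spec, assumed here to be absolute differences.
const STAT_TOL = 0.00005;
const PVALUE_TOL = 0.00001;

// True when actual is strictly within tol of the R-derived expected value.
function withinTolerance(actual: number, expected: number, tol: number): boolean {
  return Math.abs(actual - expected) < tol;
}

const statOk = withinTolerance(1.23456, 1.23459, STAT_TOL); // differs by about 0.00003
const pValueOk = withinTolerance(0.04321, 0.04323, PVALUE_TOL); // differs by about 0.00002
```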

Integration Tests

Parse-to-result pipeline tests with realistic R code:

# Weighted DiD pattern
mod <- feols(earnings ~ treatment | state + year, data = df, weights = ~pop)

# Weighted logit
mod <- glm(enrolled ~ income + age, data = df, family = binomial, weights = n)

Recognizer Tests

Verify weights extraction from all supported function forms (6 patterns).

Files Changed

| File | Change |
|---|---|
| src/core/stats/types.ts | Add weights?: string to RegressionResult, GLMResult |
| src/core/stats/matrix.ts | Extract shared wlsStep |
| src/core/stats/regression.ts | buildDesignMatrix weight extraction + zero-weight filtering; computeRegression WLS path with weighted SS |
| src/core/stats/regression-2sls.ts | compute2SLS WLS at both stages + weighted diagnostics |
| src/core/stats/glm.ts | computeGLM prior weights × IRLS weights; import shared wlsStep |
| src/core/stats/demean.ts | demean weighted group means |
| src/core/stats/sandwich.ts | computeLeverage, computeRobustVcov, computeClusteredVcov weight support |
| src/core/pipeline/types.ts | weights?: string on LinearModelParams, IVModelParams, GLMParams |
| src/core/pipeline/mapper.ts | Forward call.args['weights'] for all three node types |
| src/core/pipeline/executor.ts | Pass params.weights to stats functions |
| src/core/pipeline/param-schema.ts | Add weights ParamDef for iv-model and glm |
| src/core/parsers/r/recognizer.ts | Extract weights from lm, glm, feols, felm, ivreg |
| src/ui/components/results/ | Add “Weights: colname” to result metadata display |

Out of Scope