Date: 2026-04-04

Milestone: 3 (Core Econometrics), Important tier

Scope: WLS support for all four estimators (OLS, FE-OLS, GLM, 2SLS) with robust and clustered SEs

Weighted regression (`lm(..., weights=pop)`, `feols(..., weights=~pop)`) is common in applied econ, especially in DiD papers where observations represent different population sizes. We already parse these functions but silently ignore the `weights` argument. This spec adds full WLS execution across every estimator we support.

```r
# Base R OLS
lm(y ~ x1 + x2, data = df, weights = pop)

# fixest FE-OLS (formula-style ~col or bare col)
feols(y ~ x1 + x2 | state + year, data = df, weights = ~pop)
feols(y ~ x1 + x2 | state + year, data = df, weights = pop)

# fixest IV with weights
feols(y ~ x1 | state | x_endog ~ z_inst, data = df, weights = ~pop)

# lfe
felm(y ~ x1 | fe | 0 | cluster, data = df, weights = pop)

# Base R GLM
glm(y ~ x1 + x2, data = df, family = binomial, weights = n_trials)

# IV
ivreg(y ~ x1 + x2 | z1 + z2, data = df, weights = pop)
```

All forms produce `call.args['weights'] = 'columnName'` in the `AnalysisCall`.

- Unweighted OLS: `beta = (X'X)^-1 X'y`
- Weighted (WLS): `beta = (X'WX)^-1 X'Wy`, where `W = diag(w_1, ..., w_n)`

The existing wlsStep(X, Xt, w, z, n, p) in
glm.ts computes exactly (X'WX)^-1 X'Wz. This
function is extracted to a shared module and reused by all WLS
paths.
- `ssRes = sum( w_i * e_i^2 )`
- `ssTot = sum( w_i * (y_i - ybar_w)^2 )` when `hasIntercept`
- `ssTot = sum( w_i * y_i^2 )` when no intercept

where `ybar_w = sum(w_i * y_i) / sum(w_i)` is the weighted mean.

- R-squared: `1 - ssRes / ssTot` (same form, weighted components).
- sigma-squared: `ssRes / dfResidual` (same form, weighted `ssRes`).
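
As a rough illustration of the weighted fit statistics above, the sketch below computes `ssRes`, `ssTot`, R-squared, and sigma-squared from observed values, fitted values, and weights. The function name `weightedFitStats` and the choice `dfResidual = n - k` are assumptions made for the example (absorbed fixed-effect degrees of freedom are ignored); they are not taken from the existing code.

```typescript
// Hypothetical helper for illustration; not the project's actual API.
interface WeightedFit {
  ssRes: number;
  ssTot: number;
  r2: number;
  sigma2: number;
}

function weightedFitStats(
  y: number[],
  fitted: number[],
  w: number[],
  k: number,              // number of estimated coefficients
  hasIntercept: boolean
): WeightedFit {
  const n = y.length;

  // Weighted mean ybar_w = sum(w_i * y_i) / sum(w_i); unused without an intercept.
  let sumW = 0;
  let sumWy = 0;
  for (let i = 0; i < n; i++) {
    sumW += w[i];
    sumWy += w[i] * y[i];
  }
  const center = hasIntercept ? sumWy / sumW : 0;

  let ssRes = 0;
  let ssTot = 0;
  for (let i = 0; i < n; i++) {
    const e = y[i] - fitted[i];
    const d = y[i] - center;
    ssRes += w[i] * e * e;
    ssTot += w[i] * d * d;
  }

  // Assumption: dfResidual = n - k, with no absorbed fixed effects.
  const dfResidual = n - k;
  return { ssRes, ssTot, r2: 1 - ssRes / ssTot, sigma2: ssRes / dfResidual };
}
```

With all weights equal to 1 the function reduces to the usual unweighted statistics, which makes it easy to check against the existing OLS path.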
Alternating projections with weighted group means:
group_mean_g = sum_{i in g}(w_i * x_i) / sum_{i in g}(w_i)
Replaces the current arithmetic mean (sum / count) in
demean(). The computeAbsorbedDf() union-find
is purely structural and does not change.
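HC robust: The “bread” `(X'WX)^-1` is already the WLS inverse (computed during estimation and passed to the sandwich functions as `XtXinv`). The meat residual weights change. Because the WLS score of observation i is `w_i * X_i * e_i`, the weight enters squared, which is also what R's `sandwich::vcovHC` computes for a weighted `lm()`:

- HC0: `meat_weight_i = w_i^2 * e_i^2`
- HC1: `meat_weight_i = w_i^2 * e_i^2 * n/(n-k)`
- HC2: `meat_weight_i = w_i^2 * e_i^2 / (1 - h_i)`
- HC3: `meat_weight_i = w_i^2 * e_i^2 / (1 - h_i)^2`

Leverage under WLS: `h_i = w_i * X_i (X'WX)^-1 X_i'`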
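
The following sketch shows one way the weighted leverage and per-observation meat weights could be computed. It assumes the meat is built from the squared weighted residual `(w_i * e_i)^2`, which is the outer product of the WLS score `w_i * X_i * e_i` and matches what R's `sandwich::vcovHC` does for a weighted `lm()`. The names `weightedLeverage` and `hcMeatWeights` are hypothetical.

```typescript
type HCType = "HC0" | "HC1" | "HC2" | "HC3";

// h_i = w_i * X_i (X'WX)^-1 X_i'
function weightedLeverage(
  X: number[][],
  XtWXinv: number[][],
  w: number[]
): number[] {
  return X.map((xi, i) => {
    let quad = 0;
    for (let a = 0; a < xi.length; a++) {
      for (let b = 0; b < xi.length; b++) {
        quad += xi[a] * XtWXinv[a][b] * xi[b];
      }
    }
    return w[i] * quad;
  });
}

// Per-observation scalar multiplying X_i' X_i in the sandwich meat.
// Assumption: the base term is (w_i * e_i)^2, the squared weighted residual.
function hcMeatWeights(
  e: number[],
  w: number[],
  h: number[],   // leverages; only read for HC2 and HC3
  k: number,     // number of coefficients
  type: HCType
): number[] {
  const n = e.length;
  return e.map((ei, i) => {
    const base = (w[i] * ei) ** 2;
    if (type === "HC0") return base;
    if (type === "HC1") return (base * n) / (n - k);
    if (type === "HC2") return base / (1 - h[i]);
    return base / (1 - h[i]) ** 2; // HC3
  });
}
```

A quick sanity check for the leverage function is that the values sum to the number of columns in `X`, exactly as in the unweighted case.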
Clustered: Score vectors incorporate weights:
s_g[j] = sum_{i in g}( w_i * X[i][j] * e[i] )
The small-sample correction formula
G/(G-1) * (n-1)/(n-k) remains the same (n = number of
observations with w > 0).
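
A minimal sketch of the weighted cluster-score accumulation and the small-sample factor described above. It assumes cluster membership is already encoded as integer ids in `0..G-1`; the function names are illustrative only.

```typescript
// s_g[j] = sum over i in g of w_i * X[i][j] * e[i]
function clusterScores(
  X: number[][],
  e: number[],
  w: number[],
  clusterIds: number[], // assumed: integer ids in 0..G-1
  G: number
): number[][] {
  const p = X[0].length;
  const scores: number[][] = Array.from({ length: G }, () =>
    new Array<number>(p).fill(0)
  );
  for (let i = 0; i < X.length; i++) {
    const g = clusterIds[i];
    for (let j = 0; j < p; j++) {
      scores[g][j] += w[i] * X[i][j] * e[i];
    }
  }
  return scores;
}

// Meat = sum over clusters of s_g s_g'
function clusterMeat(scores: number[][]): number[][] {
  const p = scores[0].length;
  const meat: number[][] = Array.from({ length: p }, () =>
    new Array<number>(p).fill(0)
  );
  for (const s of scores) {
    for (let a = 0; a < p; a++) {
      for (let b = 0; b < p; b++) meat[a][b] += s[a] * s[b];
    }
  }
  return meat;
}

// G/(G-1) * (n-1)/(n-k), with n counting observations that have w > 0
function smallSampleFactor(G: number, n: number, k: number): number {
  return (G / (G - 1)) * ((n - 1) / (n - k));
}
```

The clustered variance estimate would then be the factor times `bread * meat * bread`, with the bread being the WLS inverse `(X'WX)^-1`.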
Weighted 2SLS:

- Stage 1: `gamma = (Z'WZ)^-1 Z'Wy_endog`
- Stage 2: `beta = (X_hat'WX_hat)^-1 X_hat'Wy`
- SE correction: `V = (X_proj'WX_proj)^-1`

The IRLS loop already computes per-iteration working weights `irls_w[i]`. Prior (user-supplied) weights multiply in:

`w_total[i] = prior_w[i] * irls_w[i]`

- Deviance contributions: `prior_w[i] * devResid(y[i], mu[i])`
- Null deviance: uses weighted mean `ybar_w = sum(prior_w[i] * y[i]) / sum(prior_w[i])`
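
To make the prior-weight handling concrete, here is a small sketch for the binomial family with a logit link. The helper names (`logitTotalWeights`, `binomialUnitDeviance`, `weightedDeviance`, `weightedMean`) are invented for this example, and `binomialUnitDeviance` stands in for whatever `devResid` does in the existing code, which is an assumption.

```typescript
// Total IRLS weight for a logit link: prior_w[i] * (dmu^2 / variance),
// where mu = 1 / (1 + exp(-eta)), dmu/deta = mu(1 - mu), variance = mu(1 - mu).
function logitTotalWeights(eta: number[], priorW: number[]): number[] {
  return eta.map((etaI, i) => {
    const mu = 1 / (1 + Math.exp(-etaI));
    const dmu = mu * (1 - mu);
    const variance = mu * (1 - mu);
    const irlsW = (dmu * dmu) / variance;
    return priorW[i] * irlsW;
  });
}

// Binomial unit deviance; assumed stand-in for devResid(y, mu).
function binomialUnitDeviance(y: number, mu: number): number {
  const a = y > 0 ? y * Math.log(y / mu) : 0;
  const b = y < 1 ? (1 - y) * Math.log((1 - y) / (1 - mu)) : 0;
  return 2 * (a + b);
}

// Deviance = sum(prior_w[i] * devResid(y[i], mu[i]))
function weightedDeviance(y: number[], mu: number[], priorW: number[]): number {
  let dev = 0;
  for (let i = 0; i < y.length; i++) {
    dev += priorW[i] * binomialUnitDeviance(y[i], mu[i]);
  }
  return dev;
}

// ybar_w = sum(prior_w[i] * y[i]) / sum(prior_w[i]), used for the null deviance
function weightedMean(y: number[], priorW: number[]): number {
  let num = 0;
  let den = 0;
  for (let i = 0; i < y.length; i++) {
    num += priorW[i] * y[i];
    den += priorW[i];
  }
  return num / den;
}
```

The null deviance would then be `weightedDeviance(y, muNull, priorW)` with every element of `muNull` set to `weightedMean(y, priorW)`.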
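- Zero weights: rows with `w = 0` are excluded (filtered from `validRows`). This matches R’s `lm()` behavior: zero-weight rows are effectively dropped.
- Weight filtering is handled in `buildDesignMatrix`.

`parsers/r/recognizer.ts`

Extract the `weights` named argument from all recognized functions:
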
| Function | Extraction method |
|---|---|
| `lm()` | `getNamedArg(node.args, 'weights')` → `extractRefName()` |
| `glm()` | `getNamedArg(node.args, 'weights')` → `extractRefName()` |
| `feols()` | Parse from raw arg text: `weights\s*=\s*~?\s*(\w+)` (strip `~` prefix) |
| `felm()` | Parse from raw arg text: same pattern |
| `ivreg()` | `getNamedArg(node.args, 'weights')` → `extractRefName()` |

All produce call.args['weights'] = 'columnName'.
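
The raw-text extraction used for `feols()` and `felm()` can be sketched as below, using the regular expression from the table. The function name `extractWeightsColumn` is hypothetical, and the real recognizer may obtain the raw argument text differently.

```typescript
// Hypothetical helper: pull the weights column name out of raw argument text.
// The optional "~" handles the formula-style form (weights = ~pop).
function extractWeightsColumn(rawArgs: string): string | undefined {
  const match = /weights\s*=\s*~?\s*(\w+)/.exec(rawArgs);
  return match ? match[1] : undefined;
}

// Example inputs covering both forms accepted by feols().
const formulaStyle = extractWeightsColumn(
  "y ~ x1 + x2 | state + year, data = df, weights = ~pop"
);
const bareStyle = extractWeightsColumn(
  "y ~ x1 | fe | 0 | cluster, data = df, weights = pop"
);
const missing = extractWeightsColumn("y ~ x1 + x2, data = df");
```

Here `formulaStyle` and `bareStyle` both evaluate to `"pop"`, and `missing` is `undefined`. The pattern as written has no word boundary before `weights`, so an argument such as `myweights = x` would also match; adding `\b` is one possible refinement.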
`pipeline/types.ts`

Add to existing param types:

```ts
// Add to LinearModelParams:
weights?: string;  // column name

// Add to IVModelParams:
weights?: string;  // column name

// Add to GLMParams:
weights?: string;  // column name
```

`pipeline/mapper.ts`

Forward `call.args['weights']` into node params for `linear-model`, `iv-model`, and `glm` node creation.

`pipeline/executor.ts`

Pass `params.weights` to `computeRegression()`, `compute2SLS()`, and `computeGLM()`.

Extract `wlsStep` from `glm.ts` into `matrix.ts` (linear algebra routine, no new file). Signature:

```ts
export function wlsStep(
  X: number[][], Xt: number[][], w: number[], z: number[], n: number, p: number
): { beta: number[]; XtWXinv: number[][] }
```

GLM imports it back. OLS and 2SLS call it when weights are present.
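
For orientation, the sketch below shows one possible body for a function with this signature. It is not the existing `glm.ts` implementation: it forms the normal equations directly and inverts `X'WX` with a Gauss-Jordan helper (`invertMatrix`, a name invented here), whereas the real code may use a different solver.

```typescript
// Hypothetical Gauss-Jordan inverse with partial pivoting.
function invertMatrix(A: number[][]): number[][] {
  const p = A.length;
  // Augment A with the identity: [A | I]
  const M = A.map((row, i) => [...row, ...row.map((_, j) => (i === j ? 1 : 0))]);
  for (let c = 0; c < p; c++) {
    let piv = c;
    for (let r = c + 1; r < p; r++) {
      if (Math.abs(M[r][c]) > Math.abs(M[piv][c])) piv = r;
    }
    if (Math.abs(M[piv][c]) < 1e-12) throw new Error("X'WX is singular");
    [M[c], M[piv]] = [M[piv], M[c]];
    const d = M[c][c];
    for (let j = 0; j < 2 * p; j++) M[c][j] /= d;
    for (let r = 0; r < p; r++) {
      if (r === c) continue;
      const f = M[r][c];
      for (let j = 0; j < 2 * p; j++) M[r][j] -= f * M[c][j];
    }
  }
  return M.map((row) => row.slice(p));
}

// Computes beta = (X'WX)^-1 X'Wz and returns the inverse for later reuse.
function wlsStep(
  X: number[][], Xt: number[][], w: number[], z: number[], n: number, p: number
): { beta: number[]; XtWXinv: number[][] } {
  const XtWX: number[][] = Array.from({ length: p }, () =>
    new Array<number>(p).fill(0)
  );
  const XtWz: number[] = new Array<number>(p).fill(0);
  for (let a = 0; a < p; a++) {
    for (let i = 0; i < n; i++) {
      const xw = Xt[a][i] * w[i];
      XtWz[a] += xw * z[i];
      for (let b = 0; b < p; b++) XtWX[a][b] += xw * X[i][b];
    }
  }
  const XtWXinv = invertMatrix(XtWX);
  const beta = XtWXinv.map((row) =>
    row.reduce((s, v, b) => s + v * XtWz[b], 0)
  );
  return { beta, XtWXinv };
}
```

Returning `XtWXinv` alongside `beta` lets the caller hand the same matrix to the sandwich functions as the bread without recomputing it.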
`stats/regression.ts`

`buildDesignMatrix` gains optional `weightsCol?: string` parameter. When provided:

- Zero-weight rows are filtered out of `validRows`
- Returns `weights?: number[]` (aligned with `validRows`) in `DesignMatrixResult`

```ts
export interface DesignMatrixResult {
  X: number[][];
  y: number[];
  columnNames: string[];
  validRows: number[];
  weights?: number[];  // new: present when weightsCol provided
}
```

`stats/regression.ts`

`computeRegression` gains optional `weightsCol?: string`.

When weights present:

- Call `wlsStep(X, Xt, w, y, n, p)` instead of `solveAndInverse(XtX, Xty)`
- Compute weighted residuals: `e[i] = y[i] - fitted[i]` (same formula, but beta differs)
- Weighted SS: `ssRes = sum(w[i] * e[i]^2)`, `ssTot = sum(w[i] * (y[i] - ybar_w)^2)`
- Pass weights to `computeRobustVcov` and `computeClusteredVcov`
- Include `weights: colName` in result

`stats/demean.ts`

`demean` gains optional `weights?: number[]`.

When weights present, replace arithmetic group means with weighted group means:

- Pre-allocated `counts: Float64Array` becomes `weightSums: Float64Array`
- Inner loop: `sums[g] += w[i] * col[i]`, `weightSums[g] += w[i]`
- Subtraction: `col[i] -= sums[g] / weightSums[g]`

Unweighted path (no weights argument) remains unchanged.
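
A single projection pass of the weighted demeaning loop might look like the sketch below. The name `demeanOneFactor` and the typed-array parameter types are assumptions; the real `demean()` alternates such passes across all fixed-effect dimensions until convergence. For brevity the sketch treats missing weights as all ones rather than keeping a separate unweighted code path.

```typescript
// One alternating-projection pass over a single fixed-effect dimension.
// Mutates col in place, subtracting the (weighted) group mean from each entry.
function demeanOneFactor(
  col: Float64Array,
  groupIds: Int32Array,   // assumed: group index in 0..nGroups-1 per row
  nGroups: number,
  w?: Float64Array
): void {
  const sums = new Float64Array(nGroups);
  const weightSums = new Float64Array(nGroups);

  // Accumulate weighted sums and weight totals per group.
  for (let i = 0; i < col.length; i++) {
    const g = groupIds[i];
    const wi = w ? w[i] : 1;
    sums[g] += wi * col[i];
    weightSums[g] += wi;
  }

  // Subtract the weighted group mean.
  for (let i = 0; i < col.length; i++) {
    const g = groupIds[i];
    col[i] -= sums[g] / weightSums[g];
  }
}
```

After a pass, the weighted sum of the demeaned values within each group is zero, which is a convenient invariant to assert in unit tests.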
`stats/regression-2sls.ts`

`compute2SLS` gains optional `weightsCol?: string`.

When weights present:

- Stage 1: `wlsStep(Z, Zt, w, y_endog, ...)` for each endogenous variable
- Stage 2: `wlsStep(X_hat, X_hat_t, w, y, ...)`
- SE correction: `(X_proj' W X_proj)^-1` where `X_proj` uses original X columns
- Wu-Hausman: weighted OLS of augmented regression
- Sargan J: weighted regression of residuals on instruments

`stats/glm.ts`

`computeGLM` gains optional `weightsCol?: string`.

When prior weights present:

- IRLS working weights: `w[i] = prior_w[i] * (dmu^2 / variance)`
- Null deviance: `ybar = sum(prior_w[i] * y[i]) / sum(prior_w[i])`
- Deviance: `sum(prior_w[i] * devResid(y[i], mu[i]))`
- AIC adjustment: uses sum of prior weights

`stats/sandwich.ts`

`computeLeverage` gains optional `weights?: number[]`:

`h_i = w_i * X_i (X'WX)^-1 X_i'`

`computeRobustVcov` gains optional `weights?: number[]`:

- HC meat weights: `w_i^2 * e_i^2` (times HC-type adjustment); the weight is squared because each observation's score is `w_i * X_i * e_i`
- Delegates to updated `computeLeverage` for HC2/HC3

`computeClusteredVcov` gains optional `weights?: number[]`:

- Score accumulation: `scores[g][j] += w_i * X[i][j] * e[i]`

`stats/types.ts`

Add to `RegressionResult` and `GLMResult`:

```ts
weights?: string;  // column name used, for display
```

`pipeline/param-schema.ts`

Already has a `weights` ParamDef for `linear-model`. Extend to `iv-model` and `glm` node types.

No new components needed. The results panel displays the existing
coefficient table — we add a “Weights: colname” line to the result
metadata section (alongside “Robust SEs: HC1”, “Fixed Effects: state”,
etc.). The param-schema weights entry already enables spec
explorer grid display.
| Test case | R code for expected values |
|---|---|
| Weighted OLS | `lm(y ~ x, weights=w)`: coefficients, SEs, R-squared, F |
| Weighted FE-OLS | `feols(y ~ x \| fe, weights=~w)`: coefficients, SEs |
| Weighted + HC1 | `coeftest(lm(y ~ x, weights=w), vcov=vcovHC(., type="HC1"))` |
| Weighted + clustered | `feols(y ~ x \| fe, weights=~w, vcov=~cluster)` |
| Weighted 2SLS | `ivreg(y ~ x \| z, weights=w)`: coefficients, SEs |
| Weighted GLM | `glm(y ~ x, family=binomial, weights=n)`: coefficients, deviance |
| Zero weights | `lm(y ~ x, weights=w)` where some `w=0`: matches subsetted `lm()` |
| Negative weights | Error: “Weights must be non-negative” |
| Weighted demeaning | `feols(y ~ x \| fe, weights=~w)`: verify demeaned values match R |

Tolerances: <0.00005 for statistics, <0.00001 for p-values.
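
The zero-weight and negative-weight cases in the test matrix could be exercised against a filter along the lines of the sketch below. The function name `filterRowsByWeight` and its inputs are hypothetical; in the spec this logic belongs to `buildDesignMatrix`. How missing (NaN) weights should be treated is not specified, so the sketch simply drops them, which is an assumption.

```typescript
// Hypothetical weight filter: drops zero-weight rows, rejects negative weights.
function filterRowsByWeight(
  validRows: number[],      // original row indices that passed other checks
  weightValues: number[]    // weight column, indexed by original row index
): { validRows: number[]; weights: number[] } {
  const keptRows: number[] = [];
  const weights: number[] = [];
  for (const row of validRows) {
    const w = weightValues[row];
    if (w < 0) {
      throw new Error("Weights must be non-negative");
    }
    // Zero weights are dropped, matching a subsetted lm() fit.
    // Assumption: NaN fails both comparisons and is dropped as well.
    if (w > 0) {
      keptRows.push(row);
      weights.push(w);
    }
  }
  return { validRows: keptRows, weights };
}
```

Keeping `weights` aligned with the filtered `validRows` means downstream code can index both arrays with the same position, and the count of kept rows is the `n` used in small-sample corrections.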
Parse-to-result pipeline tests with realistic R code:

```r
# Weighted DiD pattern
mod <- feols(earnings ~ treatment | state + year, data = df, weights = ~pop)

# Weighted logit
mod <- glm(enrolled ~ income + age, data = df, family = binomial, weights = n)
```

Verify `weights` extraction from all supported function forms (6 patterns).

| File | Change |
|---|---|
| `src/core/stats/types.ts` | Add `weights?: string` to `RegressionResult`, `GLMResult` |
| `src/core/stats/matrix.ts` | Extract shared `wlsStep` |
| `src/core/stats/regression.ts` | `buildDesignMatrix` weight extraction + zero-weight filtering; `computeRegression` WLS path with weighted SS |
| `src/core/stats/regression-2sls.ts` | `compute2SLS` WLS at both stages + weighted diagnostics |
| `src/core/stats/glm.ts` | `computeGLM` prior weights × IRLS weights; import shared `wlsStep` |
| `src/core/stats/demean.ts` | `demean` weighted group means |
| `src/core/stats/sandwich.ts` | `computeLeverage`, `computeRobustVcov`, `computeClusteredVcov` weight support |
| `src/core/pipeline/types.ts` | `weights?: string` on `LinearModelParams`, `IVModelParams`, `GLMParams` |
| `src/core/pipeline/mapper.ts` | Forward `call.args['weights']` for all three node types |
| `src/core/pipeline/executor.ts` | Pass `params.weights` to stats functions |
| `src/core/pipeline/param-schema.ts` | Add `weights` ParamDef for `iv-model` and `glm` |
| `src/core/parsers/r/recognizer.ts` | Extract `weights` from `lm`, `glm`, `feols`, `felm`, `ivreg` |
| `src/ui/components/results/` | Add “Weights: colname” to result metadata display |

- The `weights` argument in `lm()` is analytic (inverse-variance) weights. `glm()` uses prior weights. We follow R’s semantics per function, with no additional weight-type parameter.
- `survey` package: `svyglm()`, `svydesign()` (complex survey weights with design effects). Separate feature.