Date: 2026-04-20
Milestone: M6 enabler (pulled forward from M6 Architecture Notes)
Scope: Full framework (Scope C): dispatch infrastructure, data marshaling, lifecycle, and one typed marshaler (lm_robust from estimatr). The opaque R-value path is scaffolded in the worker but not wired into the recognizer.

Today, every R function we can’t execute natively in TypeScript becomes an UnsupportedNode with a warning, which is a dead end for replication. The gap analysis (reference-papers/GAP-ANALYSIS.md) and the 2026-04-18 audit (REPLICATION-AUDIT.md) both show that the long tail (readRDS(), rdrobust(), custom functions, obscure packages) is low-ROI to implement one function at a time but carries a high cumulative cost in coverage.
WebR (R compiled to WebAssembly, maintained by Posit) provides a full
R interpreter in-browser. Of 27 critical econometrics packages, 25 are
available on repo.r-wasm.org. Pulling WebR forward from M6
changes the product story from “88% executable, 12% blocked” to “100%
executable; X% on the TS fast path.”
Framework goal: unsupported calls route to WebR as a universal fallback. As TS implementations land, they replace the WebR path for those specific calls. Recognition (parser) and execution (engine choice) become independent decisions.
In scope for the spike:

- src/core/webr/ framework module: protocol types, TS-side dispatcher, dataset marshaler, per-result-type marshaler registry.
- src/workers/webr-worker.ts: dedicated Web Worker running WebR; handles init, package install, dispatch.
- End-to-end typed path: WebRTypedNode → worker → marshaled RegressionResult.
- First typed marshaler: lm_robust (estimatr).
- Inline-expression data arguments: data = df %>% filter(...), data = df %>% filter(...) %>% mutate(...), and the data = filter(df, ...) / data = subset(df, ...) direct-call forms. The inline expression is recursively recognized and emitted as upstream TS pipeline node(s); the outer WebR-typed node references the lifted synthetic binding. Falls back to UnsupportedNode if any stage of the inline expression is unrecognized.
- Dev-only kill switch (VITE_DISABLE_WEBR=1).

Out of scope (deferred):

- Additional typed marshalers (rdrobust, felm, att_gt, …): added per function, in 15-minute increments.
- Base-R bracket-subset data arguments (data = df[df$year > 2000, ]): requires [[/[ RHS subscript parser work (partially addressed in M5d, still incomplete). Falls through to UnsupportedNode in the spike.
- Custom-function data arguments (data = my_custom_func(df)): needs the opaque path to lift to a webr-opaque pre-node. Deferred with opaque support.
- Multi-dataset data arguments (data = merge(df1, df2, by = 'id')): rare in practice; defer.

src/core/webr/ [new: framework-free, worker-portable]
├── protocol.ts Message types (WebRRequest, WebRResponse)
├── transport.ts WebRTransport interface (post + subscribe)
├── dispatch.ts createWebRDispatcher(transport) — pure factory
├── dataset-marshal.ts TS Dataset ↔ MarshaledDataset
└── marshalers/
├── registry.ts registerMarshaler(spec); lookupMarshaler(rFunction)
└── lm-robust.ts First typed marshaler
src/workers/webr-worker.ts [new — second Web Worker]
Lifecycle, WebR init, package install, dispatch
src/workers/worker-manager.ts [modified]
Construct WebRTransport over a real Worker;
instantiate dispatcher via createWebRDispatcher;
expose webrDispatchTyped() to executor
src/core/pipeline/types.ts [modified]
Add WebRTypedNode to PipelineNode union
Add getPortsFor(node) helper (falls through to NODE_PORTS
for static types; consults marshaler registry for webr-typed)
src/core/pipeline/executor.ts [modified]
Branch on WebRTypedNode → workerManager.webrDispatchTyped()
src/core/parsers/r/recognizer.ts [modified]
Add recognizeLmRobust pattern
src/core/pipeline/mapper.ts [modified]
Add webr-typed AnalysisCall → WebRTypedNode mapping
The existing TS worker stays untouched and instant. The WebR worker is spawned lazily, on the first pipeline update that contains a WebR node (§5.1). It holds ~30MB of WASM plus installed packages for the session and processes dispatches serially.
src/core/webr/dispatch.ts never references a
Worker directly. It exposes a pure factory:
export interface WebRTransport {
post(msg: WebRRequest): void;
subscribe(handler: (msg: WebRResponse) => void): () => void; // returns unsubscribe
}
export function createWebRDispatcher(transport: WebRTransport) { /* ... */ }

The boundary layer (src/workers/worker-manager.ts)
constructs the transport wrapping a real Worker and passes
it in. This keeps all of src/core/webr/ — dispatcher,
protocol types, marshalers, dataset marshaler — fully worker-portable
and testable with in-memory transport mocks (no jsdom
Worker polyfill needed).
The same pattern makes future transports (TCP to an R sidecar,
shared-memory IPC, Node worker_threads) substitutable
without core changes.
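
To make the transport seam concrete, here is a minimal sketch of a dispatcher that correlates responses by id, together with an in-memory transport of the kind the unit tests could use. The message types are trimmed to two variants, and `createDispatcher`, `createEchoTransport`, and the echo behavior are illustrative assumptions, not the module's actual API.

```typescript
// Trimmed, hypothetical message types; the real protocol has more variants.
type Req = { type: 'dispatch-typed'; id: string; rSource: string };
type Res =
  | { type: 'dispatch-result'; id: string; result: unknown }
  | { type: 'dispatch-error'; id: string; error: string };

interface Transport {
  post(msg: Req): void;
  subscribe(handler: (msg: Res) => void): () => void; // returns unsubscribe
}

type Pending = { resolve: (v: unknown) => void; reject: (e: Error) => void };

function createDispatcher(transport: Transport) {
  const pending = new Map<string, Pending>();
  let nextId = 0;
  const unsubscribe = transport.subscribe((msg) => {
    const entry = pending.get(msg.id);
    if (!entry) return; // unknown or already-settled id
    pending.delete(msg.id);
    if (msg.type === 'dispatch-result') entry.resolve(msg.result);
    else entry.reject(new Error(msg.error));
  });
  return {
    dispatchTyped(rSource: string): Promise<unknown> {
      const id = `d${nextId++}`;
      return new Promise((resolve, reject) => {
        pending.set(id, { resolve, reject }); // register before posting
        transport.post({ type: 'dispatch-typed', id, rSource });
      });
    },
    dispose: unsubscribe,
  };
}

// In-memory stand-in for the worker: echoes rSource back, or fails on stop(...).
function createEchoTransport(): Transport {
  const handlers = new Set<(msg: Res) => void>();
  return {
    post(msg) {
      const reply: Res = msg.rSource.startsWith('stop(')
        ? { type: 'dispatch-error', id: msg.id, error: 'R error' }
        : { type: 'dispatch-result', id: msg.id, result: { echoed: msg.rSource } };
      handlers.forEach((h) => h(reply));
    },
    subscribe(handler) {
      handlers.add(handler);
      return () => { handlers.delete(handler); };
    },
  };
}
```

Because the dispatcher only sees the `Transport` interface, the same code path is exercised whether the other end is a real Worker or this mock.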
[TS worker] [WebR worker]
| |
| execute filter() |
| produces Dataset |
| |
| ── webr dispatch ──────→ | marshal dataset → R data.frame
| (dataset, R src) | install estimatr (first time)
| | execute lm_robust(...)
| | marshal R result → RegressionResult
| ← RegressionResult ────── |
| store on node |
| continue pipeline |
PipelineNode variant

type WebRTypedNode = {
type: 'webr-typed';
id: string;
span: Span;
params: {
rFunction: string; // 'lm_robust' — drives marshaler + package lookup
rSource: string; // reconstructed R snippet with data arg named explicitly
dataBinding: string; // R variable name used for the data argument
resultSchema: 'regression'; // discriminates marshaler family
};
result?: RegressionResult;
};

rSource construction: reconstruct, not pass-through

The recognizer extracts structured params (formula, se_type, clusters, weights, etc.) from the AST and reconstructs rSource as a canonical single call. Chosen over pass-through because:

- Pipe forms (df %>% lm_robust(y ~ x)) are desugared at recognition time anyway, so the call at dispatch is always in canonical form.

Calls with inline-expression data arguments (e.g.,
data = df %>% filter(year > 2000)) are handled
in-scope via Strategy 1 — the inline expression is recursively
recognized and emitted as upstream TS pipeline nodes, with the outer
WebR-typed node referencing a synthetic lifted binding. See §4.3.1. When
the inline expression isn’t recognizable (base-R bracket subset, unknown
function call), the outer call falls back to
UnsupportedNode.
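
The reconstruction step can be sketched as a small pure function. This is an illustration under assumptions: the `LmRobustParams` shape and `buildLmRobustSource` name are hypothetical, the argument names follow the ones used in this document, and the clusters and weights text is passed through exactly as the recognizer extracted it.

```typescript
// Hypothetical shape of the recognizer's extracted params.
interface LmRobustParams {
  formulaText: string;   // e.g. 'y ~ x1 + x2'
  dataBinding: string;   // user binding or synthetic __lift_* binding
  seType?: string;       // string literal from se_type
  clustersText?: string; // source text of the clusters argument, verbatim
  weightsText?: string;  // bare identifier
  ciLevel?: number;      // number literal
}

// Rebuild one canonical call; optional args are emitted only when present.
function buildLmRobustSource(p: LmRobustParams): string {
  const args = [p.formulaText, `data = ${p.dataBinding}`];
  if (p.seType !== undefined) args.push(`se_type = "${p.seType}"`);
  if (p.clustersText !== undefined) args.push(`clusters = ${p.clustersText}`);
  if (p.weightsText !== undefined) args.push(`weights = ${p.weightsText}`);
  if (p.ciLevel !== undefined) args.push(`ci_level = ${p.ciLevel}`);
  return `lm_robust(${args.join(', ')})`;
}
```

The output is the same whether `dataBinding` is a user name such as `df` or a lifted one such as `__lift_17_0`, which is what lets the worker treat both cases identically.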
recognizeLmRobust added to the per-function table in src/core/parsers/r/recognizer.ts. Extracts:

- formula via existing parseFormulaFromArg
- data argument via recognizeDataArgExpression (see §4.3.1 below)
- se_type string literal (default "HC2" per estimatr)
- clusters formula (~var form)
- weights expression (bare identifier only)
- ci_level number literal

Produces one or more AnalysisCalls: zero or more lifted TS-pipeline calls (from inline data-arg expressions) followed by one { kind: 'webr-typed', rFunction: 'lm_robust', dataBinding: <bare or synthetic>, ... }.

recognizeDataArgExpression: inline data arg lifting

New shared helper used by every WebR-typed recognizer (starts with lm_robust, reusable for all future marshalers):
type DataArgResult =
| { kind: 'binding'; name: string } // data = df
| { kind: 'lifted'; binding: string;
liftedCalls: AnalysisCall[] } // data = df %>% filter(...) or data = filter(df, ...)
| { kind: 'unsupported' }; // falls back to UnsupportedNode
function recognizeDataArgExpression(
expr: AstNode,
scope: RecognizerScope,
): DataArgResult;

Supported shapes (by AST pattern):

| Shape | Example | Action |
|---|---|---|
| Bare identifier | data = df | { kind: 'binding', name: 'df' } |
| Pipe chain terminated in a recognized TS transform | data = df %>% filter(x > 0) | Desugar pipe, recurse on each stage, emit data-filter node with input df and synthetic output binding __lift_<id>. Return { kind: 'lifted', binding: '__lift_<id>', liftedCalls: [filter] }. |
| Multi-stage pipe chain | data = df %>% filter(...) %>% mutate(...) | Same, N lifted calls chained. |
| Direct call to known TS transform | data = filter(df, x > 0), data = subset(df, y == 1) | Synthesize equivalent AnalysisCall for the transform, output to __lift_<id>. |
| Anything else (bracket subset, unknown function, etc.) | data = df[cond, ], data = my_func(df) | { kind: 'unsupported' } → outer call becomes UnsupportedNode. |
Synthetic binding IDs:
__lift_<nodeId>_<stage> where
nodeId is the outer call’s recognized id and
stage is the pipe-chain position. Deterministic and unique
per outer call — re-recognition of the same source produces identical
bindings (idempotent).
Span semantics: Each lifted call’s sourceSpan points at the corresponding AST node in the original source (the filter(...) subexpression’s span, not the whole outer call). This keeps error messages accurate and lets the properties panel show the right source fragment for each lifted node.
Lifted vs user bindings: The mapper treats synthetic
__lift_* bindings identically to user bindings — looks them
up in the binding-name → node-id map. No special case.
If recognizeDataArgExpression returns
{ kind: 'unsupported' }, the outer lm_robust
call becomes an UnsupportedNode — same as if the whole call
were unrecognized. No partial pipeline is emitted. This guarantees we
never run a WebR-typed node with a synthesized data binding that doesn’t
correspond to a real upstream node.
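
The lifting rules above can be sketched on a toy AST. Everything here is a simplification for illustration: the `Ast` and `LiftedCall` shapes, the `KNOWN_TRANSFORMS` set, and the `recognizeDataArg` name are assumptions, and the real helper works on the parser's own node types and emits full AnalysisCalls.

```typescript
type Ast =
  | { kind: 'ident'; name: string }
  | { kind: 'call'; fn: string; args: Ast[] }
  | { kind: 'pipe'; lhs: Ast; rhs: Ast } // lhs %>% rhs
  | { kind: 'other' };                   // bracket subset, literals, ...

interface LiftedCall { transform: string; input: string; output: string }

type DataArgResult =
  | { kind: 'binding'; name: string }
  | { kind: 'lifted'; binding: string; liftedCalls: LiftedCall[] }
  | { kind: 'unsupported' };

// Assumed set of transforms the TS pipeline already understands.
const KNOWN_TRANSFORMS = new Set(['filter', 'mutate', 'subset']);

// Flatten `a %>% f() %>% g()` into [a, f(), g()].
function flattenPipe(expr: Ast): Ast[] {
  return expr.kind === 'pipe' ? [...flattenPipe(expr.lhs), expr.rhs] : [expr];
}

function recognizeDataArg(expr: Ast, outerId: string): DataArgResult {
  if (expr.kind === 'ident') return { kind: 'binding', name: expr.name };
  let stages = flattenPipe(expr);
  const only = stages[0];
  // Direct-call form filter(df, ...): treat the first argument as the pipe head.
  if (stages.length === 1 && only.kind === 'call' && only.args.length > 0) {
    stages = [only.args[0], { kind: 'call', fn: only.fn, args: only.args.slice(1) }];
  }
  const [head, ...rest] = stages;
  if (head.kind !== 'ident' || rest.length === 0) return { kind: 'unsupported' };
  const liftedCalls: LiftedCall[] = [];
  let input = head.name;
  for (let stage = 0; stage < rest.length; stage++) {
    const node = rest[stage];
    // Any unrecognized stage rejects the whole expression: no partial pipeline.
    if (node.kind !== 'call' || !KNOWN_TRANSFORMS.has(node.fn)) return { kind: 'unsupported' };
    const output = `__lift_${outerId}_${stage}`; // deterministic per outer call
    liftedCalls.push({ transform: node.fn, input, output });
    input = output;
  }
  return { kind: 'lifted', binding: input, liftedCalls };
}
```

Because the binding names depend only on the outer call's id and the stage index, recognizing the same source twice yields identical results, which is the idempotence property described above.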
src/core/pipeline/mapper.ts translates each AnalysisCall into a PipelineNode. webr-typed calls produce a WebRTypedNode:

- rSource: lm_robust(${formulaText}, data = ${dataBinding}${seTypeArg}${clustersArg}${weightsArg}${ciLevelArg}). dataBinding is either a user binding (df) or a synthetic lifted one (__lift_17_0); the reconstruction is identical.
- Input: data port.
- Output: model produces RegressionResult.

Port shape is intrinsic to each R function: lm_robust
takes data in → produces a model; pivot_longer takes data →
produces data; predict takes model + newdata → produces
data; rbindlist is variadic over datasets; etc. A static
NODE_PORTS['webr-typed'] entry can’t cover all of
these.
Resolution: ports live on the marshaler spec (§6.5),
not on NODE_PORTS. The spec introduces one lookup
helper:
// src/core/pipeline/types.ts
export function getPortsFor(node: PipelineNode): NodePortDef {
if (node.type === 'webr-typed') {
const spec = lookupMarshaler(node.params.rFunction);
return spec?.ports ?? { inputs: {}, outputs: {} };
}
return NODE_PORTS[node.type];
}

All existing sites that reference NODE_PORTS[node.type]
for port discovery go through getPortsFor(node) instead.
For every non-WebR node type, behavior is unchanged (static lookup). For
webr-typed, the registry is the source of truth.
For this spike’s single marshaler
(lm_robust):
inputs: { data: { dataType: 'dataset' } }
outputs: { model: { dataType: 'model' } } // RegressionResult includes residuals/fitted as fields

Why one model output, not separate model/residuals/fittedValues ports: matches our native lm / feols
/ glm nodes, which produce a single result carrying
residuals as fields. Consistency means downstream consumers (spec
explorer, comparison tables) treat WebR-produced models and native
models identically. Users who want residuals or fitted values as
standalone vectors add a post-estimation extraction node (a future
marshaler for residuals(m) or fitted(m)).
Future marshalers that will exercise other shapes:

- pivot_longer / pivot_wider / distinct: { data → data }
- readRDS: { (no inputs, filename param) → data }
- rbindlist, cbind: { datasets: variadic → data }
- fixef: { model → vector }
- predict: { model, newdata? → data }

Each is one registerMarshaler(spec) call when added.
| Event | When | Blocking? |
|---|---|---|
| Worker spawn | On the first pipeline update in the session containing ≥1 WebR node | Background; UI shows toolbar badge |
| Package collection | Every pipeline update (pure TS function) | N/A (microseconds) |
| Package install | 500ms after pipeline update settles, if collected set has packages not in installedPackages cache | Background; UI shows progress |
| Execute | User clicks Run | Awaits any pending spawn/install, then dispatches |

Spawn is lazy with respect to the session (nothing loads until a WebR node appears) but eager with respect to Run: it starts in the background so cold-start overhead happens while the user is examining the DAG, not after clicking Run. Debounce prevents thrashing during rapid code edits.

Init is one-shot per session. Package install is a separate, independently triggered pipeline that runs asynchronously whenever the DAG changes. It does not wait for Run, and Run does not normally trigger it: Run awaits an in-flight install and installs only as a safety net (§5.5).
WORKER INIT (once per session)
not-spawned → downloading → booting → ready (terminal)
│
↓
(failed) — surfaced as user error; Run attempts retry
PACKAGE INSTALL (many per session, async, triggered by DAG change)
DAG update event
│
↓ (500ms debounce)
collectRequiredPackages(currentPipeline)
│
↓
diff against installedPackages cache
│
├─ diff empty → no-op
└─ diff non-empty → post install-packages(delta)
↓
(worker runs webR.installPackages)
↓
install-complete → cache updated
RUN (user clicks)
Run
├─ await ensureWebRWorker() (usually no-op: already ready)
├─ await any in-flight install (usually no-op: already complete)
└─ dispatch WebR nodes
Singleton Promise<void> on
WorkerManager for worker init; install promises are keyed
by request id and chained so Run can await “whatever’s currently
installing.”
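
A minimal sketch of the singleton init promise, assuming a hypothetical `boot` callback that stands in for spawning the worker and waiting for init-ready. The reset-on-failure behavior reflects the "Run attempts retry" note in the state diagram; the `createInitOnce` name is illustrative.

```typescript
// Wraps a one-shot async boot so concurrent callers share a single promise.
function createInitOnce(boot: () => Promise<void>): () => Promise<void> {
  let initPromise: Promise<void> | null = null;
  return function ensureReady(): Promise<void> {
    if (!initPromise) {
      initPromise = boot().catch((err) => {
        initPromise = null; // failed init: let the next Run retry
        throw err;
      });
    }
    return initPromise;
  };
}
```

Once boot has succeeded, every later call returns the already-resolved promise, which is why the Run path's `await ensureWebRWorker()` is usually a no-op.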
type WebRRequest =
| { type: 'init' }
| { type: 'install-packages'; id: string; packages: string[] }
| {
type: 'dispatch-typed';
id: string;
rSource: string;
inputs: Record<string, MarshaledDataset>;
marshalerName: string;
marshalerContext: MarshalerContext; // e.g., { vcovType: 'HC1' }
}
| {
type: 'dispatch-opaque'; // scaffolded; no TS caller this session
id: string;
rSource: string;
inputs: Record<string, MarshaledDataset>;
resultBinding: string;
};
type WebRResponse =
| { type: 'init-progress'; phase: 'downloading' | 'booting'; bytesLoaded?: number }
| { type: 'init-ready' }
| { type: 'init-error'; error: string }
| { type: 'install-complete'; id: string; installed: string[] }
| { type: 'install-error'; id: string; error: string; failedPackage: string }
| { type: 'dispatch-result'; id: string; result: MarshaledRegression | { kind: 'opaque'; binding: string } }
  | { type: 'dispatch-error'; id: string; stage: 'bind-inputs' | 'eval' | 'marshal'; error: string };

Every message is a discriminated union on type, matching
the existing TS worker protocol.
Serial within the worker. R is single-threaded, and — crucially — the
worker’s promise chain serializes the full dispatch
cycle as one atomic unit:
bind-inputs → eval → marshal → post result → pop queue.
Dispatch N+1 does not start its bind-inputs stage until
dispatch N’s marshal has returned its result message.
Why full-cycle serialization matters: the R env is shared state. If
dispatch B’s bind-inputs began while dispatch A was still
in the marshal stage, B could overwrite a binding (e.g.,
df) that A’s marshaler is implicitly relying on. (The
result binding __n_A would be safe — R’s copy-on-write
semantics snapshot it — but any
evalR("some_expr_referencing_df") during A’s marshal would
see a mutated df.) Treating bind+eval+marshal as
indivisible eliminates this whole class of bug.
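
Full-cycle serialization can be sketched as a promise chain in which each dispatch cycle starts only after the previous one has settled. The `createSerialQueue` name and the shape of a "cycle" are assumptions for illustration; the worker's real queue may differ.

```typescript
// Runs async cycles strictly one after another, even if callers enqueue concurrently.
function createSerialQueue() {
  let tail: Promise<unknown> = Promise.resolve();
  return function enqueue<T>(cycle: () => Promise<T>): Promise<T> {
    const run = tail.then(() => cycle());
    // Swallow errors on the chain itself so one failed cycle does not block the next;
    // the caller still sees the rejection through `run`.
    tail = run.catch(() => undefined);
    return run;
  };
}

// Example cycle: bind-inputs, eval, and marshal as one unit (stages simulated with timers).
function makeCycle(name: string, log: string[]): () => Promise<string> {
  const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0));
  return async () => {
    for (const stage of ['bind-inputs', 'eval', 'marshal']) {
      log.push(`${name}:${stage}`);
      await tick(); // yields control, as a real evalR call would
    }
    return name;
  };
}
```

Even though each stage yields to the event loop, a second enqueued cycle cannot interleave with the first, so no dispatch ever observes another dispatch's half-bound R environment.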
Parallelism is out of scope. Running two WebR dispatches truly concurrently would require a second WebR worker with its own R env. Could be added later (e.g., for large pipelines with many independent regressions), but forces a significant design shift — multi-worker coordination, per-dispatch env isolation, higher memory cost. For the spike and M6, one worker processing serially is the model.
TS-side concurrent executePipeline: the
existing executionGeneration counter already guards
superseded runs. A second Run-click while WebR dispatches from the first
are still in flight increments the generation; callbacks from dispatch N
see generation !== executionGeneration and skip their
set(). Dispatches continue to completion (can’t cancel
in-flight WebR work without the cancellation story, deferred) but their
results are dropped on the floor — same protection as native
results.
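
The stale-run guard amounts to capturing the generation at dispatch time and comparing it when the result arrives. A minimal sketch, with `createGenerationGuard` and `makeResultHandler` as hypothetical names and a plain array standing in for the store's set():

```typescript
function createGenerationGuard() {
  let executionGeneration = 0;
  return {
    // Called at the start of each Run; returns the generation to capture in callbacks.
    startRun(): number { return ++executionGeneration; },
    isCurrent(generation: number): boolean { return generation === executionGeneration; },
  };
}

// Builds a result handler bound to the generation of the run that dispatched it.
function makeResultHandler(
  guard: ReturnType<typeof createGenerationGuard>,
  generation: number,
  store: string[],
): (result: string) => void {
  return (result) => {
    if (!guard.isCurrent(generation)) return; // superseded run: drop the result
    store.push(result);
  };
}
```

The dispatch itself still runs to completion in the worker; only the write into the store is skipped.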
Package install is eager and async, triggered by DAG updates (not by Run). Run just awaits whatever’s in flight.
TS-side state:

- installedPackages: Set<string>. Mirrors the worker’s installed set. Only updated on install-complete from the worker.
- pendingInstall: Promise<void> | null. The current in-flight install, if any. Serialized: if a new DAG update fires while an install is running, the next install chains onto the current one’s completion.
Triggered by DAG update (store subscriber in
src/workers/worker-manager.ts or equivalent hook):
onPipelineChange(pipeline => {
const required = collectRequiredPackages(pipeline); // uses lookupMarshaler
const delta = setDifference(required, installedPackages);
if (delta.size === 0) return; // no-op
pendingInstall = (pendingInstall ?? Promise.resolve()).then(async () => {
await ensureWebRWorker();
const { installed } = await postAndAwait({ type: 'install-packages', packages: [...delta] });
installed.forEach(p => installedPackages.add(p));
});
});

Debounced by 500ms against rapid successive DAG updates. Errors on
this path surface as toast notifications (not blocking the UI) and leave
installedPackages unchanged — next install attempt retries
the failed delta.
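
The two small helpers this path relies on can be sketched as follows. `setDifference` matches the name used in the snippet above; `debounce` is a generic trailing-edge debouncer written for illustration, not necessarily the one the codebase uses.

```typescript
// Elements of `a` that are not in `b`.
function setDifference<T>(a: Set<T>, b: Set<T>): Set<T> {
  return new Set(Array.from(a).filter((x) => !b.has(x)));
}

// Trailing-edge debounce: only the last call within `ms` actually runs.
function debounce<A extends unknown[]>(fn: (...args: A) => void, ms: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

// Hypothetical wiring: compute the install delta once the pipeline has settled for 500ms.
const installedSoFar = new Set<string>(['estimatr']);
const onPipelineSettled = debounce((required: Set<string>) => {
  const delta = setDifference(required, installedSoFar);
  if (delta.size > 0) console.log('would install:', Array.from(delta));
}, 500);
```

Because a failed install never adds to the installed set, the same delta is recomputed on the next DAG update, which gives the retry behavior described above without extra bookkeeping.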
Triggered by Run (inside
executePipeline):
if (pipelineHasWebRNodes) {
await ensureWebRWorker(); // usually no-op
await (pendingInstall ?? Promise.resolve()); // usually no-op
const stillMissing = setDifference(collectRequiredPackages(pipeline), installedPackages);
if (stillMissing.size > 0) {
// belt-and-suspenders: DAG update hook may not have fired yet, or prior install failed
await installPackages([...stillMissing]);
}
dispatchWebRNodes();
}

Run’s install step is a safety net. In the common case (the DAG settled more than 500ms before Run is clicked) there is nothing to do. It handles the edge case where Run is clicked immediately after a DAG edit, or where a prior install failed.
Package requirements live on the marshaler spec (§6.5) — one source of truth. Adding a new WebR-typed function never requires touching a separate static map.
No auto-termination. Worker lives for the tab’s lifetime.
WorkerManager.shutdown() (on page unload) calls
worker.terminate(). Session-level caching justifies holding
~30MB.
| Stage | Example | User message | Recovery |
|---|---|---|---|
| init | WebR WASM fetch fails | “R runtime could not be loaded.” | Retry on next Run. |
| install | Package 404 or build error | “Could not install R package <pkg>.” | Retry. |
| bind-inputs | Marshaler bug | “Could not bind dataset <name> to R environment.” | Node-level error; other nodes unaffected. |
| eval | R throws | R’s error message verbatim | Node-level error; downstream skipped. |
| marshal | Unexpected result shape | “Unexpected R result shape for <marshalerName>.” | Node-level error. |
Pipeline-level behavior on WebR node error matches existing
UnsupportedNode handling — node marked error, downstream
dependents skipped, independent branches continue.
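
One way to keep the stage-to-message mapping in a single place is a small exhaustive switch. This is a sketch: the `FailureDetail` shape, the `userMessage` name, and the fallback strings used when a detail is missing are assumptions; the message templates are the ones from the table.

```typescript
type FailureStage = 'init' | 'install' | 'bind-inputs' | 'eval' | 'marshal';

interface FailureDetail {
  pkg?: string;           // failed package (install)
  dataset?: string;       // dataset binding name (bind-inputs)
  marshalerName?: string; // marshaler that rejected the result (marshal)
  rError?: string;        // R's own error text (eval)
}

function userMessage(stage: FailureStage, detail: FailureDetail = {}): string {
  switch (stage) {
    case 'init':
      return 'R runtime could not be loaded.';
    case 'install':
      return `Could not install R package ${detail.pkg ?? '<unknown>'}.`;
    case 'bind-inputs':
      return `Could not bind dataset ${detail.dataset ?? '<unknown>'} to R environment.`;
    case 'eval':
      return detail.rError ?? 'R evaluation failed.'; // R's message verbatim when available
    case 'marshal':
      return `Unexpected R result shape for ${detail.marshalerName ?? '<unknown>'}.`;
  }
}
```

With a union-typed `stage`, adding a new stage to the protocol makes the compiler flag this function until a message is supplied.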
type MarshaledDataset = {
nrows: number;
columns: Array<
| { name: string; kind: 'numeric'; values: Float64Array } // NaN = NA
| { name: string; kind: 'categorical'; codes: Uint32Array; levels: string[] } // 0xFFFFFFFF = NA
>;
};

Traverses postMessage via structured clone (copy). Transferable-backed zero-copy is a future optimization.

src/core/webr/dataset-marshal.ts:

export function toMarshaled(ds: Dataset): MarshaledDataset;

Pulls underlying typed arrays by reference; no allocation or copy in this function.

Worker side:

async function bindDatasetToR(marshaled: MarshaledDataset, rName: string): Promise<void>;

Per column:

- Numeric → new webR.RDouble(column.values).
- Categorical → factor from levels[codes] with NA for sentinel codes.

Wrap as data.frame(list(...)) and assign to the R global env under rName.
NumericColumn collapses R’s NA and
NaN into a single NaN. Documented known
lossiness; has not bitten any applied-econ replication to date. If ever
needed, extend NumericColumn with a separate missing
bitmap.
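
The categorical encoding and its NA sentinel can be illustrated with a round trip. This sketch assumes zero-based codes on the TS side and first-appearance level order; the helper names are hypothetical, and the real Dataset type is not shown here.

```typescript
const NA_CODE = 0xffffffff; // sentinel from the MarshaledDataset definition

interface CategoricalColumn {
  name: string;
  kind: 'categorical';
  codes: Uint32Array;
  levels: string[];
}

// Encode strings as codes into `levels`; null becomes the NA sentinel.
function encodeCategorical(name: string, values: Array<string | null>): CategoricalColumn {
  const levels: string[] = [];
  const index = new Map<string, number>();
  const codes = new Uint32Array(values.length);
  values.forEach((v, i) => {
    if (v === null) { codes[i] = NA_CODE; return; }
    let code = index.get(v);
    if (code === undefined) {
      code = levels.length;
      levels.push(v);
      index.set(v, code);
    }
    codes[i] = code;
  });
  return { name, kind: 'categorical', codes, levels };
}

function decodeCategorical(col: CategoricalColumn): Array<string | null> {
  return Array.from(col.codes, (c) => (c === NA_CODE ? null : col.levels[c]));
}

// R factor codes are 1-based, so a worker-side binder would shift by one
// and map the sentinel to NA (represented here as null).
function toRFactorCodes(col: CategoricalColumn): Array<number | null> {
  return Array.from(col.codes, (c) => (c === NA_CODE ? null : c + 1));
}
```

Numeric columns need no such table: NaN in the Float64Array already stands for NA, at the cost of the NA/NaN collapse noted above.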
lm_robust result → RegressionResult

Produces a RegressionResult conforming to
src/core/stats/types.ts — same shape as our native OLS
results. No new fields; comparison tables, spec curves, and all other
typed consumers work out of the box.
src/core/webr/marshalers/lm-robust.ts:
import type { RegressionResult, CoefficientRow, VcovType } from '../../stats/types';
export async function marshalLmRobust(
webR: WebR,
binding: string,
ctx: { vcovType: VcovType },
): Promise<RegressionResult> {
const terms = await webR.evalR(`names(coef(${binding}))`).then(r => r.toArray() as string[]);
const [coefArr, seArr, tArr, pArr, dfRes, r2, adjR2, fstat, fp, rse, resids, fitted] =
await Promise.all([
webR.evalR(`unname(coef(${binding}))`).then(r => r.toTypedArray()),
webR.evalR(`unname(${binding}$std.error)`).then(r => r.toTypedArray()),
webR.evalR(`unname(${binding}$statistic)`).then(r => r.toTypedArray()),
webR.evalR(`unname(${binding}$p.value)`).then(r => r.toTypedArray()),
webR.evalR(`${binding}$df.residual`).then(r => r.toNumber()),
webR.evalR(`${binding}$r.squared`).then(r => r.toNumber()),
webR.evalR(`${binding}$adj.r.squared`).then(r => r.toNumber()),
webR.evalR(`${binding}$fstatistic$value`).then(r => r.toNumber()),
webR.evalR(`${binding}$fstatistic$p.value`).then(r => r.toNumber()),
webR.evalR(`sqrt(sum(${binding}$res^2) / ${binding}$df.residual)`).then(r => r.toNumber()),
webR.evalR(`unname(${binding}$res)`).then(r => Array.from(r.toTypedArray() as Float64Array)),
webR.evalR(`unname(${binding}$fitted.values)`).then(r => Array.from(r.toTypedArray() as Float64Array)),
]);
const hasIntercept = terms.includes('(Intercept)');
const coefficients: CoefficientRow[] = terms.map((name, i) => ({
name,
estimate: coefArr[i],
standardError: seArr[i],
tStatistic: tArr[i],
pValue: pArr[i],
}));
return {
type: 'regression',
coefficients,
rSquared: r2,
adjustedRSquared: adjR2,
fStatistic: fstat,
fPValue: fp,
dfModel: terms.length - (hasIntercept ? 1 : 0),
dfResidual: dfRes,
residualStandardError: rse,
residuals: resids,
fittedValues: fitted,
vcovType: ctx.vcovType,
};
}

Notes on extraction:

- unname() before toTypedArray(): R’s coef() returns a named numeric vector; toTypedArray wants a plain numeric. Names are reattached on the TS side from the separately-extracted terms.
- Promise.all parallelizes ~12 round-trips: microsecond-scale, cheap.
- fstatistic$value / fstatistic$p.value: estimatr’s lm_robust stores these differently from base lm (which uses a named vector); the marshaler assumes estimatr’s layout.
- residuals / fittedValues are required by RegressionResult. Extracted as full N-vectors; ~N×16 bytes marshaled per dispatch. Negligible for typical applied econ datasets (<50K rows).
- No vcov matrix is extracted: RegressionResult doesn’t carry one in the native path either (the vcov local in regression.ts is discarded after computing per-coefficient SEs). Post-estim coeftest/vcov flows work today by modifying upstream model params, not by re-using a stored vcov matrix. If that changes, the marshaler can gain a vcovMatrix?: number[][] field, the same pattern native regression would follow.
- ctx.vcovType: passed in by the dispatcher, sourced from the recognizer’s extracted se_type arg. Marshaler doesn’t re-parse the rSource.

Each R function we route through WebR-typed has a single registration entry that carries everything the framework needs to integrate it: input/output ports, required R packages, and the marshal function.
// src/core/webr/marshalers/registry.ts
export interface MarshalerContext {
vcovType?: VcovType; // lm_robust
// Extend per future marshaler needs
}
export type Marshaler = (
webR: WebR,
binding: string,
ctx: MarshalerContext,
) => Promise<unknown>;
export interface MarshalerSpec {
rFunction: string; // 'lm_robust'
packages: string[]; // ['estimatr']
ports: {
inputs: Record<string, PortDefinition>;
outputs: Record<string, PortDefinition>;
};
marshal: Marshaler;
}
const registry = new Map<string, MarshalerSpec>();
export function registerMarshaler(spec: MarshalerSpec): void;
export function lookupMarshaler(rFunction: string): MarshalerSpec | undefined;

First and only registration in this spike:
// src/core/webr/marshalers/lm-robust.ts (registration block)
registerMarshaler({
rFunction: 'lm_robust',
packages: ['estimatr'],
ports: {
inputs: { data: { dataType: 'dataset' } },
outputs: { model: { dataType: 'model' } },
},
marshal: marshalLmRobust,
});

Adding a new WebR-typed function later is one
registerMarshaler({ ... }) call. Framework code is
untouched: port discovery, package collection, and dispatch all consult
the registry.
What previously would have been a separate
WEBR_PACKAGES: Record<string, string[]> static map
now comes from the marshaler spec.
collectRequiredPackages(pipeline) (§5.5) walks WebR nodes
and reads lookupMarshaler(node.params.rFunction)?.packages
for each. A single source of truth avoids the drift that comes from forgetting to
update one side when adding a new marshaler.
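
A reduced sketch of the registry and the package walk, to show how the two fit together. The `MarshalerSpecLite` and `PipelineNodeLite` types drop ports and the marshal function for brevity, `createRegistry` is a hypothetical factory, and throwing on a duplicate registration is an assumption about how collisions are handled.

```typescript
interface MarshalerSpecLite { rFunction: string; packages: string[] }

type PipelineNodeLite =
  | { type: 'webr-typed'; params: { rFunction: string } }
  | { type: 'data-filter' };

function createRegistry() {
  const specs = new Map<string, MarshalerSpecLite>();
  return {
    register(spec: MarshalerSpecLite): void {
      // Assumed policy: a second registration for the same function is a bug.
      if (specs.has(spec.rFunction)) throw new Error(`Marshaler already registered: ${spec.rFunction}`);
      specs.set(spec.rFunction, spec);
    },
    lookup(rFunction: string): MarshalerSpecLite | undefined {
      return specs.get(rFunction);
    },
  };
}

// Union of the packages required by every WebR-typed node in the pipeline.
function collectRequiredPackages(
  nodes: PipelineNodeLite[],
  lookup: (rFunction: string) => MarshalerSpecLite | undefined,
): Set<string> {
  const required = new Set<string>();
  for (const node of nodes) {
    if (node.type !== 'webr-typed') continue;
    const spec = lookup(node.params.rFunction);
    for (const pkg of spec?.packages ?? []) required.add(pkg); // unknown functions add nothing
  }
  return required;
}
```

Since the package list is read from the same spec that supplies ports and the marshal function, there is no second table to forget when a marshaler is added.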
executor: reaches node N (type=webr-typed, rFunction=lm_robust)
│
├─ upstream Dataset already computed on TS side
│
├─ workerManager.webrDispatchTyped({
│ nodeId, rSource, inputs: { [dataBinding]: toMarshaled(ds) },
│ marshalerName: 'lm_robust',
│ marshalerContext: { vcovType: <from recognizer params> },
│ })
│
↓
webr-worker:
├─ stage=bind-inputs: bindDatasetToR(marshaled, dataBinding)
├─ stage=eval: webR.evalR(`__n${id} <- ${rSource}`)
├─ stage=marshal: lookupMarshaler('lm_robust').marshal(webR, `__n${id}`, marshalerContext)
└─ postMessage({ type: 'dispatch-result', id, result })
│
↓
workerManager: resolves pending promise
│
↓
executor: sets node.result, marks complete
│
↓
pipeline store: set() with executionGeneration guard
│
↓
UI: re-render, "R" badge on node, comparison table picks up coefficient
__n${id} remains in WebR’s env after dispatch. Enables
future opaque chaining (session 2). For typed-only spike, it’s
future-proofing and allows post-hoc field queries from the UI.
Concurrent executePipeline

Existing executionGeneration counter in
pipeline.ts already guards async callbacks from superseded
runs; WebR dispatches participate identically.
Not supported in this spike. Hard termination via
workerManager.terminateWebR() is the only interrupt. Soft
cancellation via AbortSignal deferred.
If lm_robust fails in WebR (e.g., collinear X), we do
not silently retry in the native OLS path. The user asked for
lm_robust; if it can’t run, node shows error. A future UI
affordance could offer “fall back to TS OLS” manually.
Colocated next to source:

- dataset-marshal.test.ts: round-trip, NA/NaN, categorical levels
- dispatch.test.ts: protocol, id correlation, error propagation (transport mocked with an in-memory implementation; no Worker polyfill needed)
- marshalers/lm-robust.test.ts: extractors against mocked webR.evalR(...) returns
- marshalers/registry.test.ts: register, lookup, collision; packages and ports fields carried through
- recognizer/lm-robust-inline-data.test.ts: inline-expression data args: data = df %>% filter(...), data = filter(df, ...), multi-stage pipes, fallback to UnsupportedNode on data = df[cond, ] and data = my_func(df). Asserts the emitted AnalysisCall sequence and synthetic binding names.

Integration tests (@r-wasm/webr)

src/workers/webr-worker.integration.test.ts:

- Install estimatr.
- Run lm_robust on synthetic data.
- Assert RegressionResult shape + field types.

Marked test.slow: ~15s first run (download/install), cached via r-wasm.org CDN on CI.
src/core/webr/marshalers/lm-robust.validation.test.ts:

Four variants (HC0, HC1, HC2, HC3) on synthetic heteroskedastic data (N=500, K=3, seed=42):

- Native path: applyRobustSE (existing repo code).
- WebR path: lm_robust(y ~ x1 + x2 + x3, data = ds, se_type = 'HC[0-3]').
- Numeric results match within tolerance (<5e-5); p-values match CLAUDE.md p-value tolerance (<1e-5).
- nobs, dfResidual exactly equal.

Passing this proves marshaling bijectivity, formula parity, and result extraction correctness end-to-end.
e2e/webr-lm-robust.spec.ts:

- /.examples/synthetic-webr.zip (new fixture: 50-row CSV + .R with lm_robust call).
- webr-typed node.

No user-facing flag. Lazy architecture gives free opt-out.

Dev-only kill switch:

// src/workers/worker-manager.ts
private webrEnabled = !import.meta.env.VITE_DISABLE_WEBR;

VITE_DISABLE_WEBR=1 npm run dev makes webr-typed nodes behave like UnsupportedNode.
- npm run build && npm test && npm run lint && npm run test:e2e green.
- lm_robust example, verify execution and comparison table.

Items captured during brainstorming, not in spike scope:
| Session | Item | Rationale |
|---|---|---|
| 2 | Opaque R-value nodes: recognizer emission | Worker already scaffolded; this wires the recognizer + mapper to emit webr-opaque for unclassified calls. |
| 2 | Binding-level recognizer walk | Replaces pattern-only recognition; every assignment becomes a node. |
| 2 | Statement-block opaque fallback for parse failures | Extends graceful degradation when individual statements fail to parse. |
| 3 | Broom auto-typing bridge | Run broom::tidy(binding) on opaque results to auto-materialize as typed Dataset; covers ~200 model classes with zero per-function code. |
| 3 | UI “cast to regression” inspector | User action that explicitly types an opaque node via tidy() extraction (Tier 4). |
| Any | Additional typed marshalers | 15 minutes each; priority driven by audit (rdrobust, felm, att_gt, …). |
| Any | Inline-expression data arguments | Strategy 1 (lift) is in scope for the spike; the remaining tiered strategies (opaque, pass-through) are documented; pick when needed. |
| M6 | LocalStorage/OPFS package cache | Persist installed packages across sessions; eliminates re-install on page reload. |
| M6 | Transferable-backed zero-copy marshaling | Optimization for large datasets. |
| M6 | Cancellation / AbortSignal | Pre-execution cancellation in the worker. |
Appends to the CLAUDE.md No-Go List:

- WebR dispatch is atomic: bind-inputs → eval → marshal → post result runs as one indivisible unit before the next dispatch begins. The shared R env is never touched by two in-flight dispatches simultaneously.
- The installedPackages TS-side mirror must stay consistent with worker state: only the worker’s install-complete response may update it. Never pre-populate.
- Ports for webr-typed nodes come from the marshaler spec: use getPortsFor(node), never NODE_PORTS['webr-typed'] directly. Adding a new WebR-typed function is one registerMarshaler({ ... }) call with packages + ports + marshal.
- Synthetic lifted bindings are named __lift_<outerId>_<stage>. Must be deterministic so re-recognition of the same source is idempotent and the mapper resolves edges to the same upstream nodes.
- Always unname() vectors before toTypedArray(): names are attached on the TS side from a separately-queried terms array.
- The executionGeneration guard must wrap WebR result handlers: same stale-execution protection as native results.

None blocking. One note on package-version pinning: the spike uses
whatever r-wasm.org serves for estimatr
(latest at install time). If a future version of estimatr changes a
result field (e.g., p.value.adjusted rename), the marshaler
breaks and the validation test catches it. Pinning versions is an M6
concern tied to renv.lock support.