Date: 2026-04-27
Milestone: WebR follow-up (Session 4 of WebR follow-up sessions)
Predecessors:
- 2026-04-20 WebR Integration (Session 1 — typed framework + lm_robust)
- 2026-04-22 WebR Opaque Nodes (Session 2 — opaque path end-to-end)
The opaque-nodes session unlocked paste-as-written for ~70 unique R
functions by emitting webr-opaque nodes for unrecognized
assignments. The single remaining wall is file I/O inside the
WebR worker: read_xlsx("../Data/MasterData.xlsx"),
readRDS("results/m1.rds"),
haven::read_sav("survey.sav"), even
read.csv("foo.csv") on uploaded files all fail with “file
not found” because the WebR worker’s filesystem is empty — uploads only
ever reach the TS-side dataset registry.
Today’s binary-input papers (INTERLYSE-RUN-STATUS papers #2/#3/#4)
require a manual workaround: convert
.xlsx/.rds/.sav/.dta
to CSV outside the app, place the CSV in examples/, and
point the R code at it. This is a hard barrier: any paper using a binary
input format fails on first paste, and there’s no in-app path
forward.
Today’s worker also has no story for files written by
R: ggsave(), write.csv(),
writeLines(), etc. produce files in the worker's VFS that
vanish silently when the worker terminates. Replication papers routinely
write tables and figures to disk; users have no way to retrieve
them.
This session bridges both directions. Uploaded files (binary and
otherwise) get mirrored into the WebR worker’s VFS so opaque R code can
read them; files written by the worker get surfaced in an artifacts
panel so users can download them. The is.data.frame probe
added in session 2 already auto-marshals worker results to TS Datasets
when applicable, so the moment read_xlsx(...) succeeds
inside R, the data flows downstream into the existing pipeline
machinery.
A future “editor mode” UI (Scripts/Data side-panels for authoring, separate from replication tree-view) will sit on top of the same workspace store this session establishes; that UI work is split into a follow-up spec.
Goals:
- A WorkspaceStore holding a path-keyed Map<string, Uint8Array> for all uploaded files (regardless of type).
- Binary files (.xlsx/.xls/.rds/.rdata/.sav) extracted from ZIPs into the workspace store (today they're discarded) and accepted by UploadZone.
- An extractFilePath() helper + known-reader registry; produces referencedFiles: Set<string> on each pipeline rebuild. Only referenced files get synced to VFS. Includes CSVs — TS-side data-load consumers don't trigger sync; only opaque R code that explicitly does read.csv("foo.csv") does.
- VFS layout: each synced file lands at /workspace/<original-zip-relative-path>.
- WebRRequest carries a cwd string; worker prefixes eval with setwd(cwd).
- originFile threading: AnalysisCall → node params → WebRRequest.cwd.
- ZIP uploads replace /workspace/; single-file uploads append.
- Artifact discovery by exclusion: originalUploads: Set<string>; after each Run, walk /workspace/ and surface anything not in the set.
- End-to-end run of a paper with a .xlsx input (no manual CSV conversion).

Out of scope:
- WorkspaceView editor UI (Scripts/Data side-panels for from-scratch authoring) — separate spec, builds on this workspace store.
- Jump-to-source UI: originFile + the existing sourceSpan provide the data; the UI feature is a separate milestone.
- Typed marshalers for file readers (read_xlsx, read_csv, read.csv, fread, haven::read_dta, haven::read_sav, etc.) — promotes them from webr-opaque to webr-typed with the path as a typed param. Same shape as every other "additional typed marshaler" in the WebR backlog; better as a dedicated session that converts all reliably-data.frame readers in one pass. This session builds the shared extractFilePath() helper that those marshalers will reuse.
- Extracting programmatically built paths (e.g. paste0(...)). Defer until/unless documented breakage shows it's needed.
- setwd() emulation (line 72 of BACKLOG) — the per-script CWD scheme covers automatic CWD-from-origin, but explicit setwd("subdir") calls in user R code are not yet rewritten/intercepted. Tracked separately.
- Absolute paths (read.csv("/Users/author/data.csv")) — out of scope; only relative paths resolve, and absolute paths produce a clear error.
- Reset WebR session button (kill+respawn worker without losing workspace) — would benefit from this work but is its own UX feature.
- Output write shims (redirecting ggsave("foo.pdf") into /workspace/output/) — track-by-exclusion handles arbitrary write paths without intercept; no shim needed.

Files touched:
src/ui/store/workspace.ts [new]
Path-keyed Uint8Array store + sync queue;
originalUploads set; lifecycle ops
src/ui/store/files.ts [modified]
Build WorkspaceStore from extracted ZIP;
feed binary files (today discarded)
src/ui/components/toolbar/upload-zone.tsx [modified]
Accept .xlsx/.xls/.rds/.rdata/.sav;
wipe-workspace confirmation
src/core/zip/extractor.ts [modified]
Extract binary file bytes (today excluded);
drop per-file size cap
src/core/parsers/shared/analysis-call.ts [modified]
Add originFile?: string
src/core/parsers/file-registry.ts [modified]
Thread originFile into recognizer call
src/core/parsers/r/recognizer.ts [modified]
Pass originFile through to AnalysisCall;
call extractFilePath during opaque walk;
return referencedFiles in result
src/core/parsers/r/extract-file-path.ts [new]
Shared helper: known-reader registry
+ extractFilePath(call) → string | null;
also reusable by future typed marshalers
src/core/pipeline/types.ts [modified]
Add originFile?: string to webr-typed
and webr-opaque params
src/core/pipeline/mapper.ts [modified]
Carry originFile from AnalysisCall to node
src/core/webr/protocol.ts [modified]
Add cwd?: string to WebRRequest
dispatch-typed and dispatch-opaque
src/core/webr/dispatch.ts [modified]
Plumb cwd through dispatcher API
src/workers/webr-worker.ts [modified]
Handle FS-write requests;
setwd(cwd) before each eval
src/workers/worker-manager.ts [modified]
Sync workspace bytes on worker init;
incrementally sync on new uploads;
compute cwd from originFile;
post-Run artifact discovery
src/ui/store/artifacts.ts [new]
Discovered artifacts; preview cache;
download orchestration
src/ui/components/panels/artifacts-panel.tsx [new]
Collapsible artifacts panel
// src/ui/store/workspace.ts
interface WorkspaceState {
files: Map<string, Uint8Array>; // path → bytes (path-keyed flat map; '/' in keys forms tree)
originalUploads: Set<string>; // paths present after last upload (for artifact diff)
syncedToWebR: Set<string>; // subset of files that have been pushed to /workspace/
totalSize: number; // running sum for the 1.5GB cap
addFiles: (entries: Array<{ path: string; bytes: Uint8Array }>) => void;
wipe: () => Promise<void>; // also wipes WebR /workspace/
removeFile: (path: string) => void;
getPendingSync: () => Array<{ path: string; bytes: Uint8Array }>;
markSynced: (paths: string[]) => void;
markUnsynced: () => void; // called when WebR worker is recreated
}

The store is the single source of truth for "what's in the workspace." Both the existing TS-side dataset registry (parsed CSVs/DTAs) and the new VFS sync read from it. CSV bytes live here even though parsed Datasets exist elsewhere; with the reference scan in §3.3, CSV bytes are only pushed to VFS if opaque R code references them by path — so the duplication is opt-in, not automatic.
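As a concrete reference, here is a framework-agnostic sketch of the bookkeeping the interface implies; the real store presumably wires through whatever state library src/ui/store/ already uses, and wipeWebRWorkspace is a hypothetical hook into worker-manager's fs-wipe round-trip.

// Minimal sketch, assuming the WorkspaceState interface above.
function createWorkspaceState(wipeWebRWorkspace: () => Promise<void>): WorkspaceState {
  const files = new Map<string, Uint8Array>();
  const originalUploads = new Set<string>();
  const syncedToWebR = new Set<string>();
  const state: WorkspaceState = {
    files,
    originalUploads,
    syncedToWebR,
    totalSize: 0,
    addFiles(entries) {
      for (const { path, bytes } of entries) {
        const prev = files.get(path);
        if (prev) state.totalSize -= prev.byteLength; // overwrite replaces bytes
        files.set(path, bytes);
        state.totalSize += bytes.byteLength;
        syncedToWebR.delete(path); // fresh bytes need a fresh push
      }
    },
    async wipe() {
      await wipeWebRWorkspace(); // fs-wipe ack first, then local reset
      files.clear();
      originalUploads.clear();
      syncedToWebR.clear();
      state.totalSize = 0;
    },
    removeFile(path) {
      const prev = files.get(path);
      if (!prev) return;
      state.totalSize -= prev.byteLength;
      files.delete(path);
      syncedToWebR.delete(path);
    },
    getPendingSync() {
      // Caller intersects with syncTargets; the store just tracks "unsynced".
      return [...files]
        .filter(([path]) => !syncedToWebR.has(path))
        .map(([path, bytes]) => ({ path, bytes }));
    },
    markSynced(paths) { for (const p of paths) syncedToWebR.add(p); },
    markUnsynced() { syncedToWebR.clear(); }, // worker respawn empties the VFS
  };
  return state;
}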
A shared helper extracts file path arguments from known file-reader
function calls. The same helper is reusable by future typed marshalers
(Scenario B in design discussion) — its placement in
src/core/parsers/r/ rather than inside the opaque walker is
deliberate.
// src/core/parsers/r/extract-file-path.ts
import type { FunctionCallNode } from './ast.ts';
// Each entry: function name → which arg holds the path.
// argName matches the keyword form; argPos (0-based) matches the positional form.
const KNOWN_READERS: Record<string, { argName?: string; argPos: number }> = {
// Always-data.frame readers
'read.csv': { argPos: 0, argName: 'file' },
'read.delim': { argPos: 0, argName: 'file' },
'read.table': { argPos: 0, argName: 'file' },
'read.dta': { argPos: 0, argName: 'file' },
'read_csv': { argPos: 0, argName: 'file' },
'read_tsv': { argPos: 0, argName: 'file' },
'read_delim': { argPos: 0, argName: 'file' },
'fread': { argPos: 0, argName: 'input' },
'read_xlsx': { argPos: 0, argName: 'path' },
'read_excel': { argPos: 0, argName: 'path' },
'readxl::read_xlsx': { argPos: 0, argName: 'path' },
'readxl::read_excel': { argPos: 0, argName: 'path' },
'haven::read_dta': { argPos: 0, argName: 'file' },
'haven::read_sav': { argPos: 0, argName: 'file' },
'haven::read_sas': { argPos: 0, argName: 'file' },
// Polymorphic readers (must stay opaque permanently)
'readRDS': { argPos: 0, argName: 'file' },
'readr::read_rds': { argPos: 0, argName: 'file' },
'load': { argPos: 0, argName: 'file' },
};
/** Returns the literal file path arg if `call` is a known reader, else null. */
export function extractFilePath(call: FunctionCallNode): string | null {
const entry = KNOWN_READERS[call.name];
if (!entry) return null;
// Prefer named arg if present
if (entry.argName) {
const named = call.args.find(a => a.name === entry.argName);
if (named?.value.type === 'literal' && typeof named.value.value === 'string') {
return named.value.value;
}
}
// Fall back to positional
const positional = call.args[entry.argPos];
if (positional?.value.type === 'literal' && typeof positional.value.value === 'string') {
return positional.value.value;
}
return null; // Programmatic path (paste0, variable, etc.) — not extractable.
}

The recognizer's binding-walk calls extractFilePath for
every FunctionCallNode it visits during opaque emission.
Hits accumulate into a referencedFiles: Set<string>
returned alongside the existing calls: AnalysisCall[] from
recognizeR(). FileRegistry aggregates per-file
sets into one pipeline-wide set.
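A sketch of that accumulation pass, assuming the opaque walker can hand over each FunctionCallNode it visits (the real traversal lives in recognizer.ts):

// Collect as-written paths; resolution against the workspace happens later (§3.3).
import type { FunctionCallNode } from './ast.ts';
import { extractFilePath } from './extract-file-path.ts';

export function collectReferencedFiles(visited: Iterable<FunctionCallNode>): Set<string> {
  const referenced = new Set<string>();
  for (const call of visited) {
    const path = extractFilePath(call);
    if (path !== null) referenced.add(path);
  }
  return referenced;
}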
Path resolution against the workspace happens in worker-manager (the recognizer doesn't know the workspace). Resolution rules:
1. Exact match against workspace.files keys.
2. Resolve relative to originFile's directory (matches the CWD scheme).
3. Basename match (case-sensitive — Linux semantics inside WebR).
A reference that resolves to a workspace file → that file is added to the sync set. References that don’t resolve are dropped (they’ll fail at R-eval time with a clear file-not-found error, which the user can address by uploading the missing file).
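A sketch of those three rules as worker-manager might apply them (function name and the exact ../ normalization are assumptions):

function resolveReference(
  ref: string,                      // path as written in R code
  originFile: string | undefined,   // script that referenced it
  workspaceFiles: ReadonlySet<string>,
): string | null {
  // 1. Exact match against workspace keys.
  if (workspaceFiles.has(ref)) return ref;
  // 2. Relative to originFile's directory, normalizing ../ and ./ segments.
  if (originFile) {
    const parts = originFile.split('/').slice(0, -1);
    for (const seg of ref.split('/')) {
      if (seg === '..') parts.pop();
      else if (seg !== '.' && seg !== '') parts.push(seg);
    }
    const joined = parts.join('/');
    if (workspaceFiles.has(joined)) return joined;
  }
  // 3. Basename match (case-sensitive, Linux semantics).
  const base = ref.slice(ref.lastIndexOf('/') + 1);
  for (const p of workspaceFiles) {
    if (p.slice(p.lastIndexOf('/') + 1) === base) return p;
  }
  return null; // unresolved → dropped; fails at R-eval time
}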
The reference scan is recomputed on every pipeline rebuild (which
already runs on every code edit via setCodeForTab and
loadZip). The cost is O(nodes) constant-time lookups against the KNOWN_READERS record — negligible.
Sync operates on the resolved subset
syncTargets = referencedFiles ∩ workspace.files (resolved
against originFile-relative directories per §3.3). Files in
the workspace that no R code references are never pushed to VFS.
Three triggers, one code path
(workerManager.syncWorkspaceToWebR):
Worker boot (ensureWebRWorker):
after init-ready, compute syncTargets, iterate
the unsynced subset, post FS-write messages before resolving the
webrReady promise. Status stays at loading
until sync completes; only then transitions to ready.
Callers of ensureWebRWorker() can therefore assume the FS
is populated for current syncTargets when the promise
resolves.
Pipeline rebuild adds new references: when
referencedFiles grows (user edits R code to reference a new
file), worker-manager posts FS-write for the newly-referenced files (if
WebR is up). If WebR isn’t up, they’re queued like any other.
New file added while it’s already referenced:
workspace.addFiles(...) notifies worker-manager. If the new
file matches any entry in referencedFiles, post FS-write
immediately. Otherwise no-op until referenced.
Workspace wipe: post fs-wipe;
worker calls webR.FS.unlink() over each path under
/workspace/. After ack, mark all files unsynced; the next
sync trigger re-pushes whatever’s still in
syncTargets.
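A sketch of that shared code path, assuming the fs-write round-trip from §3.4 is wrapped in a promise-returning helper; all three triggers funnel through this one function:

async function syncWorkspaceToWebR(
  workspace: WorkspaceState,
  syncTargets: ReadonlySet<string>,
  postFsWrite: (entries: Array<{ path: string; bytes: Uint8Array }>) => Promise<void>,
): Promise<void> {
  // Referenced-and-resolved files only, minus what's already in the VFS.
  const pending = workspace
    .getPendingSync()
    .filter(({ path }) => syncTargets.has(path));
  if (pending.length === 0) return;
  await postFsWrite(pending); // worker writes under /workspace/ and acks
  workspace.markSynced(pending.map(e => e.path));
}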
Protocol additions (all four FS operations consolidated here for
reference; fs-list and fs-read are used in
§3.6):
// src/core/webr/protocol.ts — additions to WebRRequest
| { type: 'fs-write'; id: string; entries: Array<{ path: string; bytes: Uint8Array }> }
| { type: 'fs-wipe'; id: string }
| { type: 'fs-list'; id: string; root: string }
| { type: 'fs-read'; id: string; path: string }
// additions to WebRResponse
| { type: 'fs-ack'; id: string; written?: string[]; error?: string }
| { type: 'fs-list-result'; id: string; entries: Array<{ path: string; size: number; mtime: number }> }
| { type: 'fs-read-result'; id: string; bytes?: Uint8Array; error?: string }

For fs-write, the worker creates parent directories as
needed (webR.FS.mkdir recursive), then writes each file
with webR.FS.writeFile. Paths in the request are
workspace-relative (e.g., code/01-prep.R); the worker
prepends /workspace/ to form absolute paths.
The fs-list root argument is
/workspace; the worker walks recursively and returns one
entry per file (not directory). fs-read round-trips bytes
for downloads/previews.
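A worker-side sketch of the fs-write handler, using the Promise-returning webR.FS calls named above; the segment-by-segment mkdir and the error handling are assumptions (fs-list follows the same pattern with the readdir + stat walk described above):

import type { WebR } from 'webr';
declare const webR: WebR; // the worker's initialized instance (webr-worker.ts)

// Emscripten-style mkdir is single-level, so create each segment in turn.
async function mkdirp(absDir: string): Promise<void> {
  let cur = '';
  for (const seg of absDir.split('/').filter(Boolean)) {
    cur += '/' + seg;
    try { await webR.FS.mkdir(cur); } catch { /* already exists */ }
  }
}

async function handleFsWrite(req: { id: string; entries: Array<{ path: string; bytes: Uint8Array }> }) {
  const written: string[] = [];
  for (const { path, bytes } of req.entries) {
    const abs = `/workspace/${path}`; // paths arrive workspace-relative
    await mkdirp(abs.slice(0, abs.lastIndexOf('/')));
    await webR.FS.writeFile(abs, bytes);
    written.push(path);
  }
  self.postMessage({ type: 'fs-ack', id: req.id, written });
}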
// src/core/parsers/shared/analysis-call.ts — modified
interface AnalysisCall {
// ... existing fields
originFile?: string; // path relative to workspace root, e.g. "code/01-prep.R"
}

FileRegistry.processFiles() already iterates per-file;
the recognizer just needs to know the current entry.path
and stamp it onto every emitted call. For inline-paste / single-file
mode where there’s no meaningful “origin file” (just the editor’s tab
content), originFile is left undefined.
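A sketch of the aggregation in FileRegistry, with a stand-in recognizeR signature (the real threading is described in the file plan above):

import type { AnalysisCall } from './shared/analysis-call.ts';

// Stand-in signature for illustration; the recognizer stamps originFile
// onto every emitted call itself.
declare function recognizeR(src: string, originFile?: string): {
  calls: AnalysisCall[];
  referencedFiles: Set<string>;
};

function processRFiles(entries: Array<{ path: string; text: string }>) {
  const allCalls: AnalysisCall[] = [];
  const allReferenced = new Set<string>();
  for (const entry of entries) {
    const { calls, referencedFiles } = recognizeR(entry.text, entry.path);
    allCalls.push(...calls);
    for (const f of referencedFiles) allReferenced.add(f); // pipeline-wide set
  }
  return { calls: allCalls, referencedFiles: allReferenced };
}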
// src/core/pipeline/types.ts — extend params
interface WebRTypedParams { /* ... */ originFile?: string; }
interface WebROpaqueParams { /* ... */ originFile?: string; }

// src/workers/worker-manager.ts — derive cwd before dispatch
function cwdFor(originFile: string | undefined): string {
if (!originFile) return '/workspace';
const slash = originFile.lastIndexOf('/');
return slash < 0 ? '/workspace' : `/workspace/${originFile.slice(0, slash)}`;
}

Each WebRRequest carries cwd; the worker
prefixes eval with setwd(cwd) (sticky — no restoration).
Cost: one extra evalRVoid per dispatch — negligible
compared to the actual eval.
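In the worker this is a small guard before the existing eval; a sketch (dispatchEval is a stand-in name, and JSON.stringify doubles as R string quoting for simple paths):

import type { WebR } from 'webr';
declare const webR: WebR; // the worker's initialized instance

// Sticky setwd: no restoration afterwards, matching the scheme above.
async function evalWithCwd(cwd: string | undefined, dispatchEval: () => Promise<void>) {
  if (cwd) {
    await webR.evalRVoid(`setwd(${JSON.stringify(cwd)})`);
  }
  await dispatchEval();
}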
After a Run completes (all in-flight dispatches settled),
worker-manager posts an fs-list request (see protocol
additions in §3.4). The worker walks /workspace/
recursively (webR.FS.readdir + stat for each entry),
returns one entry per file. Worker-manager diffs the result paths
against workspace.originalUploads; any path not in the set
is an artifact. Each artifact in the store carries: path, size, mtime,
mime-type guess (from extension).
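A sketch of the diff, assuming fs-list returns workspace-relative paths like fs-write consumes; the mime map is illustrative, not exhaustive:

const MIME_BY_EXT: Record<string, string> = {
  csv: 'text/csv', tex: 'text/plain', txt: 'text/plain',
  svg: 'image/svg+xml', png: 'image/png', pdf: 'application/pdf',
};

function discoverArtifacts(
  fsListEntries: Array<{ path: string; size: number; mtime: number }>,
  originalUploads: ReadonlySet<string>,
) {
  return fsListEntries
    .filter(e => !originalUploads.has(e.path)) // track-by-exclusion
    .map(e => ({
      ...e,
      mime: MIME_BY_EXT[e.path.slice(e.path.lastIndexOf('.') + 1).toLowerCase()]
        ?? 'application/octet-stream',
    }));
}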
The artifacts store keeps the latest snapshot. The UI panel renders it grouped by parent directory; new-since-previous-Run paths get a “new” dot for one render cycle (cleared on next Run start).
Downloads: clicking an artifact triggers an fs-read
request → bytes round-trip back to TS → Blob +
<a download> synthetic link. Previews for
text/CSV/SVG/PNG/PDF under 5MB use the same round-trip but render inline
(text: pre-wrap; SVG: inline; PNG: img src=data URI; CSV: small table
with first 50 rows; PDF: object embed).
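The download leg needs only standard browser APIs; a sketch:

// Blob + synthetic anchor, fed by the fs-read round-trip.
function downloadArtifact(path: string, bytes: Uint8Array, mime: string) {
  const blob = new Blob([bytes], { type: mime });
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = path.slice(path.lastIndexOf('/') + 1);
  a.click();
  URL.revokeObjectURL(url);
}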
Uploading a ZIP via UploadZone when the workspace already has
files: show a modal with “This will replace your current workspace.
Continue?” and Cancel/Replace buttons. On Replace, call
workspace.wipe() (which wipes WebR FS too) before
extracting the new ZIP. On Cancel, abort the upload.
Single-file uploads (any type) skip the prompt and append. Editor mode (future spec) will create files programmatically; same append path.
The originalUploads set is recomputed at the end of
every upload completion (i.e.,
originalUploads = new Set(workspace.files.keys())). After
the wipe-and-replace flow, the new uploads are the new originals;
previously-discovered artifacts are gone (they were in the wiped VFS)
and the artifacts panel resets.
Adding files while the app is live follows one path, keyed on WebR status:
- workspace.addFiles(...) synchronously stages bytes (no VFS write yet).
- The next pipeline rebuild recomputes referencedFiles; recompute syncTargets.
- WebR ready: post fs-write for any new entries in syncTargets not yet synced. No UI status change.
- WebR loading/installing: queue; sync runs as part of ensureWebRWorker's init tail.
- WebR not booted: the onPipelineChange prewarm path triggers boot, and sync runs as part of that boot's init tail. If the user's pipeline never hits a webr-typed/webr-opaque node, WebR never boots, and the bytes sit in the TS workspace store unused — acceptable, costs only what the user uploaded.

The key invariant: when the webrReady promise resolves,
all current syncTargets are present in
/workspace/. Dispatches downstream of
ensureWebRWorker() await it, so they can safely assume
their inputs are readable. Files outside syncTargets are
never written; if R code at runtime tries to read one, it gets a normal
“file not found” error.
Failure handling:
- fs-write failure: fs-ack carries error; worker-manager surfaces it as a toast and marks the failed file unsynced. Subsequent dispatches that depend on the file fail with a "file not found" R error, which propagates as a normal opaque dispatch error.
- Re-syncing a path already in the VFS is harmless (writeFile overwrites). Document this as a soft invariant; not enforced.
- Sync failure during worker boot: the webrReady promise rejects; status goes to error; user has to re-upload. Documented limitation; rare.

CWD derivation cases:
- originFile is undefined (single-file paste, inline editor): cwd = /workspace.
- Script at ZIP root (analysis.R directly in ZIP root): originFile = "analysis.R"; lastIndexOf('/') < 0; cwd = /workspace.
- Nested script: originFile = "code/sub/03-merge.R"; cwd = /workspace/code/sub.

Synced files persist in /workspace/ until the worker dies (page reload) or the workspace is wiped.

Steady state per file:
- Unreferenced files (most of workspace.files for typical packages): bytes exist only in the TS-side WorkspaceStore. Not in WebR. One copy.
- Referenced files (the subset R code actually reads): bytes exist in the TS-side WorkspaceStore and in WebR's WASM heap simultaneously. Copy semantics (not Transferable). Two copies.
For a typical 100–200MB upload where ~10–30% of files are R-readable inputs, the extra-vs-today cost is bounded to that ~10–30%, not the full upload. Modern browsers handle this comfortably; the 1.5GB total cap covers it.
Per file budget: none (per-file cap removed). Total cap stays at 1.5GB.
Future optimization (out of scope): switch to
Transferable ArrayBuffer when the dataset-marshal Transferable backlog
item lands. Both paths share the same postMessage envelope
at that point.
[Upload] button — accept attribute extended:
".zip,.R,.r,.csv,.dta,.xlsx,.xls,.rds,.rdata,.sav"
Drop and click handlers route binary types into
workspace.addFiles(...) directly (no parsing) and trigger
sync if WebR is up.
Plain modal, two buttons: - “Replace workspace” (primary, destructive) - “Cancel” (secondary)
Body: lists the files currently in the workspace that will be removed (collapsed to “X files (Y MB)” if >5).
Sits in the right sidebar (alongside the existing properties/results panels), collapsed by default. Header shows artifact count and total size. When expanded:
▼ Artifacts (3 files, 1.2 MB)
output/
▸ tables/main.tex [download]
▸ figs/coef-plot.svg [preview] [download]
results/
▸ m1-summary.csv [preview] [download]
Preview opens an inline overlay (SVG inline; PNG via data URI;
text/CSV with row truncation; PDF via <object>).
Artifacts are orthogonal to typed pipeline outputs; they don’t appear in the DAG. (A future “promote artifact to dataset” feature is out of scope — the auto-marshal probe already handles the case where R loads a file and assigns to a binding, which is the normal data-input path.)
Testing:
- Unit: workspace.ts lifecycle (add/wipe/append, originalUploads recomputation, totalSize accounting); CWD derivation from originFile (4 cases above).
- Unit: extract-file-path.ts — extracts literal positional and named args; returns null for programmatic paths (paste0, variable refs); covers all entries in KNOWN_READERS.
- Unit: recognizer over source with read_xlsx("foo.xlsx") and one CSV-only data-load consumer; assert referencedFiles = {"foo.xlsx"} (the CSV is not referenced because it's consumed via TS-native data-load).
- Integration: workspace with one referenced .xlsx and one unreferenced .csv → assert WebR FS contains the .xlsx at /workspace/... and does not contain the .csv after ensureWebRWorker() resolves.
- Integration: upload a .xlsx, no opaque code; assert no FS write. Then edit code to add read_xlsx("foo.xlsx"); assert FS write fires and file is now in WebR FS.
- Integration: read_xlsx("../Data/foo.xlsx") from originFile = "code/run.R" → assert dispatch succeeds and result is a marshaled Dataset (uses real WebR worker, gated behind same env flag as existing webr integration tests).
- Integration: opaque R code writes output/x.csv; run; assert artifact panel store contains output/x.csv exactly once and not in originalUploads.
- E2E: a paper with a .xlsx input — drag-drop ZIP, click Run, assert opaque node produces a Dataset and downstream pipeline executes.

Compatibility:
- webR.FS is part of the standard webr npm package — no new dependencies.
- Requests without cwd still work for backward compat in tests, but production code always sets cwd.
- AnalysisCall.originFile is optional — existing recognizer call sites without the field continue to work; only the FileRegistry walk is updated to populate it.
- The MAX_FILE_SIZE constant is removed from extractZip; only MAX_TOTAL_DATA_SIZE remains.
- The unsupportedDataFiles list in ClassifiedFiles becomes empty (since binaries are now first-class) — keep the field for protocol compatibility but document its emptiness.

Risks and limitations:
- Programmatic paths (paste0("data/", year, ".csv"), variable-held paths, setwd("subdir")-relative reads): the literal-arg-only scan can't extract these, so the file isn't synced, and R fails at runtime with "file not found." Documented limitation. On-demand sync-on-fail fallback is listed in Out-of-scope; if usage data shows this biting frequently, it's a small follow-up.
- Oversized syncs: failure surfaces via fs-ack; status goes to error; user can wipe + re-upload smaller. No silent corruption.
- Packages that ship their own outputs (e.g. output/main-table.tex) for reproducibility provenance: after Run, R rewrites those paths in VFS — but track-by-exclusion filters them out of the artifacts panel because the path was in originalUploads. Bytes are correct in VFS; the UI just doesn't surface them. Documented limitation; mtime/hash-based "modified" detection is listed in Out-of-scope as a future upgrade if the case proves real.
- Everything is gated behind the same VITE_DISABLE_WEBR=1 env flag as the rest of WebR. With WebR disabled, the workspace store still exists (CSVs still parse TS-side, single-file binaries are simply unusable but don't error on upload — they sit in the store forever).
- Paths built at runtime remain invisible to extractFilePath.
- setwd("subdir") in user R code → no rewrite this session.