WebR VFS Bridge — Design Spec

Date: 2026-04-27
Milestone: WebR follow-up (Session 4 of WebR follow-up sessions)
Predecessors:
- 2026-04-20 WebR Integration (Session 1: typed framework + lm_robust)
- 2026-04-22 WebR Opaque Nodes (Session 2: opaque path end-to-end)

1. Context

The opaque-nodes session unlocked paste-as-written for ~70 unique R functions by emitting webr-opaque nodes for unrecognized assignments. The single remaining wall is file I/O inside the WebR worker: read_xlsx("../Data/MasterData.xlsx"), readRDS("results/m1.rds"), haven::read_sav("survey.sav"), even read.csv("foo.csv") on uploaded files all fail with “file not found” because the WebR worker’s filesystem is empty — uploads only ever reach the TS-side dataset registry.

Today’s binary-input papers (INTERLYSE-RUN-STATUS papers #2/#3/#4) require a manual workaround: convert .xlsx/.rds/.sav/.dta to CSV outside the app, place the CSV in examples/, and point the R code at it. This is a hard barrier: any paper using a binary input format fails on first paste, and there’s no in-app path forward.

Today’s worker also has no story for files written by R: ggsave(), write.csv(), writeLines(), etc. produce files in the worker’s VFS that die silently when the worker terminates. Replication papers routinely write tables and figures to disk; users have no way to retrieve them.

This session bridges both directions. Uploaded files (binary and otherwise) get mirrored into the WebR worker’s VFS so opaque R code can read them; files written by the worker get surfaced in an artifacts panel so users can download them. The is.data.frame probe added in session 2 already auto-marshals worker results to TS Datasets when applicable, so the moment read_xlsx(...) succeeds inside R, the data flows downstream into the existing pipeline machinery.

A future “editor mode” UI (Scripts/Data side-panels for authoring, separate from replication tree-view) will sit on top of the same workspace store this session establishes; that UI work is split into a follow-up spec.

2. Scope

In

Out (future sessions)

3. Architecture

3.1 Module changes

src/ui/store/workspace.ts                    [new]
                                              Path-keyed Uint8Array store + sync queue;
                                              originalUploads set; lifecycle ops

src/ui/store/files.ts                         [modified]
                                              Build WorkspaceStore from extracted ZIP;
                                              feed binary files (today discarded)

src/ui/components/toolbar/upload-zone.tsx    [modified]
                                              Accept .xlsx/.xls/.rds/.rdata/.sav;
                                              wipe-workspace confirmation

src/core/zip/extractor.ts                     [modified]
                                              Extract binary file bytes (today excluded);
                                              drop per-file size cap

src/core/parsers/shared/analysis-call.ts     [modified]
                                              Add originFile?: string

src/core/parsers/file-registry.ts             [modified]
                                              Thread originFile into recognizer call

src/core/parsers/r/recognizer.ts              [modified]
                                              Pass originFile through to AnalysisCall

src/core/pipeline/types.ts                    [modified]
                                              Add originFile?: string to webr-typed
                                              and webr-opaque params

src/core/pipeline/mapper.ts                   [modified]
                                              Carry originFile from AnalysisCall to node

src/core/webr/protocol.ts                     [modified]
                                              Add cwd?: string to WebRRequest
                                              dispatch-typed and dispatch-opaque;
                                              add fs-* request/response variants (§3.3)

src/core/webr/dispatch.ts                     [modified]
                                              Plumb cwd through dispatcher API

src/workers/webr-worker.ts                   [modified]
                                              Handle fs-write/fs-wipe/fs-list/fs-read requests;
                                              setwd(cwd) before each eval

src/workers/worker-manager.ts                [modified]
                                              Sync workspace bytes on worker init;
                                              incrementally sync on new uploads;
                                              compute cwd from originFile;
                                              post-Run artifact discovery

src/ui/store/artifacts.ts                     [new]
                                              Discovered artifacts; preview cache;
                                              download orchestration

src/ui/components/panels/artifacts-panel.tsx [new]
                                              Collapsible artifacts panel

3.2 WorkspaceStore

// src/ui/store/workspace.ts
interface WorkspaceState {
  files: Map<string, Uint8Array>;         // path → bytes (path-keyed flat map; '/' in keys forms tree)
  originalUploads: Set<string>;           // paths present after last upload (for artifact diff)
  syncedToWebR: Set<string>;              // subset of files that have been pushed to /workspace/
  totalSize: number;                       // running sum for the 1.5GB cap

  addFiles: (entries: Array<{ path: string; bytes: Uint8Array }>) => void;
  wipe: () => Promise<void>;               // also wipes WebR /workspace/
  removeFile: (path: string) => void;
  getPendingSync: () => Array<{ path: string; bytes: Uint8Array }>;
  markSynced: (paths: string[]) => void;
  markUnsynced: () => void;                // called when WebR worker is recreated
}

The store is the single source of truth for “what’s in the workspace.” Both the existing TS-side dataset registry (parsed CSVs/DTAs) and the new VFS sync read from it. CSV bytes live here even though parsed Datasets exist elsewhere — the duplication is bounded (CSV bytes only, not parsed columns) and lets opaque R code resolve read.csv("foo.csv") without surprise.
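A minimal sketch of the store's bookkeeping, assuming a plain class (the real src/ui/store/ module may use a state library instead). It shows the running totalSize enforcing the 1.5GB cap before any mutation, replaced paths adjusting the total and dropping out of syncedToWebR, and getPendingSync computed as files minus syncedToWebR. The class name, the injected wipeWebR callback, the configurable cap, and throwing on overflow are all hypothetical.

```typescript
// Hypothetical plain-class rendition of WorkspaceState; the real store may use a state library.
type WorkspaceEntry = { path: string; bytes: Uint8Array };

const WORKSPACE_CAP_BYTES = 1.5 * 1024 ** 3; // 1.5GB total cap (binary gigabytes assumed)

class WorkspaceStoreSketch {
  files = new Map<string, Uint8Array>();
  originalUploads = new Set<string>();
  syncedToWebR = new Set<string>();
  totalSize = 0;
  private readonly wipeWebR: () => Promise<void>;
  private readonly capBytes: number;

  // `wipeWebR` stands in for the worker-manager's fs-wipe round trip.
  constructor(wipeWebR: () => Promise<void>, capBytes: number = WORKSPACE_CAP_BYTES) {
    this.wipeWebR = wipeWebR;
    this.capBytes = capBytes;
  }

  addFiles(entries: WorkspaceEntry[]): void {
    // Deduplicate by path (last one wins) so a path repeated in one batch is counted once.
    const batch = new Map<string, Uint8Array>();
    for (const e of entries) batch.set(e.path, e.bytes);
    // Check the cap before mutating, so a rejected batch leaves the store untouched.
    let next = this.totalSize;
    batch.forEach((bytes, path) => {
      next += bytes.byteLength - (this.files.get(path)?.byteLength ?? 0);
    });
    if (next > this.capBytes) {
      throw new Error(`workspace cap exceeded: ${next} > ${this.capBytes} bytes`);
    }
    batch.forEach((bytes, path) => {
      this.files.set(path, bytes);
      this.syncedToWebR.delete(path); // new or replaced bytes must be pushed again
    });
    this.totalSize = next;
  }

  removeFile(path: string): void {
    const old = this.files.get(path);
    if (!old) return;
    this.files.delete(path);
    this.syncedToWebR.delete(path);
    this.totalSize -= old.byteLength;
  }

  getPendingSync(): WorkspaceEntry[] {
    const pending: WorkspaceEntry[] = [];
    this.files.forEach((bytes, path) => {
      if (!this.syncedToWebR.has(path)) pending.push({ path, bytes });
    });
    return pending;
  }

  markSynced(paths: string[]): void {
    for (const p of paths) if (this.files.has(p)) this.syncedToWebR.add(p);
  }

  markUnsynced(): void {
    this.syncedToWebR.clear(); // a recreated worker starts with an empty /workspace/
  }

  async wipe(): Promise<void> {
    await this.wipeWebR(); // clear the worker VFS first, then local state
    this.files.clear();
    this.originalUploads.clear();
    this.syncedToWebR.clear();
    this.totalSize = 0;
  }
}
```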

3.3 VFS sync pipeline

Three triggers, one code path (workerManager.syncWorkspaceToWebR):

  1. Worker boot (ensureWebRWorker): after init-ready, take workspace.getPendingSync() and post fs-write messages before resolving the webrReady promise. Status stays at loading until the sync is acknowledged; only then does it transition to ready. Callers of ensureWebRWorker() can therefore assume the FS is populated when the promise resolves.

  2. New file added while the worker is up: workspace.addFiles(...) notifies the worker-manager, which posts fs-write messages immediately. No status change (the worker stays ready).

  3. Workspace wipe: post an fs-wipe message; the worker calls webR.FS.unlink() on each file under /workspace/. After the fs-ack, mark all files unsynced; subsequent ensureWebRWorker/addFiles calls will re-sync.
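The three triggers above can share one function. A sketch, assuming a hypothetical promise-based request() helper that posts a WebRRequest and resolves with the response carrying the same id; the message shapes follow the §3.3 protocol additions, and bootWebRWorker, waitForInitReady, and setStatus are illustrative names. Only paths the worker lists in written are marked synced, so anything that failed is retried on the next trigger.

```typescript
type FsEntry = { path: string; bytes: Uint8Array };
type FsWriteRequest = { type: 'fs-write'; id: string; entries: FsEntry[] };
type FsAck = { type: 'fs-ack'; id: string; written?: string[]; error?: string };

// The slice of WorkspaceState the sync path needs.
interface PendingSource {
  getPendingSync(): FsEntry[];
  markSynced(paths: string[]): void;
}

// Hypothetical transport: posts a request to the worker and resolves with the response of the same id.
type RequestFn = (req: FsWriteRequest) => Promise<FsAck>;

let nextRequestId = 0;

// Shared by all triggers: push whatever is pending, then mark what the worker confirmed.
async function syncWorkspaceToWebR(ws: PendingSource, request: RequestFn): Promise<void> {
  const entries = ws.getPendingSync();
  if (entries.length === 0) return;
  const ack = await request({ type: 'fs-write', id: `fs-${nextRequestId++}`, entries });
  ws.markSynced(ack.written ?? []); // partial success still counts
  if (ack.error) throw new Error(`fs-write failed: ${ack.error}`);
}

type WorkerStatus = 'loading' | 'ready';

// Trigger 1 (boot): status becomes 'ready', and the returned promise resolves, only after the sync.
async function bootWebRWorker(
  waitForInitReady: () => Promise<void>,
  ws: PendingSource,
  request: RequestFn,
  setStatus: (s: WorkerStatus) => void,
): Promise<void> {
  setStatus('loading');
  await waitForInitReady();
  await syncWorkspaceToWebR(ws, request);
  setStatus('ready');
}
```

Trigger 2 is the same syncWorkspaceToWebR call made from addFiles while the status is ready. Two overlapping calls can push the same file twice; since fs-write overwrites, that costs bandwidth but not correctness. In this sketch a failed boot-time sync rejects and leaves the status at loading; what the UI should do then belongs to §4.2.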

Protocol additions (all four FS operations consolidated here for reference; fs-list and fs-read are used in §3.5):

// src/core/webr/protocol.ts — additions to WebRRequest
| { type: 'fs-write'; id: string; entries: Array<{ path: string; bytes: Uint8Array }> }
| { type: 'fs-wipe';  id: string }
| { type: 'fs-list';  id: string; root: string }
| { type: 'fs-read';  id: string; path: string }

// additions to WebRResponse
| { type: 'fs-ack';         id: string; written?: string[]; error?: string }
| { type: 'fs-list-result'; id: string; entries: Array<{ path: string; size: number; mtime: number }> }
| { type: 'fs-read-result'; id: string; bytes?: Uint8Array; error?: string }

For fs-write, the worker creates parent directories as needed, one level at a time (webR.FS.mkdir creates a single directory per call, so each missing ancestor needs its own call), then writes each file with webR.FS.writeFile. Paths in the request are workspace-relative (e.g., code/01-prep.R); the worker prepends /workspace/ to form absolute paths.
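On the worker side, fs-write can be handled by creating each ancestor directory one level at a time and tolerating ones that already exist (for example after a re-sync). The sketch codes against a hypothetical MiniFS interface modeled on webR.FS.mkdir and webR.FS.writeFile, so it can run against an in-memory fake; rejecting '..' segments and stopping at the first failed file are assumptions, not requirements stated in this spec.

```typescript
// Hypothetical slice of the worker's filesystem API, modeled on webR.FS.mkdir / webR.FS.writeFile.
interface MiniFS {
  mkdir(path: string): Promise<unknown>;
  writeFile(path: string, data: Uint8Array): Promise<unknown>;
}

type FsWriteEntry = { path: string; bytes: Uint8Array };
type FsWriteAck = { type: 'fs-ack'; id: string; written: string[]; error?: string };

const WORKSPACE_ROOT = '/workspace';

// 'code/sub/01.R' -> dirs ['/workspace', '/workspace/code', '/workspace/code/sub'],
//                    file '/workspace/code/sub/01.R'
function planWrite(relPath: string): { dirs: string[]; file: string } {
  const parts = relPath.split('/').filter(p => p !== '' && p !== '.');
  // Assumption: reject '..' so a crafted ZIP entry cannot escape /workspace/.
  if (parts.length === 0 || parts.includes('..')) throw new Error(`invalid workspace path: ${relPath}`);
  const dirs = [WORKSPACE_ROOT];
  for (const part of parts.slice(0, -1)) dirs.push(`${dirs[dirs.length - 1]}/${part}`);
  return { dirs, file: `${WORKSPACE_ROOT}/${parts.join('/')}` };
}

async function handleFsWrite(fs: MiniFS, id: string, entries: FsWriteEntry[]): Promise<FsWriteAck> {
  const ensured = new Set<string>(); // directories already handled in this request
  const written: string[] = [];
  for (const { path, bytes } of entries) {
    try {
      const { dirs, file } = planWrite(path);
      for (const dir of dirs) {
        if (ensured.has(dir)) continue;
        // mkdir fails if the directory exists; ignore that here, writeFile reports real failures.
        await fs.mkdir(dir).catch(() => undefined);
        ensured.add(dir);
      }
      await fs.writeFile(file, bytes);
      written.push(path);
    } catch (e) {
      // Stop at the first failure; `written` tells the manager what did land.
      return { type: 'fs-ack', id, written, error: `${path}: ${e instanceof Error ? e.message : String(e)}` };
    }
  }
  return { type: 'fs-ack', id, written };
}
```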

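The fs-list root argument is /workspace; the worker walks it recursively and returns one entry per file (directories are omitted). originalUploads holds workspace-relative keys, so the /workspace/ prefix has to be stripped (by the worker or the worker-manager) before the §3.5 diff. fs-read round-trips bytes for downloads and previews.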

3.4 CWD threading

// src/core/parsers/shared/analysis-call.ts — modified
interface AnalysisCall {
  // ... existing fields
  originFile?: string;  // path relative to workspace root, e.g. "code/01-prep.R"
}

FileRegistry.processFiles() already iterates per-file; the recognizer just needs to know the current entry.path and stamp it onto every emitted call. For inline-paste / single-file mode where there’s no meaningful “origin file” (just the editor’s tab content), originFile is left undefined.

// src/core/pipeline/types.ts — extend params
interface WebRTypedParams { /* ... */ originFile?: string; }
interface WebROpaqueParams { /* ... */ originFile?: string; }
// src/workers/worker-manager.ts — derive cwd before dispatch
function cwdFor(originFile: string | undefined): string {
  if (!originFile) return '/workspace';
  const slash = originFile.lastIndexOf('/');
  return slash < 0 ? '/workspace' : `/workspace/${originFile.slice(0, slash)}`;
}

Each dispatch-typed and dispatch-opaque request carries cwd; the worker runs setwd(cwd) before the eval (sticky: the previous directory is not restored). Cost: one extra evalRVoid call per dispatch, negligible next to the eval itself.
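To see why the per-file cwd matters, take read_xlsx("../Data/MasterData.xlsx") from §1, written in a script at code/01-prep.R (an illustrative location). The sketch repeats cwdFor from §3.4 and adds a hypothetical resolveFromCwd helper that normalizes '.' and '..' the way a POSIX lookup would (ignoring symlinks). With the script's cwd, the call lands on /workspace/Data/MasterData.xlsx; when originFile is undefined (cwd /workspace), the same call resolves to /Data/MasterData.xlsx, outside the workspace.

```typescript
// Copied from the cwdFor sketch in §3.4.
function cwdFor(originFile: string | undefined): string {
  if (!originFile) return '/workspace';
  const slash = originFile.lastIndexOf('/');
  return slash < 0 ? '/workspace' : `/workspace/${originFile.slice(0, slash)}`;
}

// Hypothetical helper: resolve an R-side path against a cwd, POSIX-style (no symlink handling).
function resolveFromCwd(cwd: string, rPath: string): string {
  const start = rPath.startsWith('/') ? [] : cwd.split('/'); // absolute paths ignore cwd
  const out: string[] = [];
  for (const seg of [...start, ...rPath.split('/')]) {
    if (seg === '' || seg === '.') continue;
    if (seg === '..') out.pop(); // '..' at the root stays at the root
    else out.push(seg);
  }
  return '/' + out.join('/');
}
```

Because cwd ends up inside R code, one option that avoids quoting issues is passing it as data, for instance through the env option of webR's evalRVoid (evalRVoid('setwd(dir)', { env: { dir: cwd } })); treat that call shape as something to confirm against the webR version in use.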

3.5 Artifact discovery

After a Run completes (all in-flight dispatches settled), worker-manager posts an fs-list request (see protocol additions in §3.3). The worker walks /workspace/ recursively and returns one entry per file. (webR's public FS API offers lookupPath, whose node tree exposes isFolder and contents but not size or mtime, rather than readdir/stat; size and mtime may need to come from R's file.info(). Confirm against the webR version in use.) Worker-manager diffs the result paths against workspace.originalUploads; any path not in the set is an artifact. Each artifact in the store carries its path, size, mtime, and a MIME type guessed from the extension.
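The diff described above is a set difference plus an extension lookup. A sketch, assuming fs-list returns absolute /workspace/... paths that must be made workspace-relative before comparing against originalUploads (whose keys are workspace-relative); the extension table and the application/octet-stream fallback are hypothetical choices.

```typescript
type ListingEntry = { path: string; size: number; mtime: number };
type Artifact = ListingEntry & { mime: string };

// Hypothetical extension table; unknown extensions fall back to a generic binary type.
const MIME_BY_EXT: Record<string, string> = {
  csv: 'text/csv',
  txt: 'text/plain',
  tex: 'text/plain',
  svg: 'image/svg+xml',
  png: 'image/png',
  pdf: 'application/pdf',
  html: 'text/html',
};

function guessMime(path: string): string {
  const base = path.slice(path.lastIndexOf('/') + 1);
  const dot = base.lastIndexOf('.');
  const ext = dot > 0 ? base.slice(dot + 1).toLowerCase() : ''; // dotfiles have no extension
  return MIME_BY_EXT[ext] ?? 'application/octet-stream';
}

const WORKSPACE_PREFIX = '/workspace/';

function toWorkspaceRelative(absPath: string): string {
  return absPath.startsWith(WORKSPACE_PREFIX) ? absPath.slice(WORKSPACE_PREFIX.length) : absPath;
}

// Any listed file whose workspace-relative path was not an original upload is an artifact.
function diffArtifacts(listing: ListingEntry[], originalUploads: ReadonlySet<string>): Artifact[] {
  return listing
    .map(e => ({ ...e, path: toWorkspaceRelative(e.path) }))
    .filter(e => !originalUploads.has(e.path))
    .map(e => ({ ...e, mime: guessMime(e.path) }));
}
```

One consequence of diffing by path alone: if R overwrites an uploaded file in place, the path is unchanged and the file is not reported as an artifact.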

The artifacts store keeps the latest snapshot. The UI panel renders it grouped by top-level directory (as in the §6.3 mock-up); paths that are new since the previous Run carry a “new” dot until the next Run starts.
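For the panel, the snapshot can be grouped and flagged in one pass. This sketch groups by top-level directory, matching the §6.3 mock-up (output/ holding tables/main.tex and figs/coef-plot.svg), and marks a row new when its path was absent from the previous Run's snapshot. PanelRow, groupForPanel, and panelHeader are hypothetical names, and treating 1 MB as 10^6 bytes is an assumption.

```typescript
type PanelRow = { label: string; path: string; size: number; isNew: boolean };

// Group by top-level directory ('' = workspace root); `label` is the path inside that group.
// `previousRun` holds the artifact paths from the prior Run's snapshot.
function groupForPanel(
  artifacts: Array<{ path: string; size: number }>,
  previousRun: ReadonlySet<string>,
): Map<string, PanelRow[]> {
  const groups = new Map<string, PanelRow[]>();
  const sorted = artifacts.slice().sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
  for (const a of sorted) {
    const slash = a.path.indexOf('/');
    const group = slash < 0 ? '' : a.path.slice(0, slash + 1);
    const row: PanelRow = {
      label: a.path.slice(group.length),
      path: a.path,
      size: a.size,
      isNew: !previousRun.has(a.path),
    };
    const rows = groups.get(group);
    if (rows) rows.push(row);
    else groups.set(group, [row]);
  }
  return groups;
}

// Header text such as "Artifacts (3 files, 1.2 MB)".
function panelHeader(artifacts: Array<{ size: number }>): string {
  const total = artifacts.reduce((sum, a) => sum + a.size, 0);
  const n = artifacts.length;
  return `Artifacts (${n} file${n === 1 ? '' : 's'}, ${(total / 1e6).toFixed(1)} MB)`;
}
```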

Downloads: clicking an artifact triggers an fs-read request → bytes round-trip back to TS → Blob + <a download> synthetic link. Previews for text/CSV/SVG/PNG/PDF under 5MB use the same round-trip but render inline (text: pre-wrap; SVG: inline; PNG: img src=data URI; CSV: small table with first 50 rows; PDF: object embed).
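Preview eligibility and the PNG data URI can both be derived from the path, the size, and the bytes returned by fs-read. A sketch, assuming binary megabytes for the 5MB limit and a hypothetical list of plain-text extensions; base64 is written out by hand (standard RFC 4648 alphabet with '=' padding) so the snippet does not depend on btoa.

```typescript
type PreviewKind = 'text' | 'csv' | 'svg' | 'png' | 'pdf';

const PREVIEW_MAX_BYTES = 5 * 1024 * 1024; // "under 5MB"; binary megabytes assumed

// Hypothetical extension table; which plain-text extensions qualify is an assumption.
const PREVIEW_BY_EXT: Record<string, PreviewKind> = {
  txt: 'text', tex: 'text', log: 'text', md: 'text', r: 'text',
  csv: 'csv', svg: 'svg', png: 'png', pdf: 'pdf',
};

function previewKind(path: string, size: number): PreviewKind | null {
  if (size >= PREVIEW_MAX_BYTES) return null; // too large: download only
  const base = path.slice(path.lastIndexOf('/') + 1);
  const dot = base.lastIndexOf('.');
  if (dot <= 0) return null;
  return PREVIEW_BY_EXT[base.slice(dot + 1).toLowerCase()] ?? null;
}

const B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function toBase64(bytes: Uint8Array): string {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    // Pack up to three bytes into a 24-bit group, zero-filling missing ones.
    const n = (bytes[i] << 16) | ((has1 ? bytes[i + 1] : 0) << 8) | (has2 ? bytes[i + 2] : 0);
    out += B64[(n >> 18) & 63] + B64[(n >> 12) & 63];
    out += has1 ? B64[(n >> 6) & 63] : '=';
    out += has2 ? B64[n & 63] : '=';
  }
  return out;
}

// Used as the src of the <img> in PNG previews.
function toDataUri(bytes: Uint8Array, mime: string): string {
  return `data:${mime};base64,${toBase64(bytes)}`;
}
```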

3.6 Lifecycle & wipe-confirmation

When a ZIP is uploaded through UploadZone while the workspace already has files, show a modal (“This will replace your current workspace. Continue?”) with Cancel and Replace workspace buttons. On Replace workspace, call workspace.wipe() (which also wipes the WebR FS) before extracting the new ZIP. On Cancel, abort the upload and leave the workspace unchanged.

Single-file uploads (any type) skip the prompt and append. Editor mode (future spec) will create files programmatically; same append path.

The originalUploads set is recomputed at the end of every upload (originalUploads = new Set(workspace.files.keys())). After the wipe-and-replace flow, the new uploads are the new originals; previously discovered artifacts are gone (they lived in the wiped VFS), and the artifacts panel resets.
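The two upload paths can be sketched against a hypothetical slice of WorkspaceState, with confirmReplace standing in for the wipe-confirmation modal and extract for the ZIP extractor (both illustrative names). The ZIP path prompts only when the workspace is non-empty and wipes before extracting; both paths recompute originalUploads at the end.

```typescript
type UploadEntry = { path: string; bytes: Uint8Array };

// Hypothetical slice of WorkspaceState used by the upload paths.
interface UploadTarget {
  files: Map<string, Uint8Array>;
  originalUploads: Set<string>;
  addFiles(entries: UploadEntry[]): void;
  wipe(): Promise<void>;
}

function recomputeOriginals(ws: UploadTarget): void {
  ws.originalUploads = new Set(ws.files.keys());
}

// Returns false when the user cancels the replacement.
async function uploadZip(
  ws: UploadTarget,
  confirmReplace: () => Promise<boolean>,
  extract: () => Promise<UploadEntry[]>,
): Promise<boolean> {
  if (ws.files.size > 0) {
    if (!(await confirmReplace())) return false; // Cancel: nothing changes
    await ws.wipe(); // also wipes WebR /workspace/, so old artifacts disappear
  }
  ws.addFiles(await extract()); // extraction happens only after the wipe
  recomputeOriginals(ws);
  return true;
}

// Single files of any type skip the prompt and append.
function uploadSingleFile(ws: UploadTarget, entry: UploadEntry): void {
  ws.addFiles([entry]);
  recomputeOriginals(ws);
}
```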

4. Behavior

4.1 Sync timing

The key invariant: when the webrReady promise resolves, all currently staged files are present in /workspace/. Dispatches downstream of ensureWebRWorker() await it, so they can safely assume their inputs are readable.

4.2 Error handling

4.3 CWD edge cases

4.4 Artifact lifecycle

5. Memory model

Steady state per file: bytes exist in the TS-side WorkspaceStore and in the WebR worker's in-memory filesystem simultaneously (postMessage copies them via structured clone; nothing is transferred). For typical replication packages with 50–200MB of data files, that is 50–200MB more than today. The 1.5GB total cap bounds the TS-side copy only, so at the cap the two copies together approach 3GB.

Per-file budget: none (the per-file cap is removed). Total cap stays at 1.5GB.

Future optimization (out of scope): switch to Transferable ArrayBuffer when the dataset-marshal Transferable backlog item lands. Both paths share the same postMessage envelope at that point.

6. UI

6.1 Upload zone

[Upload] button — accept attribute extended:
  ".zip,.R,.r,.csv,.dta,.xlsx,.xls,.rds,.rdata,.sav"

Drop and click handlers route binary types into workspace.addFiles(...) directly (no parsing) and trigger sync if WebR is up.
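Files delivered through a drop event are not filtered by the accept attribute, so the drop handler has to check extensions itself. A sketch of that routing, assuming case-insensitive matching (covering .R/.r and .RData) and a hypothetical route table in which .R goes to the script parser, .csv/.dta to the dataset parsers, and the binary formats straight to workspace.addFiles.

```typescript
type UploadRoute = 'zip' | 'script' | 'dataset' | 'binary' | 'reject';

// Hypothetical route table keyed by lower-cased extension.
const ROUTE_BY_EXT: Record<string, UploadRoute> = {
  zip: 'zip',       // ZIP extractor, with the wipe-confirmation prompt
  r: 'script',      // R recognizer via the file registry
  csv: 'dataset',   // parsed to a Dataset; bytes also kept in the workspace
  dta: 'dataset',
  xlsx: 'binary',   // stored unparsed via workspace.addFiles
  xls: 'binary',
  rds: 'binary',
  rdata: 'binary',
  sav: 'binary',
};

function routeUpload(fileName: string): UploadRoute {
  const dot = fileName.lastIndexOf('.');
  if (dot <= 0) return 'reject'; // no extension, or a bare dotfile such as '.rds'
  return ROUTE_BY_EXT[fileName.slice(dot + 1).toLowerCase()] ?? 'reject';
}
```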

6.2 Wipe-confirmation modal

Plain modal with two buttons:
- “Replace workspace” (primary, destructive)
- “Cancel” (secondary)

Body: lists the files currently in the workspace that will be removed (collapsed to “X files (Y MB)” if >5).
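The collapse rule for the modal body is small enough to state as code. A sketch, assuming MB means 10^6 bytes shown to one decimal place; removalSummary is a hypothetical name and listing the paths in sorted order is an illustrative choice.

```typescript
// Returns the lines of the modal body: each path when there are at most five files,
// otherwise a single "X files (Y MB)" summary.
function removalSummary(files: ReadonlyMap<string, Uint8Array>): string[] {
  const paths = Array.from(files.keys()).sort();
  if (paths.length <= 5) return paths;
  let total = 0;
  files.forEach(bytes => { total += bytes.byteLength; });
  return [`${paths.length} files (${(total / 1e6).toFixed(1)} MB)`];
}
```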

6.3 Artifacts panel

Sits in the right sidebar (alongside the existing properties/results panels), collapsed by default. Header shows artifact count and total size. When expanded:

▼ Artifacts (3 files, 1.2 MB)
  output/
    ▸ tables/main.tex      [download]
    ▸ figs/coef-plot.svg   [preview] [download]
  results/
    ▸ m1-summary.csv       [preview] [download]

Preview opens an inline overlay (SVG inline; PNG via data URI; text/CSV with row truncation; PDF via <object>).

6.4 No changes to results / pipeline panels

Artifacts are orthogonal to typed pipeline outputs; they don’t appear in the DAG. (A future “promote artifact to dataset” feature is out of scope — the auto-marshal probe already handles the case where R loads a file and assigns to a binding, which is the normal data-input path.)

7. Testing

8. Migration notes

9. Risk & rollback

10. Out of scope reminders