Date: 2026-04-14

Goal: A developer-facing metric system that tracks how much of each paper's estimation pipeline Interlyse can execute, with dependency-aware blocker drill-down to prioritize feature development.
The atomic unit is a model node: any recognized estimation call (lm(), feols(), felm(), glm(), ivreg(), lm_robust(), rq(), etc.) that produces regression/test results. Table output (stargazer, etable) is a downstream presentation concern, to be tracked separately later.
Each model node has an upstream dependency chain in the pipeline DAG: data-load -> transforms -> … -> model. A model is only executable if every node in its chain is supported.
Static analysis — parses R code, builds the DAG, classifies every node as supported or blocked. Runs on the full corpus (all downloaded packages, no data files needed). Answers: “what could work?”
Execution verification — actually runs the pipeline with real data, compares TS output against R output. Runs on a curated subset where data is staged. Answers: “what actually works?”
A model can be executable but not verified (no data staged), or blocked but partially informative (some upstream nodes work, the blocker is identified).
```ts
type ModelStatus =
  | { status: 'blocked'; blockers: BlockerInfo[] }
  | { status: 'executable' }
  | { status: 'verified'; match: boolean; divergence?: DivergenceInfo }

type BlockerInfo = {
  feature: string      // e.g., "pivot_wider", "read_xlsx", "setwd"
  nodeType: string     // pipeline node type that's unsupported
  nodeLabel: string    // human-readable label from the R source
  upstreamOf: string[] // which model nodes this blocks
}

type DivergenceInfo = {
  coefficient: string  // which coef diverged
  tsValue: number
  rValue: number
  delta: number
}

type PaperScorecard = {
  paperId: string          // e.g., "qje-investor-memory"
  journal: string
  rFiles: number
  modelsDetected: number
  modelsExecutable: number // static analysis
  modelsVerified: number   // execution pass (0 if not staged)
  modelsVerifiedPassing: number
  dataStaged: boolean
  blockers: Record<string, { // keyed by feature name
    modelsBlocked: number
    modelLabels: string[]
  }>
}

type CorpusRollup = {
  timestamp: string
  totalPapers: number
  totalModels: number
  totalExecutable: number
  totalVerified: number
  totalVerifiedPassing: number
  papersFullyExecutable: number
  featurePriority: {
    feature: string
    modelsUnblocked: number
    papersAffected: number
  }[]
  paperScorecards: PaperScorecard[]
}
```

A Node CLI script (scripts/audit-static.ts) that reuses the actual Interlyse parser pipeline:

```
R files -> parseR() -> recognizeR() -> mapToPipeline() -> DAG
                                                           |
                                                  classify each node
                                                           |
                                                 walk back from models
                                                           |
                                                    scorecard JSON
```
This reuses the real parser/recognizer/mapper so “is this supported?” stays in one place. No duplicated classification logic in Python.
The script doesn’t need a separate manifest — it uses the actual
pipeline mapper and executor registry. If mapToPipeline()
produces a node and executorRegistry has a handler for it,
it’s supported. If the recognizer doesn’t recognize a function call,
it’s unsupported. The source of truth is the code itself.
For function calls that the recognizer skips (not recognized), the script captures them as potential blockers by diffing “all function calls in the AST” against “function calls that produced pipeline nodes.”
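This diff can be sketched as follows. The `CallSite` and node shapes here are assumptions for illustration; the real output types of parseR()/mapToPipeline() may differ.

```typescript
// Sketch: find function calls the recognizer skipped.
// `CallSite` and the node shape are hypothetical placeholders.
type CallSite = { fn: string; line: number };
type PipelineNode = { id: string; sourceCall?: string };

function unrecognizedCalls(
  astCalls: CallSite[],  // every function call found in the AST
  nodes: PipelineNode[], // nodes the mapper actually produced
): CallSite[] {
  const mapped = new Set(
    nodes
      .map((n) => n.sourceCall)
      .filter((c): c is string => c !== undefined),
  );
  // Anything in the AST that never became a pipeline node is a
  // potential blocker (a transform the recognizer missed). Note this
  // sketch matches by function name, not by exact source position.
  return astCalls.filter((c) => !mapped.has(c.fn));
}
```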
For each model node in the DAG:

1. Walk the dependency chain backward (follow input edges).
2. For each upstream node, check: does an executor exist for this node type?
3. For unrecognized function calls that appear in the R source between the data-load statement and the model call (by source position / Span), flag them as potential blockers; these likely represent transform steps the recognizer missed.
4. The primary blocker is the first unsupported node in topological order (closest to the data source); fixing it might unblock a cascade.
5. Secondary blockers are additional unsupported nodes further downstream; even after fixing the primary, these still need work.
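The walk above can be sketched like this. The DAG node shape and the executor-lookup callback are assumptions; the real types live in the mapper/executor registry.

```typescript
// Sketch of the blocker walk: collect the upstream chain in
// topological order, then split unsupported node types into a
// primary blocker (closest to the data source) and secondary ones.
// Node shape is hypothetical.
type DagNode = { id: string; type: string; inputs: string[] };

function blockersFor(
  model: DagNode,
  byId: Map<string, DagNode>,
  hasExecutor: (nodeType: string) => boolean,
): { primary?: string; secondary: string[] } {
  const chain: DagNode[] = [];
  const visit = (n: DagNode) => {
    for (const dep of n.inputs) {
      const up = byId.get(dep);
      if (up) visit(up); // dependencies first -> topological order
    }
    if (!chain.includes(n)) chain.push(n);
  };
  visit(model);

  const unsupported = chain
    .filter((n) => !hasExecutor(n.type))
    .map((n) => n.type);
  return { primary: unsupported[0], secondary: unsupported.slice(1) };
}
```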
Some models have no recognizable upstream data-load — the paper uses
attach(), pre-loaded objects, or data constructed entirely
in custom functions. These models are classified as
blocked with a synthetic blocker
"data-source-unknown". They can transition to executable
once the data path is resolved (e.g., by staging data and mapping it
manually to the model’s expected variables).
Papers with source() calls use the existing FileRegistry
+ topological sort (M5a). The audit script processes all R files in a
package in dependency order, accumulating scope, just like the real
pipeline would.
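The dependency-order traversal can be sketched as a depth-first topological sort over source() edges. The edge list is assumed to come from the existing FileRegistry; the shapes here are hypothetical.

```typescript
// Sketch: order a package's R files so that source()d files run
// first, accumulating scope the way the real pipeline would.
// `sources` maps each file to the files it source()s.
function topoOrder(files: string[], sources: Map<string, string[]>): string[] {
  const ordered: string[] = [];
  const seen = new Set<string>();
  const visit = (f: string) => {
    if (seen.has(f)) return;
    seen.add(f);
    for (const dep of sources.get(f) ?? []) visit(dep); // dependencies first
    ordered.push(f);
  };
  files.forEach(visit);
  return ordered;
}
```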
```sh
# Audit all downloaded packages
node scripts/audit-static.js

# Audit a single paper
node scripts/audit-static.js --paper qje-investor-memory

# Output: reference-papers/audit/static-results.json
```

A test harness (scripts/audit-verify.ts) for papers with staged data:
```
Paper R code -> local R process       -> ground-truth JSON (coefficients per model)
Paper R code -> Interlyse TS pipeline -> TS results (coefficients per model)
                                              |
                                         diff TS vs R
                                              |
                                      verification JSON
```
For each staged paper, a small R wrapper script:

1. source()s the paper's R code.
2. Intercepts estimation function calls (wraps lm, feols, etc.).
3. For each model, extracts: coefficients, standard errors, p-values, R-squared, N, residual df.
4. Writes structured JSON: reference-papers/audit/r-ground-truth/&lt;paper-id&gt;.json
This R wrapper is semi-automated — the static analysis tells us which
functions appear and how many models to expect. The wrapper template
handles the common cases (lm, feols,
felm, glm, ivreg). Papers with
unusual patterns may need manual adjustment.
The verify script:

1. Loads the paper's data files into Interlyse Dataset objects.
2. Parses the R code through the full pipeline (parser -> recognizer -> mapper -> executor).
3. Extracts the same coefficient/SE/p-value values from each model node's result.
4. Writes reference-papers/audit/ts-results/&lt;paper-id&gt;.json
Tolerances (matching existing test conventions):

- Coefficients/statistics: |ts - r| < 0.00005
- P-values: |ts - r| < 0.00001
- N, df: exact match
A model is verified passing when all extracted values match. A model is verified failing when it executes but values diverge — the divergence info captures which coefficient, by how much.
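A minimal sketch of that comparison, using the tolerance thresholds above; the per-model result shape here is an assumption, not the real pipeline type.

```typescript
// Sketch of the tolerance check. Thresholds follow the audit rules:
// coefficients within 0.00005, p-values within 0.00001, N/df exact.
// `ModelResult` is a hypothetical shape.
type ModelResult = {
  coefficients: Record<string, number>;
  pValues: Record<string, number>;
  n: number;
  df: number;
};

function firstDivergence(ts: ModelResult, r: ModelResult) {
  // N and df must match exactly.
  if (ts.n !== r.n || ts.df !== r.df) {
    return { coefficient: "(n/df)", tsValue: NaN, rValue: NaN, delta: NaN };
  }
  for (const [name, rValue] of Object.entries(r.coefficients)) {
    const tsValue: number | undefined = ts.coefficients[name];
    if (tsValue === undefined || Math.abs(tsValue - rValue) >= 0.00005) {
      return { coefficient: name, tsValue: tsValue ?? NaN, rValue, delta: (tsValue ?? NaN) - rValue };
    }
  }
  for (const [name, rP] of Object.entries(r.pValues)) {
    const tsP: number | undefined = ts.pValues[name];
    if (tsP === undefined || Math.abs(tsP - rP) >= 0.00001) {
      return { coefficient: name, tsValue: tsP ?? NaN, rValue: rP, delta: (tsP ?? NaN) - rP };
    }
  }
  return undefined; // verified passing
}
```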
Staged papers live in
reference-papers/audit/data/<paper-id>/ with the
CSV/DTA files needed for execution. A
reference-papers/audit/staged-papers.json manifest tracks
which papers are staged and ready for verification.
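A sketch of what the staged-papers manifest might contain; only the file path comes from this doc, and every field name here is an assumption.

```json
{
  "staged": [
    {
      "paperId": "qje-investor-memory",
      "dataDir": "reference-papers/audit/data/qje-investor-memory",
      "stagedOn": "2026-04-14"
    }
  ]
}
```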
```sh
# Verify all staged papers
node scripts/audit-verify.js

# Verify a single paper
node scripts/audit-verify.js --paper qje-investor-memory

# Output: reference-papers/audit/verification-results.json
```

A script (scripts/audit-report.ts) consumes the static and verification JSON outputs and produces reference-papers/REPLICATION-AUDIT.md.
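The featurePriority section of the rollup can be computed by aggregating the per-paper blockers maps. A sketch, using the PaperScorecard blockers shape defined earlier:

```typescript
// Sketch: rank features by models they would unblock, aggregated
// across papers from each PaperScorecard's `blockers` map.
type Blockers = Record<string, { modelsBlocked: number; modelLabels: string[] }>;

function featurePriority(papers: { paperId: string; blockers: Blockers }[]) {
  const agg = new Map<string, { modelsUnblocked: number; papers: Set<string> }>();
  for (const p of papers) {
    for (const [feature, info] of Object.entries(p.blockers)) {
      const e = agg.get(feature) ?? { modelsUnblocked: 0, papers: new Set<string>() };
      e.modelsUnblocked += info.modelsBlocked;
      e.papers.add(p.paperId);
      agg.set(feature, e);
    }
  }
  return Array.from(agg.entries())
    .map(([feature, e]) => ({
      feature,
      modelsUnblocked: e.modelsUnblocked,
      papersAffected: e.papers.size,
    }))
    .sort((a, b) => b.modelsUnblocked - a.modelsUnblocked);
}
```

One caveat worth noting: a model blocked by two features is counted toward both, so fixing only one of them does not by itself make that model executable; the per-feature totals are an upper bound.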
Format:

```md
# Replication Audit — YYYY-MM-DD

## Headline
Models executable: X / Y (Z%) across N R papers
Models verified: A / B (C%) across M staged papers
Papers 100% executable: P / N

## Feature Priority
| Feature | Models unblocked | Papers affected |
|---------|:----------------:|:---------------:|
| ...     | ...              | ...             |

## Per-Paper Scorecards
### paper-id (K models)
- Executable: X/K (Z%)
- Verified: A/B (C%) [or "not staged"]
- Blockers:
  - feature_name -> blocks N models
```

npm run audit -- --diff compares against the previous REPLICATION-AUDIT.md (or a saved static-results.json snapshot) and outputs only what changed:
```md
## Changes since last audit (YYYY-MM-DD)
- Added: distinct() support
- Models executable: 3,412 -> 3,455 (+43)
- Papers at 100%: 14 -> 16 (+2)
- Newly unblocked papers: jpe-dissecting-financial-crises, restud-inference-single-treated-cluster
```

Previous snapshots are saved in reference-papers/audit/history/ with timestamps.
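The diff itself is a comparison of two rollup snapshots. A sketch against a pared-down CorpusRollup (fields as defined earlier; the reduced shape here is for illustration):

```typescript
// Sketch of diff mode: compare the previous and current rollup
// snapshots and report deltas plus newly fully-executable papers.
type Snapshot = {
  totalExecutable: number;
  papersFullyExecutable: number;
  paperScorecards: { paperId: string; modelsDetected: number; modelsExecutable: number }[];
};

function auditDiff(prev: Snapshot, curr: Snapshot) {
  const fullyExecutable = (s: Snapshot) =>
    new Set(
      s.paperScorecards
        .filter((p) => p.modelsExecutable === p.modelsDetected)
        .map((p) => p.paperId),
    );
  const before = fullyExecutable(prev);
  return {
    executableDelta: curr.totalExecutable - prev.totalExecutable,
    papersAt100Delta: curr.papersFullyExecutable - prev.papersFullyExecutable,
    newlyUnblockedPapers: Array.from(fullyExecutable(curr)).filter((id) => !before.has(id)),
  };
}
```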
```sh
# Full audit: static + verify + report
npm run audit

# Static only (no data needed)
npm run audit:static

# Verify only (staged papers)
npm run audit:verify

# Report with diff
npm run audit -- --diff
```

Wired up via package.json scripts.
```
scripts/
  audit-static.ts               # Static DAG analysis CLI
  audit-verify.ts               # Execution verification CLI
  audit-report.ts               # Markdown report generator

reference-papers/
  audit/
    static-results.json         # Latest static analysis output
    verification-results.json   # Latest verification output
    r-ground-truth/             # R output per staged paper
      qje-investor-memory.json
      ...
    ts-results/                 # TS output per staged paper
      qje-investor-memory.json
      ...
    data/                       # Staged data files per paper
      qje-investor-memory/
        data.csv
        ...
    staged-papers.json          # Manifest of staged papers
    history/                    # Snapshots for diff mode
      2026-04-14.json
      ...
  REPLICATION-AUDIT.md          # Human-readable report (committed)
```