Loads the MIMIC-III demo CSVs you select, assembles a per-admission record, routes each free-text admission diagnosis to the matching early-death (mortality-endpoint) oracle, computes the maximum achievable risk reduction and life-years saved, and compares these against the patient's observed in-hospital outcome. Runs entirely in your browser.
⚠ Runs on the 100-patient MIMIC-III DEMO · processing is local (no upload/network) · representative oracle effect sizes
“The fox knows many things, but the hedgehog knows one big thing.” (Archilochus, fr. 201)
Load the data and press Run — this headline fills in from the records you load: the harness routes every record to the oracle that owns its mode of early death and reports the mean life-years the Bayesian Pareto-optimum set recovers over the disease-specific standard-of-care baseline.
Discussion. Read as a study, this is a transportability-and-ceiling exercise, not a clinical estimate.
The mean Δ is an upper bound on the incremental gain: the Pareto hazard ratio is applied multiplicatively to the
usual-care hazard, so interventions already embedded in standard care (e.g. a statin) are credited a second time. The
overlap-free version — the prescribed-vs-Pareto headroom in §06 — requires the PRESCRIPTIONS table;
without it that split reads n/a. Three further bounds on interpretation: (i) the chronic-prevention hazard ratios are
transported from ambulatory trials to post-ICU survivors; (ii) the acute first-year mortality m₁ and post-acute hazard
h_long are representative literature values, not fit to this cohort; and (iii) where observed actual life-years are
absent (a selected-decedent demo with the in-hospital-death flag suppressed), the baseline reflects the disease group's
standard-of-care expectation, not the loaded sample. Stated precisely: under transported, literature-calibrated
assumptions, the Pareto-optimum set recovers a modeled mean gain per person over standard of care — a
hypothesis-generating ceiling to validate against cohort-fit baselines and an overlap-free prescribed-vs-Pareto
comparison, not a prescribe-tomorrow figure. Not for clinical or policy use.
Select the CSV files from your archive.zip (at minimum PATIENTS.csv and
ADMISSIONS.csv; structured_medical_records.csv is optional and used only to read the
stated Age). Files are parsed locally in your browser — nothing is uploaded.
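Below is a minimal Python sketch of the per-admission assembly, for readers who want to reproduce it outside the browser. It makes several assumptions:

- It expects the public demo's column names: subject_id, hadm_id, admittime, diagnosis and hospital_expire_flag in ADMISSIONS, and gender, dob and dod in PATIENTS.
- It lower-cases headers so the upper-case names of full MIMIC-III also work.
- It derives age from dob and admittime. Whether the harness does the same when structured_medical_records.csv is absent is an assumption.

The function name and record fields are hypothetical.

```python
import csv
import io
from datetime import datetime

TS = "%Y-%m-%d %H:%M:%S"  # MIMIC-III timestamp format (date-shifted years such as 2164 parse fine)
DAYS_PER_YEAR = 365.25


def _rows(f):
    """Yield CSV rows with lower-cased headers (demo is lowercase, full MIMIC-III uppercase)."""
    for row in csv.DictReader(f):
        yield {k.lower(): v for k, v in row.items()}


def build_admissions(patients_file, admissions_file):
    """Join ADMISSIONS to PATIENTS on subject_id; one record per hadm_id."""
    patients = {p["subject_id"]: p for p in _rows(patients_file)}
    records = []
    for adm in _rows(admissions_file):
        pat = patients.get(adm["subject_id"])
        if pat is None:
            continue  # admission without a matching patient row
        admit = datetime.strptime(adm["admittime"], TS)
        dob = datetime.strptime(pat["dob"], TS)
        dod = datetime.strptime(pat["dod"], TS) if pat.get("dod") else None
        records.append({
            "subject_id": adm["subject_id"],
            "hadm_id": adm["hadm_id"],
            "sex": pat["gender"],
            "age": (admit - dob).days / DAYS_PER_YEAR,
            "diagnosis": adm["diagnosis"],
            "died_in_hospital": adm["hospital_expire_flag"] == "1",
            # Observed actual life-years: dod - admittime; None when no death is recorded
            "actual_ly": (dod - admit).days / DAYS_PER_YEAR if dod else None,
        })
    return records


patients = io.StringIO("subject_id,gender,dob,dod\n1,F,2100-01-01 00:00:00,2170-06-01 00:00:00\n")
admissions = io.StringIO(
    "subject_id,hadm_id,admittime,diagnosis,hospital_expire_flag\n"
    "1,10,2170-01-01 00:00:00,SEPSIS,0\n"
)
records = build_admissions(patients, admissions)
```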
NHANES mode (free, no-application data). This harness also reads NHANES .XPT files directly
(SAS-transport, parsed in-browser). Drop a demographics file (DEMO_*.xpt, with SEQN/RIDAGEYR/RIAGENDR),
a prescriptions file (RXQ_RX_*.xpt, RXDDRUG), the medical-conditions questionnaire
(MCQ_*.xpt/DIQ/KIQ for routing), and a linked-mortality file (SEQN+MORTSTAT+PERMTH_EXM);
it auto-detects NHANES, routes by self-reported condition, and runs the same standard-of-care vs Pareto comparison on
an ambulatory population (general-population survival baseline). Download the files from the CDC yourself and drop them here;
the tool cannot fetch wwwn.cdc.gov directly (cross-origin restrictions). NHANES III fixed-width files (e.g. adult.dat) also load:
drop the data file together with its SAS layout (adult.sas) — the tool parses the INPUT column positions and
LABELs, then routes by condition label. (Bring one fixed-width file + its .sas per load; prescriptions/mortality
can come from .XPT or CSV. Continuous cycles (1999+) are all-.XPT and need no layout.)
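The fixed-width path can be sketched in three steps:

1. Read the column ranges from the layout's INPUT statement.
2. Read the variable descriptions from its LABEL statement.
3. Slice each record.

This assumes the layout uses plain column-range syntax (VAR 1-5, VAR 6, VAR $ 20-30). SAS pointer or informat syntax such as `@1 SEQN 5.` is not handled. The positions in the sample layout are made up for illustration, and the helper names are hypothetical.

```python
import re


def parse_sas_layout(sas_text):
    """Extract {var: (start, end)} from the INPUT statement and {var: label} from LABEL."""
    positions, labels = {}, {}
    inp = re.search(r"\bINPUT\b(.*?);", sas_text, re.S | re.I)
    if inp:
        # Matches "VAR 12-15", "VAR 7" and character fields written "VAR $ 20-30"
        for var, start, end in re.findall(r"(\w+)\s+(?:\$\s*)?(\d+)(?:\s*-\s*(\d+))?", inp.group(1)):
            positions[var.upper()] = (int(start), int(end or start))
    lab = re.search(r"\bLABEL\b(.*?);", sas_text, re.S | re.I)
    if lab:
        for var, text in re.findall(r"(\w+)\s*=\s*[\"']([^\"']*)[\"']", lab.group(1)):
            labels[var.upper()] = text
    return positions, labels


def read_fixed_width(lines, positions, keep):
    """Slice only the wanted variables out of each record (1-based, inclusive columns)."""
    for line in lines:
        yield {v: line[positions[v][0] - 1:positions[v][1]].strip() for v in keep}


# Hypothetical layout: variable names are NHANES III style, positions are invented
LAYOUT = """
INPUT
  SEQN     1-5
  HSSEX    6
  HSAGEIR  7-8
  ;
LABEL
  SEQN    = "Respondent sequence number"
  HSAGEIR = "Age at interview (screener)"
  ;
"""
positions, labels = parse_sas_layout(LAYOUT)
rows = list(read_fixed_width(["00003245"], positions, ["SEQN", "HSAGEIR"]))
```

The labels dictionary is what a condition-label router would search, since NHANES III variable names themselves carry little meaning.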
Harmonized NHANES 1988–2018 (figshare/Kaggle, Nguyen et al.) — drop the raw modules directly. Download the
cleaned demographics, questionnaire (or response), mortality and medications module CSVs and drop all of them here at
once. The tool streams each file and keeps only the handful of columns it needs (SEQN, age, sex,
self-reported conditions, MORTSTAT/PERMTH, drug names), so the 1000+-column questionnaire file loads
without overwhelming the browser; it then merges them on SEQN and runs the analysis. If the medications module
stores drug codes, also drop dictionary_drug_codes.csv. No Python, no fetching — still fully local.
Stage is not in NHANES (shown as NA-stage in §04b). If a file's columns aren't recognised, the load status lists
exactly what each file was read as, so a name mismatch is visible rather than silent.
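Below is a sketch of the stream, project and merge step. It assumes comma-separated modules whose headers include SEQN (in any letter case). Each row is cut down to the wanted columns as soon as it is read, so the merged result holds only the kept columns rather than the 1000+ in the questionnaire file.

The column names in the usage example follow the continuous-NHANES variables listed earlier. The harmonized release may name them differently, and the function names are hypothetical.

```python
import csv
import io


def stream_columns(f, wanted):
    """Yield one small dict per row, keeping only header names in `wanted` (upper-case set)."""
    reader = csv.reader(f)
    header = next(reader)
    keep = [(i, name.strip().upper()) for i, name in enumerate(header)
            if name.strip().upper() in wanted]
    for row in reader:
        yield {name: row[i] for i, name in keep}


def merge_on_seqn(files, wanted):
    """Merge several module files into one record per SEQN."""
    cols = {c.upper() for c in wanted} | {"SEQN"}
    merged = {}
    for f in files:
        for rec in stream_columns(f, cols):
            seqn = rec.get("SEQN")
            if seqn is None:
                continue  # file without a SEQN column cannot be linked
            merged.setdefault(seqn, {}).update(rec)
    return merged


demo = io.StringIO("SEQN,RIDAGEYR,RIAGENDR,EXTRA1\n1,60,2,x\n2,45,1,y\n")
mort = io.StringIO("seqn,mortstat,permth_exm\n1,1,120\n2,0,200\n")
merged = merge_on_seqn([demo, mort], {"RIDAGEYR", "RIAGENDR", "MORTSTAT", "PERMTH_EXM"})
```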
What this archive does and does not contain. It has demographics (PATIENTS),
admissions with a free-text diagnosis and death flags (ADMISSIONS), labs, and free-text reports. It does
not contain PRESCRIPTIONS, DIAGNOSES_ICD, or PROCEDURES_ICD. Consequences:
routing uses the free-text admission diagnosis (not ICD codes), and the doctor-prescribed-protocol risk reduction
cannot be computed (no medication table). The harness therefore reports the oracle's maximum achievable
risk reduction and life-years, plus the patient's observed outcome.
Every atlas oracle whose primary endpoint is mortality / early death is included here. Population-count analyses (us-mortality, self-caused-harm, rare-disease) are excluded. Symptom-scale and biomarker endpoints (osteoarthritis, depression, anxiety, ADHD, schizophrenia, PTSD, OCD, bipolar, LDL cholesterol) have no patient-level early-death endpoint and are handled separately in §08 as non-mortality reductions, not life-years.
| Oracle | Endpoint | Interventions | Routed from diagnoses containing… |
|---|---|---|---|
| Subject/Adm | Age/Sex | Admission diagnosis | Oracle | Prescribed RR | Max achievable RR | Usual-care baseline LY | Pareto-optimum LY | Actual LY (obs.) | Δ years added | Observed |
|---|---|---|---|---|---|---|---|---|---|---|
| Oracle | n | Mean max-achievable RR | Mean life-yrs added/adm | Total life-yrs added | Observed in-hosp deaths |
|---|---|---|---|---|---|
The same risk-reduction and life-years figures, broken down by oracle and, where the survey records it, by disease
variant (e.g. a diabetes sub-item, or cancer site in the cycles that ask it). NHANES does not capture clinical
stage (tumour stage, NYHA class, CKD stage, Child-Pugh), so the stage column reads NA-stage: a deliberate
anti-fabrication placeholder, not a missing computation. This section populates in NHANES mode when the demographics file
carries variant_* columns (the companion preprocessor emits them).
| Oracle | Variant | Stage | n | Mean max-achievable RR | Mean life-yrs added | Total life-yrs added |
|---|---|---|---|---|---|---|
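The grouping behind this table can be sketched as below. It assumes per-person results that already carry an oracle, an optional variant, a max-achievable RR and a life-years-added value; all field names are hypothetical.

A missing stage is written as the literal NA-stage placeholder rather than imputed. The "unspecified" label for rows without a variant is an assumption of this sketch.

```python
from collections import defaultdict


def summarise(records):
    """Group per-person results by (oracle, variant, stage) and compute the table's columns."""
    groups = defaultdict(list)
    for r in records:
        key = (
            r["oracle"],
            r.get("variant") or "unspecified",  # hypothetical label for rows without a variant_* value
            r.get("stage") or "NA-stage",       # NHANES has no clinical stage: explicit placeholder
        )
        groups[key].append(r)
    table = []
    for (oracle, variant, stage), rows in sorted(groups.items()):
        n = len(rows)
        total_ly = sum(r["ly_added"] for r in rows)
        table.append({
            "oracle": oracle, "variant": variant, "stage": stage, "n": n,
            "mean_max_rr": sum(r["max_rr"] for r in rows) / n,
            "mean_ly_added": total_ly / n,
            "total_ly_added": total_ly,
        })
    return table


table = summarise([
    {"oracle": "metabolic", "variant": "type 2", "max_rr": 0.2, "ly_added": 1.0},
    {"oracle": "metabolic", "variant": "type 2", "max_rr": 0.4, "ly_added": 2.0},
    {"oracle": "heart", "max_rr": 0.3, "ly_added": 0.5},
])
```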
The usual-care baseline is the standard of care: the empirical survival of patients with this disease who
received ordinary treatment, including medications taken in the recent past (e.g. a statin they were already on). It is
not an untreated counterfactual — there is no plausible untreated cohort to estimate it from. The Bayesian
Pareto optimum is a different, specified set of interventions. So this table compares two regimens — standard of
care vs the Pareto-optimum set — and the years added is Pareto-optimum LE − usual-care baseline LE.
A high disease-specific acute first-year mortality is carried by both regimens; the Pareto set acts on the modifiable
post-acute hazard. Observed actual life-years (from dod − admittime) are shown as the empirical anchor.
| Oracle | n | Avg age @ intervention | Avg age @ death | Avg actual LY (obs.) | Usual-care baseline LY | Pareto-optimum LY | Avg Δ years added |
|---|---|---|---|---|---|---|---|
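Below is a sketch of the two-phase comparison described above:

- The acute first-year mortality m₁ is shared by both regimens.
- The Pareto joint HR scales only the post-acute hazard h_long.
- Remaining life expectancy is the area under the yearly survival curve (trapezoid rule).

Several pieces are assumptions. The Gompertz `background_qx` function is a hypothetical stand-in for the Canadian life table, with invented parameters. The 110-year horizon and yearly step are also assumptions.

```python
import math


def background_qx(age):
    """Hypothetical Gompertz-style annual death probability standing in for the
    general-population life table (parameters invented for illustration)."""
    return min(0.999, 0.00002 * math.exp(0.09 * age))


def life_expectancy(age, m1, h_long, hr=1.0, max_age=110):
    """Area under a yearly survival curve for the two-phase model.

    Phase 1: acute first-year mortality m1, identical under both regimens.
    Phase 2: chronic hazard h_long (scaled by the Pareto joint HR) plus background.
    """
    s = 1.0 - m1              # survival at the end of the acute year
    area = 0.5 * (1.0 + s)    # trapezoid for year 1
    a = age + 1
    while a < max_age:
        mu_bg = -math.log(1.0 - background_qx(a))  # background hazard implied by qx
        s_next = s * math.exp(-(h_long * hr + mu_bg))
        area += 0.5 * (s + s_next)
        s, a = s_next, a + 1
    return area


def years_added(age, m1, h_long, pareto_hr):
    """Pareto-optimum LE minus usual-care baseline LE (usual care: hr = 1)."""
    return life_expectancy(age, m1, h_long, pareto_hr) - life_expectancy(age, m1, h_long)


print(years_added(70, 0.40, 0.10, 0.70))  # e.g. sepsis/all-cause m1 with a hypothetical h_long
```

A large m₁ caps the gain because the Pareto HR never touches the first-year term, which is why high-acute-mortality oracles show small Δ even with strong interventions.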
What the delta is — and the one caveat that remains. This compares the standard-of-care regimen to the
Pareto-optimum set. The Pareto effect is applied multiplicatively to the usual-care hazard, so where the two regimens
overlap — e.g. both include a statin — the model still credits that shared intervention, making the headline Δ an
upper bound on the incremental gain. The clean, overlap-free version is in §06: the prescribed-vs-Pareto
headroom measures the Pareto optimum relative to what the patient was actually given (from
PRESCRIPTIONS), so it nets out the standard care already in the baseline. Two further notes: the Pareto set
acts only on the post-acute hazard (a statin does not avert acute septic death), and observed actual LY runs below the
usual-care baseline here because this demo cohort is selected decedents — the baseline reflects the disease group's
realistic standard-of-care expectation, not this biased sample.
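The double-counting can be made concrete with the joint-HR formula the harness uses, HR = exp(Σ ln(HRᵢ)·(1 − ρ)) with ρ = 0.30. The per-intervention HRs and the name `intervention_b` below are hypothetical.

Because the formula is additive in log-HR, removing the interventions already present in usual care leaves exactly the incremental HR. That HR is closer to 1 than the headline HR whenever the shared interventions are protective (HRᵢ < 1).

```python
import math

RHO = 0.30  # overlap discount in HR = exp(Σ ln(HR_i) · (1 − ρ))


def joint_hr(hrs, rho=RHO):
    """Joint hazard ratio of a set of interventions, discounted by rho for overlap."""
    return math.exp(sum(math.log(h) for h in hrs) * (1.0 - rho))


# Hypothetical Pareto set for one oracle, and the part of it already in usual care
pareto = {"statin": 0.75, "intervention_b": 0.80}
in_usual_care = {"statin"}

# Headline: the whole set multiplies the usual-care hazard, which already reflects
# the statin, so the statin is credited a second time.
headline_hr = joint_hr(pareto.values())

# Overlap-free: only interventions usual care lacks act on the usual-care hazard.
incremental_hr = joint_hr(hr for name, hr in pareto.items() if name not in in_usual_care)

print(f"headline HR {headline_hr:.3f} vs incremental HR {incremental_hr:.3f}")
```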
The demo archive you may have loaded omits PRESCRIPTIONS. Load it (it ships with the full
credentialed MIMIC-III, ~4.16M rows, and with the open 100-patient demo on PhysioNet) and this section activates: each
admission's ordered drugs are string-matched (DRUG/DRUG_NAME_GENERIC) to the routed oracle's
interventions, giving the doctor-prescribed risk reduction, the gap to the Pareto optimum, and a split of
life-years into already secured by the prescribed protocol vs remaining headroom.
| Oracle | n (with Rx) | Mean prescribed RR | Mean Pareto RR | Mean gap (unrealized) | Mean yrs secured | Mean headroom yrs |
|---|---|---|---|---|---|---|
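Below is a sketch of the string-matching step. It assumes each oracle intervention carries a hazard ratio and a list of lower-case drug-name fragments to look for in DRUG / DRUG_NAME_GENERIC. The intervention table, HR values and fragments are hypothetical, as is taking RR = 1 − joint HR.

A lifestyle intervention with no drug fragments (exercise here) can never be matched. This is the structurally unmeasurable part of the gap described in the caveats below.

```python
import math

RHO = 0.30  # overlap discount in HR = exp(Σ ln(HR_i) · (1 − ρ))

# Hypothetical intervention table for one routed oracle: name -> (HR, drug-name fragments)
INTERVENTIONS = {
    "statin": (0.75, ["atorvastatin", "simvastatin", "rosuvastatin", "pravastatin"]),
    "ace_inhibitor": (0.85, ["lisinopril", "ramipril", "enalapril"]),
    "exercise": (0.80, []),  # lifestyle: never appears in a drug table
}


def joint_hr(hrs, rho=RHO):
    return math.exp(sum(math.log(h) for h in hrs) * (1.0 - rho))


def prescribed_vs_pareto(drug_names, interventions):
    """Match ordered drugs to interventions; return (matched, prescribed RR, Pareto RR, gap)."""
    names = [d.lower() for d in drug_names]
    matched = [iv for iv, (_, fragments) in interventions.items()
               if any(frag in name for frag in fragments for name in names)]
    prescribed_rr = 1.0 - joint_hr(interventions[iv][0] for iv in matched)
    pareto_rr = 1.0 - joint_hr(hr for hr, _ in interventions.values())
    return matched, prescribed_rr, pareto_rr, pareto_rr - prescribed_rr


# Drugs ordered during one admission (DRUG / DRUG_NAME_GENERIC values)
matched, presc_rr, pareto_rr, gap = prescribed_vs_pareto(
    ["Atorvastatin", "Norepinephrine", "Vancomycin"], INTERVENTIONS)
```

The acute ICU drugs (norepinephrine, vancomycin) match nothing, so only the statin counts toward the prescribed RR.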
Interpretation caveats. MIMIC PRESCRIPTIONS are inpatient CPOE orders during the
stay — a mix of acute ICU drugs (pressors, sedatives, antibiotics, which map to no prevention oracle) and continued
chronic medications (statins, antihypertensives, etc., which do). So the prescribed RR here is a lower bound on the
true outpatient regimen and is not adherence-weighted (a single inpatient order ≠ chronic use). Lifestyle and
procedural interventions (exercise, diet, weight loss, rehab) never appear in a drug table, so part of the "gap to Pareto"
is structurally unmeasurable from prescriptions alone.
Routing: diagnosis → oracle by keyword, in priority order: lymphoma → cancer → transplant → sepsis/infection → brain → heart → liver → kidney → pulmonary → metabolic → all-cause (default). Co-occurring conditions route to the highest-priority match, e.g. S/P LIVER TRANSPLANT → transplant and SEPSIS;PNEUMONIA → all-cause (sepsis dominates). This priority is a deliberate, editable choice. Free-text routing has irreducible ambiguity, and a real run would validate it on a labelled sample.
Prescribed RR: populated only when the PRESCRIPTIONS table is loaded. Drugs are string-matched to the routed oracle's interventions on DRUG/DRUG_NAME_GENERIC, the standard MIMIC approach; production uses NDC→RxNorm→ATC. Without that table the column shows n/a. See §06 for the prescribed-vs-Pareto split.
Max achievable RR: derived from the Pareto set's joint hazard ratio, HR = exp(Σ ln(HRᵢ)·(1 − ρ)), with ρ = 0.30. This is a counterfactual ceiling, not a prescribe-tomorrow figure.
Usual-care baseline LY / Pareto-optimum LY: the model has two phases.
- Phase 1 (acute) applies a disease-specific first-year mortality m₁ that is carried by both regimens (e.g. liver ≈0.50, cancer ≈0.55, sepsis/all-cause ≈0.40, heart ≈0.28).
- Phase 2 (post-acute) applies a chronic disease hazard h_long plus the age/sex background. The Pareto set's joint HR multiplies only h_long.
Remaining life expectancy is the area under each survival curve; years added = Pareto-optimum LE − usual-care baseline LE.
Calibration: m₁/h_long are representative literature values, not fit to this cohort. A production version would calibrate them to disease-group survival (e.g. registry or full-MIMIC follow-up). The qx background is a Canadian general-population table.
Observed: hospital_expire_flag / PATIENTS dod, the only ground-truth mortality available; shown for context against the counterfactual.
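The priority routing can be sketched as an ordered keyword table in which the first matching row wins. The keyword lists are illustrative assumptions, not the harness's own. Mapping the sepsis/infection row to the all-cause oracle is inferred from the SEPSIS;PNEUMONIA example above.

```python
# Priority-ordered keyword table; earlier rows win when several match.
# Keyword lists are illustrative assumptions, not the harness's actual lists.
ROUTES = [
    ("lymphoma", ("LYMPHOMA",)),
    ("cancer", ("CANCER", "CARCINOMA", "TUMOR", "MALIGNAN", "METASTA")),
    ("transplant", ("TRANSPLANT",)),
    ("all-cause", ("SEPSIS", "SEPTIC", "INFECTION")),  # sepsis/infection row -> all-cause oracle
    ("brain", ("STROKE", "INTRACRANIAL", "SUBARACHNOID", "BRAIN")),
    ("heart", ("HEART", "MYOCARDIAL", "CORONARY", "CHF")),
    ("liver", ("LIVER", "CIRRHOSIS", "HEPAT")),
    ("kidney", ("RENAL", "KIDNEY")),
    ("pulmonary", ("PNEUMONIA", "COPD", "PULMONARY", "RESPIRATORY")),
    ("metabolic", ("DIABET", "KETOACIDOSIS")),
]


def route(diagnosis):
    """Return the oracle for a free-text admission diagnosis (substring match, first row wins)."""
    text = diagnosis.upper()
    for oracle, keywords in ROUTES:
        if any(k in text for k in keywords):
            return oracle
    return "all-cause"  # default when nothing matches


print(route("S/P LIVER TRANSPLANT"), route("SEPSIS;PNEUMONIA"))
```

Plain substring matching is what makes the ambiguity irreducible. For example, MALIGNANT HYPERTENSION would route to cancer here, which is exactly the kind of case a labelled validation sample would catch.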