MIMIC-III Mortality Harness — All Early-Death Oracles

00 The modelled ceiling, in one number

“The fox knows many things, but the hedgehog knows one big thing.” — Archilochus, fr. 201

Load the data and press Run to fill in this headline from the records you load. The harness routes every record to the oracle that owns its mode of early death and reports the mean life-years that the Bayesian Pareto-optimum set recovers over the disease-specific standard-of-care baseline.

Discussion. Read as a study, this is a transportability-and-ceiling exercise, not a clinical estimate. The mean Δ is an upper bound on the incremental gain: the Pareto hazard ratio is applied multiplicatively to the usual-care hazard, so interventions already embedded in standard care (e.g. a statin) are credited a second time. The overlap-free version (the prescribed-vs-Pareto headroom in §06) requires the PRESCRIPTIONS table; without it that split reads n/a. Three further bounds on interpretation: (i) the chronic-prevention hazard ratios are transported from ambulatory trials to post-ICU survivors; (ii) the acute first-year mortality m₁ and post-acute hazard h_long are representative literature values, not fit to this cohort; and (iii) where observed actual life-years are absent (a selected-decedent demo with the in-hospital-death flag suppressed), the baseline reflects the disease group's standard-of-care expectation, not the loaded sample. Stated precisely: under transported, literature-calibrated assumptions, the Pareto-optimum set recovers a modelled mean gain per person over standard of care. That gain is a hypothesis-generating ceiling, to be validated against cohort-fit baselines and an overlap-free prescribed-vs-Pareto comparison, not a prescribe-tomorrow figure. Not for clinical or policy use.

01 Load the data

Select the CSV files from your archive.zip (at minimum PATIENTS.csv and ADMISSIONS.csv; structured_medical_records.csv is optional and used only to read the stated Age). Files are parsed locally in your browser — nothing is uploaded.

NHANES mode (free, no-application data). This harness also reads NHANES .XPT files (SAS transport format) directly, parsing them in the browser. Drop in a demographics file (DEMO_*.xpt, with SEQN/RIDAGEYR/RIAGENDR), a prescriptions file (RXQ_RX_*.xpt, RXDDRUG), the medical-conditions questionnaire used for routing (MCQ_*.xpt/DIQ/KIQ), and a linked-mortality file (SEQN + MORTSTAT + PERMTH_EXM). The harness auto-detects NHANES, routes each participant by self-reported condition, and runs the same standard-of-care vs Pareto comparison on an ambulatory population, using a general-population survival baseline. Download the files from CDC yourself and drop them here; the tool cannot fetch from wwwn.cdc.gov directly because of cross-origin restrictions. NHANES III fixed-width files (e.g. adult.dat) also load: drop the data file together with its SAS layout (adult.sas), and the tool parses the INPUT column positions and LABELs, then routes by condition label. Bring one fixed-width file plus its .sas per load; prescriptions and mortality can come from .XPT or CSV. Continuous cycles (1999+) are all .XPT and need no layout.
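The layout step for NHANES III can be sketched as follows, assuming the .sas file declares one `NAME start-end` pair per line inside an INPUT statement (with an optional `$` marking character fields) and `NAME = "label"` pairs inside a LABEL statement. `parseSasLayout` and `readFixedWidth` are hypothetical names, not the harness's own code, and real layout files may carry other statements that this sketch simply ignores.

```javascript
// Hypothetical sketch: parse a SAS INPUT/LABEL layout, then slice fixed-width records.
// Assumed layout lines: "SEQN 1-5", "HSSEX $ 6", 'SEQN = "Respondent identification number"'.
function parseSasLayout(sasText) {
  const fields = [];
  const labels = {};
  let section = null; // "input", "label", or null outside those statements
  for (const raw of sasText.split(/\r?\n/)) {
    let line = raw.trim();
    const kw = line.match(/^(INPUT|LABEL)\b\s*(.*)$/i);
    if (kw) {
      section = kw[1].toLowerCase();
      line = kw[2]; // the keyword line may already hold the first entry
    }
    const ends = line.endsWith(";"); // a semicolon closes the current statement
    if (ends) line = line.slice(0, -1).trim();
    if (section === "input") {
      // NAME [$] start[-end]; SAS columns are 1-based and inclusive.
      const m = line.match(/^([A-Za-z_][A-Za-z0-9_]*)\s+(\$\s*)?(\d+)(?:\s*-\s*(\d+))?/);
      if (m) {
        const start = Number(m[3]);
        const end = m[4] ? Number(m[4]) : start;
        fields.push({ name: m[1].toUpperCase(), start, end, isChar: Boolean(m[2]) });
      }
    } else if (section === "label") {
      const m = line.match(/^([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"([^"]*)"/);
      if (m) labels[m[1].toUpperCase()] = m[2];
    }
    if (ends) section = null;
  }
  return { fields, labels };
}

// Turn each non-empty fixed-width line into an object keyed by variable name.
function readFixedWidth(datText, fields) {
  return datText
    .split(/\r?\n/)
    .filter((l) => l.length > 0)
    .map((line) => {
      const rec = {};
      for (const f of fields) {
        const cell = line.slice(f.start - 1, f.end).trim();
        rec[f.name] = f.isChar || cell === "" ? cell : Number(cell);
      }
      return rec;
    });
}
```

Numeric fields are converted with `Number`, while blank cells stay as empty strings so that missing values are not silently read as zero.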

Harmonized NHANES 1988–2018 (figshare/Kaggle, Nguyen et al.): drop the raw modules directly. Download the cleaned demographics, questionnaire (or response), mortality and medications module CSVs and drop them all here at once. The tool streams each file and keeps only the handful of columns it needs (SEQN, age, sex, self-reported conditions, MORTSTAT/PERMTH, drug names), so even the questionnaire file, with its 1000+ columns, loads without overwhelming the browser; it then merges the modules on SEQN and runs the analysis. If the medications module stores drug codes, also drop dictionary_drug_codes.csv. No Python and no fetching: everything stays local. If you also drop the response module, the harness computes a clinical stage from the labs (kidney/liver/metabolic/pulmonary) for §04b; otherwise stage shows NA-stage. If a file's columns aren't recognised, the load status lists exactly what each file was read as, so a name mismatch is visible rather than silent.
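A minimal in-memory sketch of the keep-only-needed-columns and merge-on-SEQN steps. It assumes plain CSV with no quoted commas, and the column names in the test data are placeholders, since the harmonized modules' exact headers are not given here. A browser version would read each file incrementally (for example via `File.stream()`) rather than from one string; `keepColumns` and `mergeOnSeqn` are hypothetical names.

```javascript
// Hypothetical sketch: keep only the needed columns of each module, then merge on SEQN.
// Assumes simple CSV (no quoted commas) and a SEQN column in every module.
function keepColumns(csvText, wanted) {
  const lines = csvText.split(/\r?\n/).filter((l) => l.length > 0);
  const header = lines[0].split(",").map((h) => h.trim());
  // [name, index] pairs for SEQN plus the wanted columns; everything else is dropped.
  const idx = header
    .map((name, i) => [name, i])
    .filter(([name]) => name === "SEQN" || wanted.includes(name));
  const rows = [];
  for (let r = 1; r < lines.length; r++) {
    const cells = lines[r].split(",");
    const row = {};
    for (const [name, i] of idx) row[name] = (cells[i] ?? "").trim();
    rows.push(row);
  }
  // "kept" doubles as a per-file summary for a load-status line.
  return { rows, kept: idx.map(([name]) => name) };
}

// One record per SEQN; fields listed in listFields (e.g. drug names) accumulate into arrays.
function mergeOnSeqn(modules, listFields = []) {
  const bySeqn = new Map();
  for (const { rows } of modules) {
    for (const row of rows) {
      if (!bySeqn.has(row.SEQN)) bySeqn.set(row.SEQN, { SEQN: row.SEQN });
      const merged = bySeqn.get(row.SEQN);
      for (const [key, value] of Object.entries(row)) {
        if (key === "SEQN") continue;
        if (listFields.includes(key)) {
          (merged[key] ??= []).push(value); // many rows per participant
        } else {
          merged[key] = value;
        }
      }
    }
  }
  return bySeqn;
}
```

Drug names are collected into an array because a medications module typically has several rows per participant, whereas demographic fields occur once.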


Accepted inputs: harmonized NHANES module .csv (demographics + questionnaire + mortality + medications, dropped together) · MIMIC .csv · NHANES .xpt · NHANES III .dat + .sas.


What this archive does and does not contain. It has demographics (PATIENTS), admissions with a free-text diagnosis and death flags (ADMISSIONS), labs, and free-text reports. It does not contain PRESCRIPTIONS, DIAGNOSES_ICD, or PROCEDURES_ICD. Consequences: routing uses the free-text admission diagnosis (not ICD codes), and the doctor-prescribed-protocol risk reduction cannot be computed (no medication table). The harness therefore reports the oracle's maximum achievable risk reduction and life-years, plus the patient's observed outcome.
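Routing from the free-text admission diagnosis can be sketched as a first-match scan over priority tiers. Only the tier order comes from the Methods section on this page; the keyword lists are hypothetical. Mapping the sepsis/infection tier to the all-cause oracle is an assumption, chosen because it reproduces the documented example SEPSIS;PNEUMONIA → all-cause. Keywords match only at word starts, so that, for instance, DELIVERY does not route to liver.

```javascript
// Hypothetical keyword tiers; only their ORDER follows the documented priority.
// Assumption: the sepsis/infection tier routes to the all-cause oracle.
const ROUTING_TIERS = [
  ["lymphoma", ["LYMPHOMA"]],
  ["cancer", ["CANCER", "CARCINOMA", "TUMOR", "METASTA"]],
  ["transplant", ["TRANSPLANT"]],
  ["all-cause", ["SEPSIS", "SEPTIC", "INFECTION"]], // sepsis/infection tier
  ["brain", ["STROKE", "INTRACRANIAL", "SUBARACHNOID", "BRAIN"]],
  ["heart", ["HEART", "CORONARY", "MYOCARDIAL", "CARDIAC"]],
  ["liver", ["LIVER", "CIRRHOSIS", "HEPAT"]],
  ["kidney", ["RENAL", "KIDNEY"]],
  ["pulmonary", ["COPD", "PNEUMONIA", "RESPIRATORY", "PULMONARY"]],
  ["metabolic", ["DIABET", "DKA", "HYPOGLYC"]],
].map(([oracle, keywords]) => [oracle, new RegExp(`\\b(?:${keywords.join("|")})`)]);

// Return the highest-priority oracle whose keyword starts a word in the diagnosis.
function routeDiagnosis(diagnosis) {
  const text = String(diagnosis ?? "").toUpperCase();
  for (const [oracle, pattern] of ROUTING_TIERS) {
    if (pattern.test(text)) return oracle;
  }
  return "all-cause"; // default when no tier matches
}
```

Because the first matching tier wins, S/P LIVER TRANSPLANT reaches the transplant tier before the liver tier is ever checked.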

02 Early-death oracles included

Every atlas oracle whose primary endpoint is mortality / early death is included here. Population-count analyses (us-mortality, self-caused-harm, rare-disease) are excluded. Symptom-scale and biomarker endpoints (osteoarthritis, depression, anxiety, ADHD, schizophrenia, PTSD, OCD, bipolar, LDL cholesterol) have no patient-level early-death endpoint and are handled separately in §08 as non-mortality reductions, not life-years.

Oracle	Endpoint	Interventions	Routed from diagnoses containing…

03 Per-admission output (per-person rows withheld under enclave release rules)

Subject/Adm	Age/Sex	Admission diagnosis	Oracle	Prescribed RR	Max achievable RR	Usual-care baseline LY	Pareto-optimum LY	Actual LY (obs.)	Δ years added	Observed

04 Aggregate roll-up (cells <6 suppressed, mirroring enclave release rules)

Oracle	n	Mean max-achievable RR	Mean life-yrs added/adm	Total life-yrs added	Observed in-hosp deaths

04b Roll-up by family × disease × subtype × stage (stage computed from response-module labs where a validated algorithm exists; else NA-stage)

The same statistics as the per-family table above, resolved to a finer grain. Each family (oracle) row carries its family-total record count and statistics. Beneath it, every disease within the family is broken out, read from the actual NHANES condition flags (e.g. cardiac splits into congestive heart failure, coronary heart disease, angina and heart attack; pulmonary into COPD, emphysema and chronic bronchitis; liver into its reported items), followed by any recorded subtype (diabetes sub-item or cancer site, where the cycle asks it) and stage. Where a person reports several conditions in one family, they are attributed to the highest-priority flag, so per-disease counts sum exactly to the family total. NHANES does not report a clinical stage, so this build computes one from the response-module labs for the four oracles with a validated cross-sectional algorithm: kidney → KDIGO G-stage (CKD-EPI 2021 eGFR), liver → FIB-4 fibrosis band, metabolic → ADA glycaemic stage, pulmonary → GOLD (where spirometry exists). Oracles with no NHANES staging analog remain NA-stage: cancer and heart (NHANES records no TNM stage or NYHA class) and brain. Child-Pugh is likewise unavailable, which is why liver staging relies on FIB-4. Computed stages are single-visit categorisations (chronicity unconfirmable), not chronicity-confirmed diagnoses; an explicit variant_stage column still overrides them. Cells with fewer than 6 records are suppressed, mirroring enclave release rules.
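The three non-kidney staging rules can be sketched with commonly published cut-offs: FIB-4 below 1.30, 1.30–2.67, and above 2.67; ADA HbA1c thresholds of 5.7% and 6.5% (fasting glucose 100 and 126 mg/dL); and GOLD grades 1–4 by FEV1 % predicted once FEV1/FVC < 0.70. The function names and stage labels are hypothetical, and FEV1 % predicted is taken as an input because the reference equation the harness uses is not stated. GOLD formally uses post-bronchodilator spirometry, which a survey may not provide.

```javascript
// Hypothetical stage functions; cut-offs are the commonly published ones, labels are illustrative.

// FIB-4 = (age x AST) / (platelets [10^9/L] x sqrt(ALT)), with AST and ALT in U/L.
function fib4Band(ageYears, ast, alt, platelets) {
  if (![ageYears, ast, alt, platelets].every(Number.isFinite) || alt <= 0 || platelets <= 0) {
    return "NA-stage";
  }
  const fib4 = (ageYears * ast) / (platelets * Math.sqrt(alt));
  if (fib4 < 1.3) return "FIB-4 low";
  if (fib4 <= 2.67) return "FIB-4 indeterminate";
  return "FIB-4 high";
}

// ADA glycaemic stage from HbA1c (%) and/or fasting glucose (mg/dL); the worse category wins.
function adaStage(hba1c, fastingGlucose) {
  const levels = [];
  if (Number.isFinite(hba1c)) levels.push(hba1c >= 6.5 ? 2 : hba1c >= 5.7 ? 1 : 0);
  if (Number.isFinite(fastingGlucose)) {
    levels.push(fastingGlucose >= 126 ? 2 : fastingGlucose >= 100 ? 1 : 0);
  }
  if (levels.length === 0) return "NA-stage";
  return ["normoglycaemia", "prediabetes", "diabetes range"][Math.max(...levels)];
}

// GOLD grade: obstruction if FEV1/FVC < 0.70, then graded by FEV1 % predicted.
function goldGrade(fev1, fvc, fev1PctPredicted) {
  if (![fev1, fvc, fev1PctPredicted].every(Number.isFinite) || fvc <= 0) return "NA-stage";
  if (fev1 / fvc >= 0.7) return "no obstruction";
  if (fev1PctPredicted >= 80) return "GOLD 1";
  if (fev1PctPredicted >= 50) return "GOLD 2";
  if (fev1PctPredicted >= 30) return "GOLD 3";
  return "GOLD 4";
}
```

Missing labs fall through to NA-stage rather than a default category, which keeps unstaged people out of the stage rows.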

Reveal computed stages for cells with n<6 (relaxes the enclave suppression on the stage breakdown rows only; family-total stats stay suppressed)
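The roll-up rules can be sketched as follows: each person is attributed to one disease within a family (the first reported flag in a priority list), counts are kept per family and per family × disease × stage, and cells below 6 are suppressed unless the reveal toggle relaxes suppression for stage rows. The priority list simply follows the order in which the diseases are named in the text and is illustrative, not the harness's own table; subtype and the per-cell statistics are omitted for brevity.

```javascript
// Hypothetical sketch of the §04b counting and suppression rules.
// DISEASE_PRIORITY is illustrative; it follows the order diseases are named in the text.
const DISEASE_PRIORITY = {
  heart: ["congestive heart failure", "coronary heart disease", "angina", "heart attack"],
  pulmonary: ["COPD", "emphysema", "chronic bronchitis"],
};
const MIN_CELL = 6;

// Attribute a person to the single highest-priority condition they report in a family.
function attributeDisease(family, reported) {
  const order = DISEASE_PRIORITY[family] ?? [];
  return order.find((d) => reported.includes(d)) ?? null;
}

// Count per family and per family|disease|stage, then suppress small cells.
// revealStages relaxes suppression on stage rows only; family totals stay suppressed.
function rollUp(records, revealStages = false) {
  const family = new Map();
  const stage = new Map();
  for (const r of records) {
    const disease = attributeDisease(r.family, r.reported);
    if (disease === null) continue;
    family.set(r.family, (family.get(r.family) ?? 0) + 1);
    const key = `${r.family}|${disease}|${r.stage ?? "NA-stage"}`;
    stage.set(key, (stage.get(key) ?? 0) + 1);
  }
  const show = (n, relaxed) => (n >= MIN_CELL || relaxed ? n : "suppressed");
  return {
    families: [...family].map(([name, n]) => ({ family: name, n: show(n, false) })),
    stages: [...stage].map(([cell, n]) => ({ cell, n: show(n, revealStages) })),
  };
}
```

Since each record lands in exactly one disease cell, the per-disease counts sum to the family total before suppression is applied.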

Family	Disease	Subtype	Stage	n	Mean max-achievable RR	Mean life-yrs added/rec	Total life-yrs added	Observed deaths	Mean age	Mean actual LY (decedents)	Mean usual-care LE	Mean Pareto LE	Mean Δ LE	SD Δ LE

04c Lab-derived disease staging (whole loaded cohort, independent of self-reported diagnosis)

Every loaded participant who has the required response-module labs is staged across all 18 validated algorithms — not gated by self-report or by routing to a mortality oracle, and not subject to the §04b small-cell suppression. This is the population-level staging the lab data actually supports: a person with eGFR 40 is CKD G3b whether or not they reported kidney disease. Requires demographics + response module CSVs. Stages are single-visit categorisations (chronicity unconfirmable). If the table is empty, the diagnostic line below states why.
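The kidney rule can be sketched with the race-free CKD-EPI 2021 creatinine equation, eGFR = 142 × min(Scr/κ, 1)^α × max(Scr/κ, 1)^−1.200 × 0.9938^age × 1.012 (if female), with κ = 0.7 (female) or 0.9 (male) and α = −0.241 (female) or −0.302 (male), followed by the KDIGO G-stage cut-offs at 90, 60, 45, 30 and 15 mL/min/1.73 m². The participant field names (`creatinine`, `age`, `female`) are hypothetical; the distribution counts only participants who have the required labs, matching the "% of staged" column.

```javascript
// Sketch: race-free CKD-EPI 2021 creatinine equation -> KDIGO G-stage -> cohort distribution.
// Serum creatinine in mg/dL; participant field names are hypothetical.
function egfrCkdEpi2021(scrMgDl, ageYears, isFemale) {
  const kappa = isFemale ? 0.7 : 0.9;
  const alpha = isFemale ? -0.241 : -0.302;
  const ratio = scrMgDl / kappa;
  return (
    142 *
    Math.pow(Math.min(ratio, 1), alpha) *
    Math.pow(Math.max(ratio, 1), -1.2) *
    Math.pow(0.9938, ageYears) *
    (isFemale ? 1.012 : 1)
  );
}

// KDIGO GFR categories.
function kdigoGStage(egfr) {
  if (!Number.isFinite(egfr)) return null;
  if (egfr >= 90) return "G1";
  if (egfr >= 60) return "G2";
  if (egfr >= 45) return "G3a";
  if (egfr >= 30) return "G3b";
  if (egfr >= 15) return "G4";
  return "G5";
}

// Stage everyone with the required labs (no self-report gate); n and % of staged per stage.
function stageDistribution(participants) {
  const counts = new Map();
  let staged = 0;
  for (const p of participants) {
    if (![p.creatinine, p.age].every(Number.isFinite)) continue; // missing labs: not staged
    const stage = kdigoGStage(egfrCkdEpi2021(p.creatinine, p.age, p.female));
    counts.set(stage, (counts.get(stage) ?? 0) + 1);
    staged += 1;
  }
  return [...counts].map(([stage, n]) => ({ stage, n, pct: (100 * n) / staged }));
}
```

The page's own example holds here too: `kdigoGStage(40)` returns "G3b" regardless of what the participant reported.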

Run the harness with the response module loaded to populate.

Disease / algorithm	Stage	n	% of staged	distribution

05 Actual vs usual-care baseline vs Pareto-optimum life-years ("years added")

The usual-care baseline is the standard of care: the empirical survival of patients with this disease who received ordinary treatment, including medications taken in the recent past (e.g. a statin they were already on). It is not an untreated counterfactual — there is no plausible untreated cohort to estimate it from. The Bayesian Pareto optimum is a different, specified set of interventions. So this table compares two regimens — standard of care vs the Pareto-optimum set — and the years added is Pareto-optimum LE − usual-care baseline LE. A high disease-specific acute first-year mortality is carried by both regimens; the Pareto set acts on the modifiable post-acute hazard. Observed actual life-years (from dod − admittime) are shown as the empirical anchor.
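A minimal sketch of the two-regimen comparison, assuming m₁ is the total first-year mortality, the post-acute annual hazard is h_long (scaled by the Pareto joint HR) plus an age-specific background hazard, and remaining life-years are the trapezoidal area under the annual survival curve. `backgroundQx` is a toy Gompertz placeholder, not the Canadian table the harness uses, and all function names are hypothetical.

```javascript
// Hypothetical two-phase life-expectancy sketch; names and the background curve are assumptions.
// Toy Gompertz background mortality, standing in for a general-population qx table.
function backgroundQx(age) {
  const hazard = 0.0001 * Math.exp(0.09 * age);
  return 1 - Math.exp(-hazard);
}

// Remaining life-years = area under the survival curve (trapezoid rule, 1-year steps).
function remainingLifeYears({ age, m1, hLong, jointHR = 1, maxAge = 110 }) {
  // Phase 1 (acute, year 1): first-year mortality m1, identical for both regimens.
  let survival = 1 - m1;
  let lifeYears = (1 + survival) / 2;
  // Phase 2 (post-acute): jointHR scales only the chronic hazard hLong, not the background.
  for (let a = age + 1; a < maxAge; a++) {
    const hazard = hLong * jointHR - Math.log(1 - backgroundQx(a));
    const next = survival * Math.exp(-hazard);
    lifeYears += (survival + next) / 2;
    survival = next;
  }
  return lifeYears;
}

// Years added = Pareto-optimum LE - usual-care baseline LE (the baseline has jointHR = 1).
function yearsAdded(params, paretoHR) {
  return (
    remainingLifeYears({ ...params, jointHR: paretoHR }) -
    remainingLifeYears({ ...params, jointHR: 1 })
  );
}
```

Because the post-acute gain is multiplied by first-year survival 1 − m₁, the same joint HR yields fewer years added for groups with high acute mortality (for example m₁ ≈ 0.50 for liver versus ≈ 0.28 for heart).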

Oracle	n	Avg age @ intervention	Avg age @ death	Avg actual LY (obs.)	Usual-care baseline LY	Pareto-optimum LY	Avg Δ years added

What the delta is — and the one caveat that remains. This compares the standard-of-care regimen to the Pareto-optimum set. The Pareto effect is applied multiplicatively to the usual-care hazard, so where the two regimens overlap — e.g. both include a statin — the model still credits that shared intervention, making the headline Δ an upper bound on the incremental gain. The clean, overlap-free version is in §06: the prescribed-vs-Pareto headroom measures the Pareto optimum relative to what the patient was actually given (from PRESCRIPTIONS), so it nets out the standard care already in the baseline. Two further notes: the Pareto set acts only on the post-acute hazard (a statin does not avert acute septic death), and observed actual LY runs below the usual-care baseline here because this demo cohort is selected decedents — the baseline reflects the disease group's realistic standard-of-care expectation, not this biased sample.
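The double-crediting can be illustrated with the rho-corrected joint HR given under Methods, HR = exp(Σ ln(HRᵢ)·(1−ρ)) with ρ = 0.30. The intervention names and HR values are hypothetical placeholders, and knowing exactly which Pareto interventions the baseline already contains is a simplifying assumption; §06 instead infers overlap from the patient's actual prescriptions.

```javascript
// Illustrative sketch of the overlap caveat; intervention names and HRs are placeholders.
const RHO = 0.3;

// Rho-corrected joint hazard ratio: exp(sum(ln HR_i) * (1 - rho)).
function jointHR(hrs) {
  const sumLog = hrs.reduce((s, hr) => s + Math.log(hr), 0);
  return Math.exp(sumLog * (1 - RHO));
}

const pareto = { statin: 0.75, exercise: 0.7 };
const alreadyInBaseline = new Set(["statin"]); // standard care already includes a statin

// Headline: the full Pareto joint HR is applied to a baseline that already reflects the statin.
const headlineHR = jointHR(Object.values(pareto));

// Overlap-free: only interventions absent from the baseline act on the hazard.
const incrementalHR = jointHR(
  Object.entries(pareto)
    .filter(([name]) => !alreadyInBaseline.has(name))
    .map(([, hr]) => hr),
);
```

The full set gives the lower joint HR (about 0.64 versus 0.78 here), and in the life-years model a lower HR always means more years added, so the headline Δ can only overstate the incremental gain.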

06 Doctor-prescribed vs Pareto-optimum (requires the PRESCRIPTIONS table)

The demo archive you may have loaded omits PRESCRIPTIONS. Load it (it ships with the full credentialed MIMIC-III, ~4.16M rows, and with the open 100-patient demo on PhysioNet) and this section activates: each admission's ordered drugs are string-matched (DRUG/DRUG_NAME_GENERIC) to the routed oracle's interventions, giving the doctor-prescribed risk reduction, the gap to the Pareto optimum, and a split of life-years into already secured by the prescribed protocol vs remaining headroom.
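The matching step can be sketched as a case-insensitive substring search of DRUG and DRUG_NAME_GENERIC against each intervention's match strings, followed by the same rho-corrected joint HR for the matched (prescribed) and full (Pareto) sets. The intervention list, match strings and HRs are hypothetical, and treating the Pareto set as every intervention with HR < 1 is an assumption about the greedy step; `prescribedVsPareto` is an illustrative name.

```javascript
// Hypothetical sketch of §06: string-match inpatient orders to an oracle's interventions.
const RHO = 0.3;
const jointHR = (hrs) => Math.exp(hrs.reduce((s, hr) => s + Math.log(hr), 0) * (1 - RHO));

// interventions: [{ name, match: [substrings], hr }]; values are placeholders, not atlas data.
function prescribedVsPareto(rxRows, interventions) {
  // All ordered drug names for the admission, lower-cased into one searchable string.
  const orderedText = rxRows
    .map((r) => `${r.DRUG ?? ""} ${r.DRUG_NAME_GENERIC ?? ""}`.toLowerCase())
    .join(" | ");
  const decreasing = interventions.filter((iv) => iv.hr < 1);
  const prescribed = decreasing.filter((iv) =>
    iv.match.some((m) => orderedText.includes(m.toLowerCase())),
  );
  const prescribedRR = 1 - jointHR(prescribed.map((iv) => iv.hr));
  const paretoRR = 1 - jointHR(decreasing.map((iv) => iv.hr));
  return {
    matched: prescribed.map((iv) => iv.name),
    prescribedRR,
    paretoRR,
    gap: paretoRR - prescribedRR, // unrealized risk reduction
  };
}
```

An intervention with no drug synonyms, such as exercise, can never match, which is the structurally unmeasurable part of the gap described in the caveats.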

Oracle	n (with Rx)	Mean prescribed RR	Mean Pareto RR	Mean gap (unrealized)	Mean yrs secured	Mean headroom yrs

Interpretation caveats. MIMIC PRESCRIPTIONS are inpatient CPOE orders during the stay — a mix of acute ICU drugs (pressors, sedatives, antibiotics, which map to no prevention oracle) and continued chronic medications (statins, antihypertensives, etc., which do). So the prescribed RR here is a lower bound on the true outpatient regimen and is not adherence-weighted (a single inpatient order ≠ chronic use). Lifestyle and procedural interventions (exercise, diet, weight loss, rehab) never appear in a drug table, so part of the "gap to Pareto" is structurally unmeasurable from prescriptions alone.

07 Methods & caveats

Routing & priority: free-text admission diagnosis → oracle by keyword, in priority order: lymphoma → cancer → transplant → sepsis/infection → brain → heart → liver → kidney → pulmonary → metabolic → all-cause (default). Co-occurring conditions route to the highest-priority match — e.g. S/P LIVER TRANSPLANT → transplant, SEPSIS;PNEUMONIA → all-cause (sepsis dominates). This priority is a deliberate, editable choice; free-text routing has irreducible ambiguity and a real run would validate it on a labelled sample.
Prescribed-protocol RR: computed when a PRESCRIPTIONS table is loaded — drugs are string-matched (DRUG/DRUG_NAME_GENERIC, the standard MIMIC approach; production uses NDC→RxNorm→ATC) to the routed oracle's interventions. Without that table it shows n/a. See §06 for the prescribed-vs-Pareto split.
Max-achievable RR: greedy Pareto over all of the oracle's risk-decreasing factors (HR < 1), using the atlas rho-corrected joint model HR = exp(Σ ln(HRᵢ)·(1−ρ)), ρ = 0.30. This is a counterfactual ceiling, not a prescribe-tomorrow figure.
Life-years (usual-care baseline vs Pareto-optimum set): the usual-care baseline is the standard of care — disease-specific survival from literature on real, treated patients, so ordinary treatment (including recently-taken statins etc.) is already embedded; it is not an untreated counterfactual. Phase 1 (acute, year 1) applies a disease-specific 1-year mortality m₁ (e.g. liver ≈0.50, cancer ≈0.55, sepsis/all-cause ≈0.40, heart ≈0.28) carried by both regimens. Phase 2 (post-acute) applies a chronic disease hazard h_long plus the age/sex background; the Pareto set's joint HR multiplies only h_long. Remaining life-expectancy is the area under each survival curve; years added = Pareto-optimum LE − usual-care baseline LE.
Overlap caveat & the clean version: because the Pareto HR is applied to the usual-care hazard, interventions common to both regimens (e.g. a statin already in standard care) are credited again, so the §05 Δ is an upper bound on the incremental gain. §06's prescribed-vs-Pareto headroom nets this out by measuring the Pareto optimum relative to the patient's actual prescribed drugs. The qx background is a Canadian general-population table.
Residual life-years caveats: (a) the bundle acts only post-acute, so high-acute-mortality groups gain little; (b) the chronic-prevention HRs are transported from ambulatory trials to post-ICU survivors; (c) m₁/h_long are representative literature values, not fit to this cohort — a production version would calibrate them to disease-group survival (e.g. registry or full-MIMIC follow-up).
Observed outcome: from ADMISSIONS hospital_expire_flag / PATIENTS dod — the only ground-truth mortality available; shown for context against the counterfactual.
Effect sizes are a documented representative subset per oracle (sources listed in §02 on hover), re-expressed in portable JS; full sets live in the atlas dashboards.
Demo size & selection: 100 patients, all with recorded deaths (SSA Death Master File) — not a representative survival cohort. De-identified ages >89 are stored shifted; clamped to 89 here.
Not for clinical or policy use. Pipeline demonstration only.
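The observed-outcome and age rules in the Methods list can be sketched as follows, assuming MIMIC-style timestamps of the form YYYY-MM-DD HH:MM:SS with no time zone (parsed here as UTC) and the lowercase field names used on this page. The helper names are hypothetical, and a 365.25-day year is a simplification.

```javascript
// Sketch: observed quantities per admission from MIMIC-III-style fields (dob, dod,
// admittime, hospital_expire_flag). Timestamp format is an assumption.
const MS_PER_YEAR = 365.25 * 24 * 3600 * 1000;

// Parse "YYYY-MM-DD HH:MM:SS" as UTC milliseconds.
function toUtcMs(timestamp) {
  return Date.parse(`${timestamp.trim().replace(" ", "T")}Z`);
}

function yearsBetween(start, end) {
  return (toUtcMs(end) - toUtcMs(start)) / MS_PER_YEAR;
}

function observedOutcome(patient, admission) {
  // Ages above 89 are date-shifted in the de-identified data, so clamp to 89.
  const age = Math.min(yearsBetween(patient.dob, admission.admittime), 89);
  // Actual life-years = dod - admittime, only when a death date is recorded.
  const actualLY = patient.dod ? yearsBetween(admission.admittime, patient.dod) : null;
  const diedInHospital = Number(admission.hospital_expire_flag) === 1;
  return { age, actualLY, diedInHospital };
}
```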

08 Alternative (non-mortality) endpoints (records with no mortality endpoint)

MIMIC-III Clinical Database (MIT Laboratory for Computational Physiology, Beth Israel Deaconess Medical Center), demo subset. Local processing; representative effect models. Not for clinical or policy use.