Blog/From the team
The Art of the State: How Medicine Lost Track of What's Actually Happening to Patients
The map and the glimpse
Imagine you are about to drive across the country. Before you leave, someone hands you a paper map, gives you thirty seconds to look at it, and takes it away. You will not see it again for a thousand miles. At that point you get another thirty-second look, and then it disappears again. It does not show your current location, traffic, road closures, or conditions. And it has no memory of where you have been or what route you were attempting.
A GPS is different in every dimension that matters. It knows where you are and where you have been. It tracks your trajectory continuously. And it remembers the route you are on, so it can tell you whether your last turn actually got you closer to where you need to go.
Healthcare has the map-and-a-glimpse. It does not have GPS. And that underlying gap is what we in the "biz" call a state representation problem.
State is how machines see the world. Think of it as a description of a system at a given point in time such that you can make a (hopefully good) decision. A thermostat's state is the current temperature. A shipping logistics company's state is every plane, train, and truck, every package's weight, speed, and location all updating in real time, all linked together. State is also how machines process the world in order to present information to humans. A GPS doesn't just know where you are for its own sake. It knows where you are so it can tell you where to turn. Without a structured representation of the system, there is nothing for the algorithm to reason over, nothing to present, nothing to act on. Hand a picture of a map to a routing engine and it can't do anything with it. The representation is the capability.
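The thermostat case is small enough to write out in full. A minimal sketch, in which the setpoint value and the function name are our own illustrations:

```python
# A toy thermostat, not a real control loop: its entire state is one
# number, and the decision function maps state -> action.
SETPOINT_F = 68.0  # illustrative setpoint

def thermostat_action(current_temp_f: float) -> str:
    """Return the action implied by the current state."""
    return "heat_on" if current_temp_f < SETPOINT_F else "heat_off"

print(thermostat_action(63.2))  # heat_on: state is below the setpoint
```

The representation is the capability even here: the function can only act because the temperature was captured as a number it can compare.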
The complexity of state varies enormously, but the quality of any state representation comes down to three independent properties: the vocabulary of the model (what entities and distinctions the system can express), the sampling frequency (how often is the state updated), and the relational structure between elements (whether the system captures how its parts causally affect each other). Each can fail on its own. A logistics system with a rich vocabulary that updates once a week is nearly useless. A sensor that streams data every second but captures no relationships between readings is just noise. All three have to work for the state to mean anything.
Vocabulary
If you were to look at a patient's "state" in a modern electronic health record, the core entities you would see are lists of diagnosis and procedure codes. These codes were designed to answer the question "what did we do and why should we get paid for it?" They were never designed to answer "what is happening to this patient, and what should we do next?" The gap between those two questions is where most of the information loss in medicine lives.
Consider lung cancer. A patient with small cell lung cancer and a patient with non-small cell lung cancer have radically different diseases: different prognoses, different treatment pathways, different molecular profiles, different expected trajectories. Small cell lung cancer is among the most aggressive malignancies in medicine. Non-small cell lung cancer accounts for 85% of cases and encompasses subtypes (adenocarcinoma, squamous cell carcinoma, large cell carcinoma) with fundamentally different behavior.
In ICD-10, both patients receive the same code.
Patient A (Small Cell Lung Cancer): C34.11 — Malignant neoplasm of upper lobe, right bronchus or lung
Patient B (Non-Small Cell, Stage IIA Adenocarcinoma): C34.11 — Malignant neoplasm of upper lobe, right bronchus or lung
The structured data contains no histological subtype, no tumor staging, no molecular markers (EGFR, ALK, PD-L1 status), no tumor grade, and no localization beyond lobe and laterality. Two patients with fundamentally different diseases are structurally indistinguishable.
This is not an oncology-specific problem. In cardiology, I50.9 covers "heart failure, unspecified," collapsing HFrEF, HFpEF, and HFmrEF into a single label: three conditions with different etiologies, different pharmacological responses, and different monitoring requirements. In endocrinology, E11.9 is "Type 2 diabetes mellitus without complications." A patient with an A1c of 6.8 on metformin alone and a patient with an A1c of 11.2 on triple therapy with early nephropathy are coded identically.
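The collapse is easy to see if you write both representations down. A hedged sketch, where the field names are illustrative rather than a real clinical schema:

```python
from dataclasses import dataclass
from typing import Optional

# What the structured EHR record carries for each lung-cancer patient.
@dataclass(frozen=True)
class ICDState:
    code: str

# What a clinically useful state representation would carry. The fields
# here are illustrative, not a real oncology data model.
@dataclass(frozen=True)
class OncologyState:
    code: str
    histology: str                      # e.g. small cell vs. adenocarcinoma
    stage: Optional[str] = None         # TNM stage, if known
    egfr_mutation: Optional[bool] = None
    pd_l1_percent: Optional[float] = None

patient_a_ehr = ICDState(code="C34.11")
patient_b_ehr = ICDState(code="C34.11")

patient_a = OncologyState(code="C34.11", histology="small_cell")
patient_b = OncologyState(code="C34.11", histology="adenocarcinoma",
                          stage="IIA", egfr_mutation=False, pd_l1_percent=5.0)

# Structurally indistinguishable in the EHR...
assert patient_a_ehr == patient_b_ehr
# ...radically different diseases at clinical resolution.
assert patient_a != patient_b
```

Nothing about the richer representation is exotic. The information exists in pathology reports and molecular panels; it simply never lands in the structured state.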
A useful patient state model requires vocabularies that are extensible and domain-specific: ontologies that can express clinical state at whatever resolution the clinical context demands, and that grow as understanding of a disease deepens. A heart failure ontology that distinguishes ejection fraction ranges, NYHA functional class, and diuretic response patterns. A diabetes ontology that captures glycemic trajectories, medication regimen detail, and complication staging. These are the minimum resolution at which clinical decisions can be made with any confidence.
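One slice of such an ontology can be sketched directly. The class boundaries below follow the commonly used ejection fraction ranges (reduced below 40%, mildly reduced 40–49%, preserved 50% and above), though exact guideline cut-points vary, and the type and field names are our own:

```python
from dataclasses import dataclass

# Illustrative heart-failure state entry: instead of the single label
# I50.9, the phenotype is derived from measured ejection fraction.
@dataclass(frozen=True)
class HeartFailureState:
    ejection_fraction: float  # percent, from echocardiography
    nyha_class: int           # NYHA functional class, 1-4

    @property
    def phenotype(self) -> str:
        if self.ejection_fraction < 40:
            return "HFrEF"
        if self.ejection_fraction < 50:
            return "HFmrEF"
        return "HFpEF"

print(HeartFailureState(ejection_fraction=35, nyha_class=3).phenotype)  # HFrEF
```

The point is not the specific cut-points but that the distinctions are computable at all: a vocabulary that stores the measurement can always re-derive the label as definitions evolve, while a vocabulary that stores only the label cannot recover the measurement.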
Sampling frequency
Patients interact with health systems at an incredibly low sampling rate. The median American adult sees a primary care physician about once a year. Out of 8,760 hours in a year, the system's total observation window is roughly 15 minutes. For a Type 2 diabetic, the entire understanding of their disease trajectory is derived from four to eight data points annually.
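The arithmetic behind that glimpse is stark when written out:

```python
# One ~15-minute annual visit against a full year of minutes.
minutes_per_year = 8_760 * 60   # 525,600 minutes
observed_minutes = 15           # a single annual primary-care visit

coverage = observed_minutes / minutes_per_year
print(f"{coverage:.5%}")        # about 0.00285% of the year
```

The system is extrapolating a continuous disease trajectory from a sampling window that covers roughly three thousandths of one percent of the patient's year.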
The common response to this "frequency gap" is the rise of consumer wearables and remote monitoring. We are now drowning in high-frequency data—steps, heart rate, sleep stages, and continuous glucose readings. But frequency without clinical context is just high-speed noise. A stream of 1440 heart rate readings per day tells you nothing if the system doesn't know if the patient was running a marathon, having a panic attack, or experiencing a drug-induced arrhythmia. Without the "why"—the intent and the activity surrounding the data—increased frequency just shifts the burden of interpretation onto an already overtaxed clinician.
A useful patient state model requires high-frequency observation coupled with clinical context. This is the constraint that makes the problem impossible to solve from the outside. If you are a technology company trying to build a state model by ingesting legacy EHR data or raw wearable streams, you are limited by the low sampling rate of the clinic and the lack of context in the sensor.
To increase the sampling rate of meaningful data, you have to change how care is delivered. You must move the observation into the patient's life, but you must bring the clinical reasoning along with it.
Relational structure
An ICD code array is a flat list. It tells you that a patient has been assigned labels. It does not tell you how those labels relate to each other, which condition caused which symptom, which medication was prescribed to treat which diagnosis, or whether a lab abnormality is the result of a disease or a side effect of treatment.
The common assumption is that this causal structure lives in the free text of clinical notes. It largely does not.
What we hope a progress note says:
Patient's worsening fatigue is likely attributable to undertreated hypothyroidism (TSH trending up over 6 months from 4.2 to 8.7), compounded by poor sleep quality secondary to uncontrolled GERD. Increasing levothyroxine from 75mcg to 88mcg and adding omeprazole 20mg QHS.
What the note typically says:
Pt presents for f/u. Fatigued. TSH 8.7. GERD symptoms. Increase levo to 88. Start omeprazole. RTC 3 months.
Causal links present: zero. Why was levo increased? Implied but unstated. Why omeprazole? Unclear if for GERD or gastroprotection. Relationship between fatigue, TSH, and sleep? Absent.
Free text clinical notes are written under severe time pressure, often templated with large sections auto-populated from prior visits. They frequently omit causal reasoning entirely, listing problems, observations, and actions without connecting them. They contain copy-forward artifacts where information from six visits ago persists unmodified. Large language models can extract entities from these notes reasonably well. They cannot reconstruct causal reasoning that was never recorded.
A useful patient state model requires causal relationships between observations, actions, and conditions to be first-class elements of the data structure, captured at the moment a clinical decision is made. This is what makes a state model fundamentally different from a medical record. A medical record is a chronological log of events. A state model is a representation of the relationships between those events. The record tells you what happened. The model tells you why.
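The distinction can be made concrete with the visit from the note above, represented both ways. A hedged sketch in which the relation names (`treats`, `attributed_to`, `evidenced_by`) are illustrative, not a real clinical schema:

```python
# The record: a flat, chronological event log with no relationships.
record = [
    ("observation", "fatigue"),
    ("lab", "TSH 8.7"),
    ("action", "increase levothyroxine to 88mcg"),
    ("action", "start omeprazole 20mg"),
]

# The state model: the same events plus typed causal edges, captured at
# the moment the decision was made.
state_model = [
    ("fatigue", "attributed_to", "hypothyroidism"),
    ("hypothyroidism", "evidenced_by", "TSH 8.7"),
    ("increase levothyroxine to 88mcg", "treats", "hypothyroidism"),
    ("start omeprazole 20mg", "treats", "GERD"),
]

def why(action: str, edges) -> list[str]:
    """Answer 'why was this done?' by following 'treats' edges."""
    return [target for source, relation, target in edges
            if source == action and relation == "treats"]

# The record has no edges to follow; the state model answers directly.
print(why("start omeprazole 20mg", state_model))  # ['GERD']
```

The query is trivial once the edge exists. It is unanswerable, for any amount of compute, when the edge was never recorded.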
The compound failure
These three problems do not add. They multiply. Low-resolution vocabulary means the system cannot distinguish between clinically different patients. Low-frequency observation means even the labels it does capture are exceptionally noisy. And the absence of causal structure means the labels have no linkage. No way to express that this observation is related to that treatment, or that this symptom is downstream of that condition.
The result is that the patient "state" inside a modern EHR is not really state at all. It is a record of transactions with the healthcare system that tells you what billing events occurred, but almost nothing about the clinical reality of the person those events are attached to. This is not because the people who built these systems were careless. ICD was designed for reimbursement. EHRs were designed around documentation requirements. The data faithfully serves the purposes it was built for. It just was not built to represent patient state.
Building the GPS
Fixing this requires moving along all three axes simultaneously. Higher-resolution vocabularies alone do not help if the observation frequency remains quarterly. Higher frequency alone does not help if you are sampling the same low-resolution codes. And neither helps if there is no causal structure connecting the observations to each other.
Most health-tech companies treat patient state as a data engineering problem: aggregate enough EHR data, clean it up, build models on top. This misidentifies the bottleneck. The data itself is the constraint. The care delivery layer generates the data. If you want data at a resolution, frequency, and causal density that the existing care delivery layer does not produce, you have to build a new care delivery layer that does. The patient state model and the care delivery model are the same problem — but only if the incentive driving that care delivery layer is clinical quality. When the organizing incentive is reimbursement, you get data structures optimized for reimbursement. When it is delivering the highest quality care, the data structures follow.
That is one of the core problems we are solving at Actually Health. We have built the first clinical decision architecture designed with high-resolution, continuous patient state as the primary data structure, where care delivery and clinical reasoning are unified in a single compounding system. If you want to follow along or get involved, follow us on LinkedIn or reach out at hello@actually.health.