In response to Ankit Gupta and Nana Addison
Ankit Gupta is right that AI for biology fails at interfaces. His essay on AI in biology is carefully thought through, and I am glad that people at places like Y Combinator are thinking about this. His diagnosis, that each stage of drug discovery outputs a probabilistic hypothesis, is realistic. The question he does not touch is why. The answer changes what to build for the future, in both scientific knowledge and the end product. In this essay I lay out my thoughts from a mixed science and product perspective. I focus only on the in silico modelling and simulation field, where one camp uses traditional black-box ML models and the other uses systems theory to explain mechanisms.
The pipeline is a fiction
Consider a finite-state machine observed without its transition function. Inputs arrive, outputs emerge, and recovering the mapping takes a great deal of domain knowledge. This is why ML/AI in early discovery appears fuzzy. It is not. The structure is hidden, not absent. But the optimisations are treated as handoffs, not as details.
Drug discovery modeled as a pipeline has the problem Ankit describes:

    target = target_discovery(disease)
    drug = drug_design(target)
    outcome = clinical_trial(drug)

Each function is assumed to take a clean input and return a clean output. But the stages are not independent functions. They are nodes in a biological network, and the network has feedback. A target can look validated until a molecule reaches the wrong tissue — because tissue-specific expression, competing sinks, and homeostatic compensation are not pipeline stages.
This is not a biological problem. It is a modeling choice problem. And modeling choices have product consequences.
A product built on the pipeline model optimises one stage in isolation. A product built on the network model asks: what is the minimal mechanistic representation that keeps assumptions legible across all stages? Those are different products. Only one of them fails less. The same holds for data. The biggest frustration is the wait for clean, low-noise data. Mechanistic models, however, can be deployed early with little data, and used with less frustration and more hypothesis generation.
Proposition: The fuzzy API is not a property of biology. It is a property of a pipeline abstraction applied to a system with feedback. The transition function (feedback topology, rate laws, compartment structure) is computable. It can be extracted early on in the form of high-throughput pharmacokinetics (HTPK) and extended into adverse outcome pathways (AOPs).
Within a systems framework, it becomes clear that biological networks are not randomly complex. They recur in a small set of circuit motifs (negative feedback, feed-forward loops, bistable switches), each with a defined, computable function. The insulin-glucose circuit is a negative feedback loop with a measurable setpoint. The HPA stress axis is a two-gland oscillator with a predictable period. Type 2 diabetes is bistable state collapse, not random failure. In each case the apparent fuzziness is the result of not writing down the circuit. Write it down and the behavior is deterministic.
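To make "write it down and the behavior is deterministic" concrete, here is a minimal Python sketch of a negative feedback loop. It is not a physiological glucose-insulin model: the two-variable linear form, the function name `simulate_negative_feedback`, and every parameter value (`setpoint`, `gain`, `clearance`, `dt`, `steps`) are assumptions chosen purely for illustration.

```python
def simulate_negative_feedback(g0, setpoint=5.0, gain=0.8, clearance=0.5,
                               dt=0.01, steps=5000):
    """Forward-Euler integration of a toy negative feedback loop.

    g is the regulated variable (think glucose); h is the controller's
    deviation from its baseline (think insulin above or below basal).
    All values are illustrative assumptions, not fitted physiology.
    """
    g, h = g0, 0.0
    for _ in range(steps):
        dg = -h                                      # controller pushes g back
        dh = gain * (g - setpoint) - clearance * h   # error drives the controller
        g += dt * dg
        h += dt * dh
    return g


# Two different starting points are pulled toward the same setpoint.
high_start = simulate_negative_feedback(g0=8.0)
low_start = simulate_negative_feedback(g0=2.0)
print(round(high_start, 3), round(low_start, 3))
```

Once the circuit is written down, the same inputs always produce the same trajectory, and the setpoint is a parameter one can read off rather than a pattern one has to infer.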
The product implication is direct: a platform that holds biological assumptions explicitly between stages — as a queryable model rather than a tacit handoff — reduces attrition not by being more accurate, but by failing earlier and more cheaply. An assumption caught in the model is worth orders of magnitude more than the same assumption caught in Phase 2.
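One way to picture "a queryable model rather than a tacit handoff" is a ledger of assumptions, each tagged with the stage that introduced it and a check that can be re-run whenever new evidence arrives. The sketch below is only an illustration: the `Assumption` class, the `violated` helper, the evidence keys, and the thresholds are all hypothetical and do not come from any real platform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Assumption:
    stage: str                                  # stage that introduced it
    statement: str                              # human-readable claim
    check: Callable[[Dict[str, float]], bool]   # True if evidence supports it


def violated(ledger: List[Assumption], evidence: Dict[str, float]) -> List[Assumption]:
    """Return every assumption that the current evidence does not support."""
    return [a for a in ledger if not a.check(evidence)]


# Hypothetical ledger; keys and thresholds are made up for illustration.
ledger = [
    Assumption("target_discovery",
               "Target is expressed in the diseased tissue",
               lambda e: e.get("target_expression_tpm", 0.0) >= 10.0),
    Assumption("drug_design",
               "Free drug in the tissue exceeds its potency",
               lambda e: e.get("tissue_free_conc_nM", 0.0)
               >= e.get("ic50_nM", float("inf"))),
]

# Made-up evidence: expression looks fine, tissue exposure does not.
evidence = {"target_expression_tpm": 42.0,
            "tissue_free_conc_nM": 3.0,
            "ic50_nM": 25.0}

for a in violated(ledger, evidence):
    print(f"[{a.stage}] {a.statement}")
```

In this sketch missing evidence counts as a violation, so an assumption nobody has measured surfaces in the query instead of passing silently to the next stage.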
The animal model is the wrong node
The strongest empirical argument for the architecture claim comes from a paper published this month. Li et al. (Drug Discovery Today, July 2026) integrated human liver spheroid assays with DeepDILI — a deep learning model trained on human clinical outcomes — to predict drug-induced liver injury across 521 compounds.
The numbers are concerning:
- Animal models (rodent) → human DILI concordance: 44%
- Animal models (non-rodent) → human concordance: 40%
- DeepDILI alone (trained on human data): 58%
- Human liver spheroids alone: 47–89% across six independent datasets
- Confidence-Guided Integration (spheroid + DeepDILI): 73–85%
- Concordant subset — where both methods agree: 86–100%
At 44% and 40% concordance, both rodent and non-rodent animal models are worse than a coin flip at predicting human DILI. This is not a calibration problem. It is an architecture problem for the pharmaceutical industry: rodent hepatocyte biology does not map reliably onto human hepatocyte biology. The system being measured is structurally distant from the system that matters.
The CGI result makes the network argument concrete. When two orthogonal, human-relevant systems converge — one measuring cellular response directly, one inferring toxicity from human-outcome-trained structure — accuracy reaches 86–100%. The interface stops being fuzzy when independent mechanistic signals agree on the same node. That is not a statistical effect. It is what happens when you replace a wrong-species proxy with measurements taken at the right place in the network.
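The bookkeeping behind a "concordant subset" figure can be sketched in a few lines of Python. This is not the integration method of Li et al., whose details are not given here. It only shows how accuracy on the compounds where two methods agree is computed, together with the fraction of compounds that subset covers. The function names and the eight binary calls below are made up for illustration and are not data from the paper.

```python
def accuracy(pred, truth):
    """Fraction of predictions that match the reference labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def concordant_subset(pred_a, pred_b, truth):
    """Return (accuracy where both methods agree, fraction of items covered)."""
    agreed = [(a, t) for a, b, t in zip(pred_a, pred_b, truth) if a == b]
    if not agreed:
        return None, 0.0
    hits = sum(a == t for a, t in agreed)
    return hits / len(agreed), len(agreed) / len(truth)


# Made-up toy calls (1 = hepatotoxic, 0 = not); NOT data from the paper.
assay_calls = [1, 1, 0, 0, 1, 0, 1, 0]
model_calls = [1, 1, 0, 1, 0, 0, 1, 1]
truth       = [1, 1, 0, 0, 0, 1, 1, 1]

print("assay alone:", accuracy(assay_calls, truth))
print("model alone:", accuracy(model_calls, truth))
print("concordant (accuracy, coverage):",
      concordant_subset(assay_calls, model_calls, truth))
```

Reporting coverage next to accuracy shows how many compounds the agreement rule leaves undecided, which is where the unresolved cases sit.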
What remains unresolved is also instructive. Idiosyncratic DILI — immune-mediated, patient-specific — is not captured by spheroid assays lacking non-parenchymal cells. Binary classification misses dose-dependence and severity. The authors call for adverse outcome pathway frameworks, mechanistic omics integration, and pathway-based modeling. These are precisely the missing pieces of the transition function. Each named gap is computable.
NAMs are a re-architecture, not an upgrade
New Approach Methodologies — organoids, organ-on-chip, in silico predictive models — are often framed as better animal tests. They are not. They are measurements taken in a structurally different system: one whose node in the network is closer to the human target.
The FDA's 2025 road map for alternative methods and the Modernization Acts 2.0 and 3.0 point in the same direction. Replace animal testing not only with more data, but with human-relevant systems. The sim-to-real gap is a function of structural distance from human physiology, not experimental sophistication. A simple assay in human cells outperforms a complex assay in the wrong species.
Proposition: The predictive value of a model system is determined by its structural distance from human physiology. The gap is architectural.
Nana Addison argues that plant biochemistry (botanical bioactive discovery across commercial verticals from cosmetics to agricultural biotech) is structurally underbuilt for this kind of AI-NAM integration, and she is right. The plant circadian clock is the same negative-feedback oscillator science identifies in bacteria and human hormonal circuits. The glucosinolate-myrosinase cascade in Moringa is deterministic enzyme kinetics. The feedback loop for validation is six months, not twelve years. The architecture argument applies here with less friction, and has received a fraction of the investment. Most of it is driven by NIH and JHU. Pockets of the scientific community work on read-across, but out of step with one another, because they are busy debating names and definitions. One thing is sure: unless companies like P&G, Unilever and L'Oréal step into the GenAI game, this area will stay immature.
What AI can and cannot do
A learning machine finds regularities in data. It will find them whether or not they are causal. In a system with feedback, many strong correlations are confounded: the observable co-varies with the outcome through a shared upstream cause that disappears under perturbation. A model trained on such correlations performs well retrospectively and fails prospectively — precisely when it matters.
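A small simulation illustrates the failure mode. In the sketch below, a hidden upstream cause `u` drives both an observable `x` and an outcome `y`, so the two correlate strongly in observational data. When `x` is set by intervention, independently of `u`, the correlation vanishes. The structure, the noise level of 0.3, and all names are assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Return (observational correlation, correlation under intervention)."""
    rng = random.Random(seed)
    obs_x, obs_y, do_x, do_y = [], [], [], []
    for _ in range(n):
        # Observational regime: x and y share the upstream cause u.
        u = rng.gauss(0, 1)
        obs_x.append(u + 0.3 * rng.gauss(0, 1))
        obs_y.append(u + 0.3 * rng.gauss(0, 1))
        # Interventional regime: x is set from outside; y still follows u.
        u = rng.gauss(0, 1)
        do_x.append(rng.gauss(0, 1))
        do_y.append(u + 0.3 * rng.gauss(0, 1))
    return pearson(obs_x, obs_y), pearson(do_x, do_y)


r_observed, r_intervened = simulate()
print(f"observational r = {r_observed:.2f}, interventional r = {r_intervened:.2f}")
```

A model fitted to the observational regime would rank `x` as a strong predictor of `y`, yet perturbing `x` changes nothing, because `x` was never on the causal path.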
The modelling and simulation field has converged on the right architecture: mechanistic models define the space of biologically plausible hypotheses; AI methods (surrogate models, virtual patient generators, LLM-assisted parameter extraction) explore that space efficiently. The digital twin is the closed-loop product: a mechanistic model continuously updated by patient-specific data. It transfers to new interventions and new populations because the transfer function is the biology, not the training distribution.
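A deliberately tiny version of that loop can be sketched with a one-compartment model with first-order elimination, C(t) = C0 · exp(−k·t). The model choice, the function names, the noise-free synthetic observations, and the assumption that doubling the dose doubles C0 (linear kinetics) are all illustrative; real digital twins are far richer than this.

```python
import math


def fit_elimination(times, concs):
    """Least-squares fit of ln C(t) = ln C0 - k*t; returns (c0, k)."""
    logs = [math.log(c) for c in concs]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
             / sum((t - mean_t) ** 2 for t in times))
    return math.exp(mean_l - slope * mean_t), -slope


def predict(c0, k, t):
    """Concentration at time t under first-order elimination."""
    return c0 * math.exp(-k * t)


# Made-up observations for one hypothetical patient (hours, mg/L),
# generated from c0 = 10 and k = 0.2 with no noise.
times = [1.0, 2.0, 4.0, 8.0]
observed = [10.0 * math.exp(-0.2 * t) for t in times]

# Update the patient-specific parameters from the data.
c0_hat, k_hat = fit_elimination(times, observed)

# New intervention: double the dose. Under the linear-kinetics assumption
# only c0 changes; the patient's elimination rate k carries over.
doubled_at_6h = predict(2 * c0_hat, k_hat, 6.0)
print(f"k = {k_hat:.3f} per hour, C(6 h) at double dose = {doubled_at_6h:.2f} mg/L")
```

The prediction for the unseen dose comes from the rate law, not from having observed that dose before, which is the sense in which the mechanism carries the transfer.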
Proposition: AI without causal structure is a fast correlator. In a network with feedback and compensation, fast correlation generalises poorly. The mechanistic model converts a correlator into a reasoner. This is not a feature. It is a prerequisite.
If you model drug discovery as a pipeline — target → drug → trial — then yes, every handoff is lossy. The upstream context doesn't fit through the interface. The assumptions get hidden.
But that's a property of pipelines, not of biology. Biology is a network. And networks have a feature that pipelines don't: their complexity isn't random.