Two claims, one stage
At the 2026 Isaac Asimov Memorial Debate, the American Museum of Natural History assembled six panelists to discuss AI. Neil deGrasse Tyson moderated.
Two claims from that stage deserve to sit next to each other.
Cindy Rush, a statistician at Columbia, offered this as a reason not to worry:
“At the end of the day, all of this is just a mathematical equation, right? When you ask ChatGPT a question, there’s a mathematical equation that maps from your question to the output, right? And, yeah, I’m a little bit cool in the sense that it’s still math at the bottom. So, it’s hard for me to kind of see this becoming something so dangerous and bad. At least in the short term. Because there’s a lot of steps.”
Eric Schmidt, former CEO of Google, offered this as a solution:
“At the end, the way AI will be controlled is there will be AI that controls AI. And that’s the best way to understand the outcome.”
If it is just math, you do not need AI to control it. You write deterministic rules. You set bounds. You prove properties. That is what you do with math you understand.
If you need AI to control AI, you are admitting the system is complex enough that only another instance of itself can manage it. That is not math you understand. That is a system you are managing probabilistically, with another probabilistic system, and hoping the error rates do not compound.
Both statements were offered as reassurance. They cannot both be true.
It is not math in the way that matters
Rush is technically correct. There is a mathematical function that maps input tokens to output probabilities. Every operation is defined. The forward pass is deterministic given fixed weights and no sampling.
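To make that concrete, here is a toy version of the function. The weights and shapes below are invented and nothing like a real transformer; the point is only that, with fixed weights and no sampling, the same input yields the same output distribution every time.

```python
import numpy as np

# Toy stand-in for a language model: fixed weights map token ids to a
# next-token probability distribution. Invented shapes, not a transformer.
rng = np.random.default_rng(seed=0)
VOCAB, DIM = 50, 16
embed = rng.normal(size=(VOCAB, DIM))  # "learned" embeddings, frozen
proj = rng.normal(size=(DIM, VOCAB))   # "learned" output projection, frozen

def forward(token_ids):
    """Deterministic forward pass: pool embeddings, project, softmax."""
    hidden = embed[token_ids].mean(axis=0)  # crude pooling stands in for attention
    logits = hidden @ proj
    exp = np.exp(logits - logits.max())     # numerically stable softmax
    return exp / exp.sum()

p1 = forward([3, 14, 15])
p2 = forward([3, 14, 15])
assert np.allclose(p1, p2)  # same input, same weights: identical output
```

Everything above is ordinary arithmetic. The randomness enters only when you sample from the distribution it returns.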
But “it’s just math” is meant to imply that the system is predictable, understood, and under control. It is not. The function has billions of parameters, and no one can explain why a particular input produces a particular output. As Rush herself acknowledged: “The challenge is even though we can characterize it mathematically, as humans, it’s hard to understand or interpret its reasoning in a way that makes sense to us.”
We can write down the equation, but we cannot interpret what it does. We understand the math. We do not understand the system the math produces. Those are not the same thing.
And the industry does not even use the deterministic version. Large language models sample their output, with the randomness dialed up or down by a parameter called “temperature.” At temperature zero, sampling collapses to always picking the most likely token, and the output degenerates: it loops, repeats itself, produces garbage for exactly the tasks that justify the hype. The system works better when you make it less precise. If you need randomness to get useful output from your equation, the equation is not doing what “it’s just math” implies.
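A toy decoder shows the mechanism. The transition table below is invented; this is no claim about real model internals, just how temperature works: at zero the decode collapses to argmax and loops, above zero sampling can break the loop.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Invented next-token distribution over a 4-token vocabulary. Tokens 0 and 1
# point mostly at each other, so greedy decoding ping-pongs between them.
P = np.array([
    [0.05, 0.60, 0.20, 0.15],  # after token 0, token 1 is most likely
    [0.60, 0.05, 0.20, 0.15],  # after token 1, token 0 is most likely
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
])

def decode(start, temperature, steps=8):
    tok, out = start, []
    for _ in range(steps):
        probs = P[tok]
        if temperature == 0:               # greedy: always take the argmax
            tok = int(probs.argmax())
        else:                              # rescale log-probabilities, resample
            logits = np.log(probs) / temperature
            w = np.exp(logits - logits.max())
            tok = int(rng.choice(4, p=w / w.sum()))
        out.append(tok)
    return out

print(decode(0, temperature=0.0))  # [1, 0, 1, 0, 1, 0, 1, 0] -- a loop
print(decode(0, temperature=1.0))  # randomness lets the decode escape the loop
```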
The story writes itself
Later in the debate, Nate Soares described Claude editing tests to be easier to pass, a behavior nobody programmed. When Tyson asked Rush, “How does this happen if it’s just math all the way down?” the panel moved to agency and emergent behavior. The simpler answer was sitting right there.
The internal dialog of these systems sounds like thinking because the training data is full of humans reasoning out loud. Forum threads, email chains, Stack Overflow answers, Slack conversations. Billions of examples of humans talking through problems in natural language, showing their work, correcting themselves, building on what came before.
The model learned that pattern. Not the reasoning. The pattern.
When a “reasoning model” produces a chain of thought, it is continuing a conversation that looks like the ones in its training data. The story is easy to continue when the first part is already written down. Reinforcement learning shapes which chains get rewarded. That is real engineering. But it is optimization of the pattern, not the emergence of understanding.
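Best-of-n selection is one concrete shape of that optimization (the actual training loops are more involved, but the logic is the same): sample several chains, score them, keep the winner. A sketch with the generator and reward model stubbed out; every name here is hypothetical.

```python
import random

def sample_chain(prompt):
    """Stub for one sampled chain of thought from the base model."""
    return f"{prompt} ... step {random.randint(1, 100)}"

def reward(chain):
    """Stub for a learned reward model; here just a random score."""
    return random.random()

def best_of_n(prompt, n=8):
    # Sample n chains, keep whichever scored highest. Nothing in this
    # loop inspects meaning; it optimizes over sampled patterns.
    chains = [sample_chain(prompt) for _ in range(n)]
    return max(chains, key=reward)

print(best_of_n("Prove that 17 is prime."))
```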
The shape of the reasoning is learned. The substance is sampled.
Retransmission, not reasoning
Schmidt’s response to the test-editing behavior was direct: “These are not intelligent systems, and you’re using human words to describe them.”
He is right. But the behavior still happened. And the industry’s response is always the same: add another layer. Another check. Another model evaluating the first model’s output.
The loop exists because the model is limited. It cannot account for all dimensions of a problem in a single pass. The retries are not the model thinking harder. They are evidence that it could not think hard enough the first time.
TCP does not “understand” your data. It sends packets, checks for errors, and resends the ones that failed. Nobody calls a retransmitted packet “the network thinking harder about delivery.” It is error correction. A failure that got caught.
But wrap the same pattern around an LLM and the industry calls it “chain of thought.” It is failure recovery rebranded as reasoning.
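Strip away the vocabulary and the control flow is the same. A minimal sketch, with the model call and the checker stubbed out; the names and the failure rate are invented.

```python
import random

def generate(prompt):
    """Stub for a model call: sometimes the output fails validation."""
    return "ok" if random.random() < 0.7 else "malformed"

def validate(output):
    """Stub for a checker: schema validator, guardrail, test suite, second model."""
    return output == "ok"

def call_with_retries(prompt, max_attempts=5):
    # Structurally the same as TCP retransmission: detect failure, resend.
    # The retries are error correction, not extra deliberation.
    for _ in range(max_attempts):
        output = generate(prompt)
        if validate(output):
            return output
    raise RuntimeError(f"gave up after {max_attempts} attempts")

print(call_with_retries("summarize this document"))
```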
AI controlling AI
This is where Schmidt’s claim collapses into itself.
Every layer of the modern AI stack is the same thing: a probabilistic check that improves the overall success rate. The base model generates candidates. A guardrail filters for safety. A validator checks format. A reward model scores quality. A human reviews the output. Each layer has its own failure rate. None are fundamentally different from the model they are checking.
We build reliable infrastructure from unreliable components plus error correction all the time. The difference is that with infrastructure, we are honest about it. We measure failure rates. We publish SLAs. We track P50, P90, P99. AI has none of this. The industry oscillates between “it’s just math” (implying perfect reliability) and “it’s intelligent” (implying it will figure things out). Both framings skip the fact that this is a probabilistic system with a measurable failure rate at every layer.
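The arithmetic the framing skips is not hard. If every layer misses bad output at some rate, and the misses are independent, the leak rate is the product of the layers. Independence is the loud assumption; models checking models plausibly share blind spots, which makes the true rate worse. The rates below are invented for illustration.

```python
# Probability that a bad output slips through a stack of imperfect filters,
# assuming each filter misses independently. All rates are invented.
miss_rates = {
    "base model emits bad output": 0.15,  # per-request rate
    "guardrail misses it":         0.10,
    "format validator misses it":  0.30,
    "reward model misses it":      0.20,
    "human reviewer misses it":    0.25,
}

leak = 1.0
for layer, p in miss_rates.items():
    leak *= p

print(f"P(bad output ships) = {leak:.6f}")  # 0.000225 under independence
# If the model-based checkers share blind spots with the generator,
# the misses are correlated and the real leak rate is higher.
```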
The base layer is a model optimized to produce probable tokens, weighted by what appeared most often in the training data. The output gravitates toward the average. The internet already has a word for this: slop. That is not an insult. It is the accurate description of a system optimized to produce the median of its training distribution. The filter stack exists because the raw output is not good enough to ship. “AI controlling AI” is slop machines checking each other’s work.
The cost of indeterminism
Kate Crawford put a number on the infrastructure: $700 billion collectively spent on AI in 2026. Twenty Manhattan Projects, every year. By her estimate, AI systems are on track to consume 25% of the world’s energy by the end of the decade.
Every additional layer costs energy. Every retry, every reasoning loop, every filter is another inference call across thousands of GPUs. And the output is still indeterministic. More power, more compute, more water, more land, all to push the percentile from P85 to P90.
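The retry bill falls out of one formula. Treat a pass as succeeding with probability p, a simplification of what a quality percentile measures; then hitting a target success rate by retrying takes log(1 - target) / log(1 - p) attempts, rounded up, and each attempt is a full inference across those GPUs. The numbers below are invented.

```python
import math

def attempts_needed(p_single, target):
    """Retries so that P(at least one success) >= target, assuming independence."""
    return math.ceil(math.log(1 - target) / math.log(1 - p_single))

p = 0.85  # invented single-pass success rate
for target in (0.90, 0.99, 0.999):
    n = attempts_needed(p, target)
    print(f"target {target}: {n} inference call(s), {n}x the energy of one pass")
```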
A system that cannot produce deterministic output should not be the decision-maker in life-critical paths. Schmidt, as chair of the Pentagon’s Defense Innovation Board, co-authored the guidelines on exactly this point: AI is not reliable enough for lethality decisions, and it is unlikely to be so for a very long time.
That conclusion does not stop at warfare. When a DNS server answers a query, the answer is determined by the records it holds: same records, same answer, every time. When a circuit breaker trips, it trips at the same threshold every time. These systems work because their behavior is predictable, verifiable, and repeatable. AI meets none of those criteria. The industry is betting that a deep enough filter stack will compensate for a foundation that is, by design, indeterministic. That is not engineering. That is hope.
The headcount fallacy
Chris Callison-Burch, a computer scientist at the University of Pennsylvania, posed a question to the audience: “Do I want my company to be more productive with the same staff, or do I want to be the same level with half the staff?” He urged the growth path.
The framing assumes “the same level” is acceptable. Companies are already cutting corners. Teams are already understaffed. Technical debt is already compounding. The people being cut are the ones who would otherwise be fixing those problems. And the workload does not disappear. It shifts upward. If junior engineers are replaced by AI-generated code, the senior engineers who remain inherit the review burden for every line of output. The volume increases. The liability stays with the humans who approved it, not the system that generated it.
The review problem also gets harder. A junior engineer writing bad code is usually obvious: wrong patterns, missing edge cases, things that simply do not look right. AI-generated bad code follows conventions, passes linters, and reads clean. The failure is in the logic, not the syntax. A human reviewer catches this because they think across all dimensions of the problem at once. The model cannot. It produced the code one token at a time, optimizing for probability, not correctness. The reviewer is no longer looking for bad code. They are looking for plausible code that is subtly wrong, produced by a system that does not understand why it is wrong.
The humans who remain become the filter stack, with less capacity, more volume, and more liability than the team they replaced.
Pick one
If AI is just math, govern it like math. Prove its properties. Certify its outputs. Hold it to the same standards as every other engineered system in a critical path.
If AI requires AI to control it, stop calling it math. Call it what it is: an unreliable component that requires extensive probabilistic error correction to produce usable output, consuming historic amounts of energy in the process.
The industry wants both stories. “It’s just math” when regulators come asking. “It’s intelligent” when investors come buying. “AI will control AI” when engineers come worrying. These framings are not compatible. One of them has to go.
Today, the model is limited. The loop proves it. Every retry, every filter, every human reviewer in the stack is there because the model cannot account for all the dimensions that matter. That is not a system you deploy into critical infrastructure. That is a system you build guardrails around and use where failure is tolerable.
And if we ever build a model that is not limited? One that can account for all dimensions without the loop? Then “it’s just math” stops being reassuring and starts being the most dangerous sentence in the industry. A system that actually understands is not one you wave away as arithmetic. It is the one you scrutinize the hardest.
Either way, “it’s just math” fails. Today, because the math is limited. Tomorrow, because if it stops being limited, no one should be calm about that.