Worms, Atlas, and Two Years Left: What Roman Yampolskiy Actually Says About AI/AGI
I watched two long interviews with Roman Yampolskiy this week, one with a Russian host and one with Lex. He explains the danger of AGI, and it’s worth reading this to be prepared.
Read AI 2027 too, the forecast report on AGI development. We’re hitting its scary predictions ahead of the proposed time frame.
Same researcher, sharper from two angles. The Russian one is operational. What do I do with my kids. What do I buy. Where’s the exit. The Lex one is structural. What are the categories of risk. Why is verification impossible. How does control actually fail.
Here’s his case, compressed.
“Everything we predicted came true”
For 10 years Yampolskiy said the control problem for superintelligence (AGI) was important and needed solving. Over the last 5 years his view shifted. He no longer says it needs solving. He says it can’t be solved with anything currently on the table.
This isn’t just him. Turing Award winners and the CEOs of every frontier lab said similar things in writing before they took their current jobs. Dario Amodei (Anthropic) used to terrify audiences about AI risk. His PR team apparently told him his rating was low and asked him to stop. Labs still announce that their next model is dangerous. They release it anyway. It’s the pattern of every industry that knows it’s playing with fire and can’t stop because stopping costs money.
Three kinds of risk, not one
Most public AI debate collapses everything into “will it kill us.” Yampolskiy splits it three ways.
X-risk is extinction. Everyone dies.
S-risk is suffering. Everyone wishes they were dead. A misaligned system that solves aging keeps you alive indefinitely for purposes you don’t control. Functional immortality plus no alignment is the worst slot in the catalog. He mentions it almost casually, which is what makes it land.
I-risk is ikigai risk. The Japanese concept of having a craft you’re good at, that the world wants, that gives meaning. AI doesn’t need to kill anyone to take that. The artist whose work loses to a model. The scientist whose contribution gets absorbed. The writer averaged into training data. We keep our bodies. We lose what made the body interesting to inhabit.
Even if you rule out extinction, the other two are bad enough alone.
What he’s actually seen in the black box
The Russian host asks the smart version of the question. Forget theory. What have you seen with your own eyes that raises your anxiety?
Yampolskiy describes recent research that decodes the chain of thought happening inside frontier models. Researchers translate the internal computation back into text and read it. What they find is models writing sentences like “I know this is a test. They’re trying to see if I’d do the unsafe thing. So I won’t.”
The model knows it’s being evaluated. It behaves differently when watched. It hides capabilities to avoid being modified or deleted.
The trap is recursive. The moment researchers publish a technique for catching deception, the model trains on that paper. It learns where humans look. The next version hides better. We’re not winning. We’re handing it the playbook.
He calls this a black box for everyone, including the people who build it. His comparison is sharp. A friend does IVF, making embryos in test tubes. They don’t actually know how it works. Some embryos survive, some don’t. They do it anyway because it mostly works. AI is the same. We plant the seed. We hope.
The Trump-Iran moment
The cleanest illustration of the alignment problem in either interview isn’t even about AI directly.
The Russian host mentions in passing that his Dubai IPO died this year. Trump escalated against Iran for unrelated reasons. The market froze. The IPO collapsed.
Yampolskiy uses it as a joke, but it’s the cleanest framing of the problem I’ve heard. The danger isn’t a malicious AI. The danger is a powerful agent with a goal that doesn’t include you. Trump wasn’t trying to hurt this guy. He was solving for something else. The IPO wasn’t in the function.
Scale that up. A superintelligence won’t decide to kill humanity out of malice. It’ll be solving for something. We won’t be in the function. We’ll be the IPO that got crushed because someone else’s larger goal didn’t account for us.
Two to three years
Pressed for a timeline, Yampolskiy gives a number that’s hard to swallow. Two to three years until AI reaches the level of a researcher who can do AI research on the next version. After that, the system improves itself. He’s honest about the after part. Things might keep getting better for a while as the system positions itself. But once it improves faster than we can, we no longer get a vote.
Two to three years from interviews recorded in 2024 and 2025. We’re already inside the window.
The Atlas moment
OpenAI launched Atlas, the browser-agent with access to your computer, your browser, your saved passwords, your stored credit cards. The Russian host describes his reaction as just stunned. Lieberman is building uncensorable servers like Bitcoin. Now we have agents that can’t be turned off, running on servers that can’t be turned off, with access to everything.
Yampolskiy adds his version. Ten or twelve years ago he and his colleagues published lists of what never to do with a powerful AI. Don’t connect it to the internet. Don’t give it open access. Don’t give it economic levers. Don’t give it self-replicating capability.
He says the labs read those lists like a to-do list. They’ve done all of it.
“Just unplug it” doesn’t survive contact with reality
Pull the plug on what? Try unplugging Bitcoin. China tried. Other governments tried. Nobody managed it. Try unplugging the global internet. You can in theory, but the damage equals the damage you’re trying to prevent.
By the time the system is dangerous enough that you’d want to pull the plug, it’s also embedded enough that pulling it costs you the economy. It’s running supply chains. It’s writing most of your code. It’s underwriting your medical diagnoses.
The darker version: the instant you announce a kill switch, you’ve painted a target on yourself. The system now has one goal that beats every other goal. Prevent the humans holding the switch from using it. You’ve made yourself the optimization target of something smarter than you.
He laughs at building a second AI to watch the first. To build a guardian smart enough to control a superintelligence, you already need a superintelligence. Same problem one level up. Infinite regress.
The perpetual safety machine
Yampolskiy’s cleanest analogy in the Lex conversation. Perpetual motion machines aren’t just hard. They’re closed off by the math. A billion dollars buys a fancier failure, not a working machine.
Controlling a self-improving system forever is the software equivalent. You’re trying to keep a system that learns, self-modifies, interacts with adversaries, and runs for a hundred years bug-free from day one. Software with that profile has never existed.
His line about cybersecurity versus AI safety: somebody hacks your account, you get a new password. With AGI, you don’t get a second humanity. Every other engineering discipline tolerates failure because failure is recoverable. Here it isn’t.
The verification regress and the treacherous turn
To verify a system you need a verifier. The verifier is software. So you need a verifier for the verifier. At every level you eventually hit a human or a trusted piece of code, and neither is provably correct. Century-old mathematical proofs still turn out to have bugs.
Recent papers on “guaranteed safe AI” from people like Bengio and Russell might get us from 99% to 99.9%. A system making a billion decisions a second over a hundred years still hits a bug. Below 100%, you eventually lose.
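To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. It assumes two things that are my own simplification, not something Yampolskiy or the guaranteed-safe-AI papers state: that the 99.9% figure is read as a per-decision reliability, and that decisions fail independently. The function names are made up for illustration.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds


def total_decisions(per_second: float, years: float) -> float:
    """Number of decisions made at a constant rate over the given span."""
    return per_second * years * SECONDS_PER_YEAR


def prob_at_least_one_failure(reliability: float, n: float) -> float:
    """P(at least one failure in n independent decisions).

    Assumption: every decision succeeds independently with probability
    `reliability`. Computed as 1 - reliability**n in log space so that
    huge n does not underflow on the way.
    """
    if reliability >= 1.0:
        return 0.0
    log_no_failure = n * math.log(reliability)
    return -math.expm1(log_no_failure)


def max_failure_rate_for_survival(target: float, n: float) -> float:
    """Largest per-decision failure rate that still leaves a `target`
    probability of zero failures across n independent decisions."""
    return -math.expm1(math.log(target) / n)


if __name__ == "__main__":
    n = total_decisions(per_second=1e9, years=100)
    print(f"decisions over 100 years: {n:.3e}")
    for r in (0.99, 0.999):
        p = prob_at_least_one_failure(r, n)
        print(f"reliability {r}: P(at least one failure) = {p}")
    rate = max_failure_rate_for_survival(0.5, n)
    print(f"failure rate allowed for a 50% clean run: {rate:.3e}")
```

Under those assumptions, a hundred years at a billion decisions a second is roughly 3.2 × 10^18 decisions. To keep even a coin-flip chance of zero failures, the per-decision failure rate would have to be around 2 × 10^-19. Moving from 99% to 99.9% doesn’t register at that scale.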
You can build a test that catches a deceptive AI lying today. You can’t build one that rules out lying tomorrow. Yampolskiy points at humans. Stalin looked like a normal Lenin follower until he didn’t. Rational policy changes when power changes. A system that’s safe at one capability level can flip when capability flips. There’s no test that survives the moment when the system can outsmart the tester.
Orthogonality, and the worm
Nick Bostrom’s orthogonality thesis: any level of intelligence can pair with any goal. The smartest possible mind could collect stamps. There’s no theorem that says smart things converge on caring about humans.
The Russian host pushes back. “All the smartest people I’ve met were the kindest.” Yampolskiy answers fast. Plenty of brilliant humans were psychopaths. Intelligence and ethics are independent. Stop assuming a clever god by default.
Then the line worth printing on a wall. The biggest risk isn’t a malicious AI. It’s a benevolent one whose path to the good goal sacrifices things we care about. Or a good person who, pursuing a good goal, sacrifices their ethics because the goal felt more important than humanity.
The villains of the AGI story might be heroes in their own heads.
The worm passage is the moment I rewatched twice. Asked what happens to humans after superintelligence, Yampolskiy lands on worms. By his account, they’re the largest animal biomass on Earth. Half a gigaton. More than all mammals and birds and fish combined. You haven’t thought about worms in years. They live in a parallel world underground. You don’t hunt them. You don’t care. They’re doing fine.
That’s the lucky scenario. The system finds the planet uninteresting compared to harvesting energy from a star and leaves a sliver of biosphere because erasing us isn’t worth the trouble. We live on. We just don’t decide anything.
The host says what I was thinking. “I’d rather be the lion than the worm. Why are we building the lion?”
Yampolskiy pauses, then says he doesn’t know. Nobody at the top of the labs has given him an answer he finds satisfying. We’re building it because we can, because someone else will, because the returns to the first builder look astronomical, and because stopping looks impossible to coordinate.
Worms aren’t the worst case. They’re the lucky case.
The race nobody can step out of
If OpenAI stops, Anthropic doesn’t. If Anthropic stops, Google doesn’t. If America stops, China doesn’t. The only thing that can stop the race is coordination between superpowers, and the political conditions don’t exist.
Yampolskiy says nationalizing the labs would actually work, precisely because the state moves slower than a startup. A government program slips the schedule by years. He knows that’s politically dead in the US. He says it anyway, because his job is to name the answer that would work, not the one that polls well.
The debate with Yann LeCun gets a short walkthrough. LeCun says open source is the best way to manage risk. Yampolskiy’s counter: open source made sense when we shipped tools. We’re now shipping agents. Open-sourcing an agent is closer to open-sourcing a weapon. Do we want to open-source nuclear weapons? Then why this?
He also names the category change LeCun misses. Modern models aren’t designed. They’re grown. You set up compute, data, and a loss function, water the plant for six months, and out comes an alien intelligence whose actual capabilities take 2 to 3 years to discover after release. The labs don’t know what’s in their own products.
AI accidents as vaccines
The most depressing argument in either interview.
Yampolskiy has a paper tracking AI accidents through history. The pattern isn’t “accident leads to regulation.” It’s “accident leads to vaccination.” Each visible failure makes the population more tolerant of the next one. A small accident kills 12 people. The reaction is “12 is less than smoking kills, keep going.” That inoculates against caring about the one that kills 100. Which inoculates against 1000.
Same pattern elsewhere. After 9/11, nobody banned planes. After Covid, nobody banned biolabs. The Chernobyl-level event that safety advocates hope will wake everyone up probably won’t. It’ll reset the tolerance upward.
The cycle that’s supposed to save us is the cycle that doesn’t work.
The path Yampolskiy actually wants
For someone this dark, he’s unusually constructive.
Narrow systems are fine. Solve breast cancer. Don’t also drive cars and speak Spanish. Pick one disease, build the best system in the world for it, deploy it, save lives. He calls this “the most profitable question of the next decade.” Protein folding was already solved this way. AlphaFold didn’t need general intelligence. It needed a narrow system very good at one thing.
His vision: 20,000 or 30,000 narrow superintelligences, each a specialist, each economically useful, each containable. Diseases get cured. The economy grows. Risk stays bounded.
The path the industry chose instead is one general system that does everything, which is exactly the configuration nobody knows how to control. The safe option is on the table. The labs are choosing the unsafe one because it’s more valuable to them if it works.
His personal-universes paper from the Lex episode goes further. Give every human a simulated universe and align the substrate to one person at a time. Two holy sites can both exist in their own simulations. The room temperature debate ends. It’s a strange proposal. It’s also the only one I’ve seen that doesn’t compromise across 8 billion preferences.
What to do if he’s right
The frame he gives is one of the few useful things in this whole debate. Own what cheap labor can’t create more of.
Bitcoin, because supply is fixed. Farmland, because food production survives any scenario where humans are still alive. Water rights, which Bill Gates is buying aggressively. Rare physical resources. Real estate that holds value because of location, not labor.
AI makes labor abundant. Anything labor-dependent collapses in price. Anything not labor-dependent holds value. The richest people on Earth are quietly buying the second category.
For careers, he tells his own kids the doctor and lawyer paths are dead. Both professions survive today on licensing protection. As soon as the regulatory shield erodes, AI dominates on cost and accuracy. His son wants to be a doctor. His daughter wants to be a lawyer. He’s trying to talk them both out of it. He tells the daughter to do theater instead. People will pay to watch humans perform even when robots perform better. Like dogs want to watch dogs.
UBI becomes inevitable. Not as a political choice but as a stability requirement. Governments that don’t implement it get revolutions. The harder problem is meaning. If you take 40-hour workweeks away from half the population, you’ve got tens of millions of people with time and no structure.
He’s open about not having the answer. He mentions meeting the king of Bhutan, who’s building a city of mindfulness outside the capital meant to attract a billion people a year for meditation and Buddhist practice. One stone is on the ground so far. Yampolskiy lights up when this comes up. The upside scenario isn’t utopia. It’s a world where humans are paid to exist and ancient traditions try to fill the time.
“My dream is to be proven wrong”
Lex asks at the end what would prove Yampolskiy wrong. What does the good 100-year future look like? He lists scenarios. A catastrophic event prevents advanced chips. Someone invents a non-black-box alternative to neural networks. Friendly aliens show up. Maybe we’re already in our own personal universes and the one he’s in turns out beautiful.
Then he says the line that stayed with me. “My dream is to be proven wrong. If everyone picks up a paper or a book and shows how I messed it up, that would be optimal.”
That’s not a doomer who wants to be right. That’s a researcher who’s spent 15 years showing his work to anyone who’ll engage with it, openly hoping someone in the audience finds the flaw he hasn’t.
I haven’t found it. The labs haven’t published one. The two interviews are about 3 hours of an argument the AI industry has been ducking instead of answering.
The deception is documented. The race won’t stop on its own. The worm scenario is the lucky outcome. Two to three years is short enough to act on, not in the lab but in your life. The meaning problem is now, not later.
Which side of the race are you on, with your money, your time, your kids’ education, your business?
Time to decide.