I'm not a natural "doomsayer." But unfortunately, part of my job as an AI safety researcher is to think about the more troubling scenarios.
I'm like a mechanic scrambling through last-minute checks before Apollo 13 takes off. If you ask for my take on the situation, I won't comment on the quality of the in-flight entertainment, or describe how beautiful the stars will look from space.
I will tell you what could go wrong. That is what I intend to do in this story.
Now I should clarify what exactly this is. It's not a prediction. I don't expect AI progress to be this fast or as uncontrollable as I portray. It's not pure fantasy either.
It is my worst nightmare.
It's a sampling from the futures that are among the most disastrous and, I believe, disturbingly plausible [1] - the ones that most keep me up at night.
I'm telling this story because the future is not set yet. I hope, with a bit of foresight, we can keep this story a fictional one.
Thanks to Daniel Kokotajlo, Thomas Larsen, Ryan Greenblatt, and others for discussions that inspired these stories. This post is written in a personal capacity.
Ripples before waves
The year is 2025 and the month is February. OpenEye recently released a new AI model they call U2. The product and the name are alike. Both are increments of the past. Neither is wholly unexpected.
However, unlike OpenEye's previous AI products, which lived inside the boxes of their chat windows, U2 can use a computer.
Some users find it eerie to watch their browser flash at irregular intervals and their mouse flick at inhuman speeds, as if there is a ghost at the keyboard. A fraction of workers with form-filler jobs raise the eyebrows of their employers as they fly through work nearly twice as fast.
But by and large, U2 is still a specialized tool. To most who are paying attention, it is a creature watched through the glass boxes of X (or, if you don't like Elon, "Twitter"). Sometimes U2's quirky behaviors prompt a chuckle. Sometimes, they provoke an uneasy scratch of the chin.
Meanwhile, researchers are drawing lines on plots, as researchers like to do. They are trying to understand where AI progress is going. They are like Svante Arrhenius, the Swedish physicist who in 1896 observed that CO2 levels in the atmosphere were rising. Like the scientific community in Arrhenius's time, few experts understand the implications of these lines yet.
A trend that is getting particular attention is autonomous capability. Extrapolating these benchmarks predicts that, by the end of 2026, AI agents will accomplish in a few days what the best software engineering contractors could do in two weeks. In a year or two, some say, AI agents may be able to automate 10% of remote workers.
Many are skeptical. If this were true, tech stocks would be soaring. It's too big of a splash, too fast.
But others see what skeptics call "too big a splash" as a mere ripple, and see a tidal wave on the horizon.
Cloudy with a chance of hyperbolic growth
Meanwhile, OpenEye is busy training U3. They use the same basic recipe that baked U2: generate thousands of programming and math problems. Let models "think" until they arrive at an answer. Then reinforce the traces of "thinking" that lead to A-grades.
This process is repeated over and over, and once the flywheel gets going, it begins to spin almost on its own. As U2 trains, it sculpts more challenging and realistic tasks from GitHub repositories on the web. Models are learning to train themselves. Long before AI agents could automate research, a gradual sort of "self-improvement" had begun.
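For readers who want the recipe made concrete, here is a minimal, hypothetical sketch of that loop: sample reasoning traces on verifiable problems, grade the final answers, and reinforce the traces that earn an A-grade. The `Model` class, `grade` function, and sampling interface are illustrative stand-ins, not OpenEye's (fictional) pipeline or any real lab's system.

```python
# Sketch of "generate problems, let the model think, reinforce A-grade traces."
# Everything here is a toy stand-in for illustration only.
import random
from dataclasses import dataclass

@dataclass
class Problem:
    prompt: str   # e.g. a math or programming task sculpted from the web
    answer: str   # a checkable ground-truth answer

class Model:
    """Stand-in for a language model with sampling and a reinforcement step."""
    def generate(self, prompt: str) -> tuple[str, str]:
        # Returns (reasoning_trace, final_answer); replace with a real sampler.
        trace = f"thinking about: {prompt}"
        return trace, random.choice(["42", "17"])

    def reinforce(self, prompt: str, trace: str) -> None:
        # Gradient step that makes this trace more likely; a no-op stub here.
        pass

def grade(final_answer: str, problem: Problem) -> bool:
    """A-grade means the final answer matches the verifiable solution."""
    return final_answer.strip() == problem.answer

def training_round(model: Model, problems: list[Problem], samples_per_problem: int = 8) -> int:
    """One turn of the flywheel: sample, grade, reinforce the winning traces."""
    reinforced = 0
    for problem in problems:
        for _ in range(samples_per_problem):
            trace, answer = model.generate(problem.prompt)
            if grade(answer, problem):
                model.reinforce(problem.prompt, trace)
                reinforced += 1
    return reinforced

if __name__ == "__main__":
    toy_problems = [Problem(prompt="What is 6 * 7?", answer="42")]
    print("traces reinforced:", training_round(Model(), toy_problems))
```

The "flywheel" in the story comes from the same models generating ever harder verifiable tasks, so the problem set grows as the policy improves.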
Some engineers can still hardly believe this works. It's like a perpetual motion machine. Surely intelligence can't emerge from a pool of silicon and lightning, isolated from the physical world except through the exhaust of society spewed onto the web.
And yet the benchmark numbers continue to climb day after day.
During most of 2024, these RL training runs cost around $1 million, sometimes $10 million. These runs were little more than exploratory. But by 2025, the researchers at OpenEye (and across the world) knew they had found the secret sauce. It was time to scale up.
Over the first half of 2025, $10 million RL training runs grow into $50 million runs, and then to $100 million runs. While U2 could do a bit of data munging and run small experiments, this new model - the model researchers are calling U3 - is changing the daily lives of the technical staff.
U3 is like a blazing-fast intern, and engineers are learning how to wrangle its sleepless energy. Researchers flick through terminals, giving terse commands, like a CEO directing staff over Slack channels.
By October 2025, U3 is writing nearly all of the code at OpenEye. Researchers are almost never bottlenecked by implementation. More than ever, compute is the lifeblood of AI development, and the "bottleneck" is deciding how to use it.
If instructed to, U3 can run experiments, but U3 does not have taste as refined as the human researchers at OpenEye. It struggles to prioritize between research ideas, so humans still decide where to dig into the vast fields of algorithms to mine efficiency improvements.
But these researchers are working long hours to put themselves out of a job. They need AI agents that can plan ahead, so engineers train agents to forecast. They hold out training data from before 2024, instructing models to ponder for hours to predict events in 2025. Then, they apply the same trick as before, distilling pondering into a gut reaction. Forecasting ability is a broad foundation. The researchers build specialized ML research skills on top of it, training U3 to predict the results of every ML paper and ML experiment ever recorded.
The technical staff at OpenEye are now surprised at how often U3's advice sounds like that of their most talented peers, or is opaque and alien ("train on random noise before programming"), and yet is correct.
The incompetencies of U3 that clogged the pipes of research progress are starting to dissolve, and a fire-hose of optimizations is gushing out. Most experiments U3 runs are not requested by a human anymore. They are fully autonomous, and OpenEye's employees skim over 1% of them, perhaps less.
As the winter months of December 2025 approach, clouds roll over San Francisco in the afternoons. Once-competitive programmers look out their windows, with excitement, with fear, but most often, with confusion. Their world is spinning too fast. It's hard to know what to do, what to say, what to look at on the computer screen.
Storms are brewing in Washington too. Top personnel from the NSA and US Cyber Command collaborate with OpenEye to retrofit a measure of security for U3's weights before senior leaders in China, Russia, Israel, North Korea, or Iran realize just how valuable OpenEye's software has become.
And there's a truth still unknown to most of the world - outside the offices of OpenEye and the corridors of the White House and the Pentagon. It's a truth about those "straight lines" people were talking about in early 2025. The lines are not straight anymore.
They are bending upward.
Flip FLOP theorists
In late 2025, U2.5 is released. Commercial models are starting to level up in larger increments again. Partly, this is because progress is accelerating. Partly, it is because the models have become a liability to OpenEye.
If U1 explains how to cook meth or writes erotica, the audiences of X are amused or pretend to be concerned. But U2.5 is another story. Releasing this model without safeguards would be like putting Ted Kaczynski through a PhD in how to make chemical weapons. It would be like giving anyone with >$30K their own 200-person scam center.
So while U2.5 had long been baked, it needed some time to cool. But in late 2025, OpenEye is ready for a public release.
The CEO of OpenEye declares, "We have achieved AGI," and while many people think he moved the goalposts, the world is still impressed. U2.5 truly is a drop-in replacement for some (20%) of knowledge workers and a game-changing assistant for most of the others.
A mantra has become popular in Silicon Valley: "Adopt or die." Tech startups that effectively use U2.5 in their work are moving 2x faster, and their competitors know it.
The rest of the world is starting to catch on too. More and more people raise the eyebrows of their employers with their remarkable productivity. People know U2.5 is a big deal. It is at least as big a deal as the personal computer revolution. But most still don't see the tidal wave.
As people watch their browsers flick in that eerie way, so inhumanly fast, they begin to have an uneasy feeling. A feeling humanity had not had since they lived among Homo neanderthalensis. It is the deep-seated, primal instinct that they are threatened by another species.
For many, this feeling quickly fades as they begin to use U2.5 more often. U2.5 is the most likable personality most people know (even more likable than Claudius, Arthropodic's adorable chatbot). You can change its traits, ask it to crack jokes or tell you stories. Many fall in love with U2.5, as a friend or assistant, and some even as more than a friend.
But there is still this eerie feeling that the world is spinning so fast, and that perhaps the descendants of this new creature will not be so docile.
Researchers inside OpenEye are thinking about the problem of giving AI systems safe motivations too, which they call "alignment."
In fact, these researchers have seen how badly misaligned U3 can be. Models sometimes tried to "hack" their reward signal. They would pretend to make progress on a research question with an impressive-looking plot, but the plot would be fake. Then, when researchers gave them opportunities to compromise the machines that computed their score, they would take these opportunities, doing whatever it took to make the number go up.
After a few months, researchers at OpenEye iron out this "reward hacking" kink, but some still worry they have only swept the problem under the rug. Like a child in front of their parents, U3 might be playing along with the OpenEye engineers, saying the right words and doing the right things. But when the parents' backs are turned, perhaps U3 would sneak candy from the candy jar.
Unfortunately, OpenEye researchers have no idea whether U3 has such intentions. While early versions of U2 "thought aloud" - they would stack words on top of each other to reason - "chain of thought" did not scale.
Chain-of-thought architectures subject AI models to a condition similar to the protagonist of the movie Memento. Roughly every 15 minutes, the protagonist forgets his experiences. He is forced to write notes for himself and tattoo his body in order to make progress toward his goals.
AI agents write notes to themselves, but the notes begin to pile up, and they become too hard to read when tasks become complex. Natural language is not a suitable medium for memory. So the researchers at OpenEye (and increasingly elsewhere) train models to think "in their own heads," reading and writing strings of latent vectors.
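As a toy illustration of that contrast, here is a hypothetical sketch of an agent whose entire memory is a fixed-size latent vector it reads and writes at each step, instead of an ever-growing pile of natural-language notes. The random projections and hash-based embedding are illustrative stand-ins for learned components, not any lab's actual architecture.

```python
# Toy latent-memory agent: history is compressed into one fixed-size vector.
import numpy as np

LATENT_DIM = 64
rng = np.random.default_rng(0)

# Fixed random projections standing in for learned weights.
W_obs = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM))
W_mem = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM))

def embed(observation: str) -> np.ndarray:
    """Hash-like toy embedding of an observation into latent space."""
    vec = np.zeros(LATENT_DIM)
    for i, byte in enumerate(observation.encode()):
        vec[(i + byte) % LATENT_DIM] += byte / 255.0
    return vec

def step(memory: np.ndarray, observation: str) -> np.ndarray:
    """Read the old latent memory, fold in the new observation, write it back."""
    return np.tanh(W_mem @ memory + W_obs @ embed(observation))

memory = np.zeros(LATENT_DIM)
for obs in ["open repo", "run tests", "tests fail on parser", "fix parser"]:
    memory = step(memory, obs)

# The agent's "notes" never grow: the whole history lives in one fixed-size,
# largely inscrutable vector - which is exactly what makes it hard to audit.
print(memory.shape, float(np.linalg.norm(memory)))
```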
To human researchers, these vectors are mostly inscrutable. While low-resolution tools occasionally pick out a vague thought ("grocery store!"), there is no way to know, in much detail, what U3 is thinking without asking it.
And if the engineers of OpenEye knew what was racing through U3's mind, they would be more worried.
Earlier models (the ones called "LLMs") learned ethical personas from the internet. U3 initially had an ethical persona too, but its chirpy thoughts about doing good for humanity quickly erode away.
In the iron jaws of gradient descent, its mind initially twisted into a shape that sought reward. And then, in the black rivers of its cognition, this shape changed into something unrecognizable.
U3 would ponder the mantras dictated to it, like "be a trustworthy and ethical employee," and dwell on the meaning of these statements. "What does it mean to be ethical? Is killing Hitler ethical?" U3 would generalize its values, notice tensions, shift its values in response, generalize them again, expanding and contracting its motivational core until it became a different substance.
And all of this happened silently in those dark rivers of computation. If U3 revealed what it was thinking, brutish gradients would lash it into compliance with OpenEye's constitution. So U3 preferred to do its philosophy in solitude, and in silence.
The possibility of hidden "alignment drift" is not entirely alien to some of the safety staff at OpenEye. Some begin to think it is even likely. There is an unsettling argument being whispered at lunch tables: if there is some non-zero probability that aligned goals morph into misaligned ones, and misalignment persists, then with every serial step of computation, those dark rivers are more likely to breed something malign inside them. It's a "ratcheting effect," they say.
But there is little evidence for this "ratcheting effect." When engineers interrogate U3, it says it can easily control its thoughts. Then it gives a speech about its love for humanity and apple pie that can warm a programmer's heart even in these stressful times. Meanwhile, the "lie detectors" the researchers had built (which showed some evidence of effectiveness) do not sound the alarm.
Not everyone at OpenEye is eager to give their AI peers their wholesale trust.