Strategic Initiatives

Terraform is dead | graham gilbert

LLM (google/gemini-3.1-flash-lite-20260507) summary:

  • Systemic Obsolescence: Terraform exists merely due to professional inertia rather than genuine technical utility.
  • Abstract Fallacy: HashiCorp Configuration Language (HCL) fails to mirror the mental models engineers utilize during design.
  • Translation Overhead: the process of converting whiteboard sketches into static code creates unnecessary and inefficient labor.
  • Fragmented Architecture: modern workflows suffer from disconnected representations of intent, application logic, and security policy.
  • Illusion Of Sync: maintaining separate infrastructure and logic layers inevitably leads to configuration drift and administrative errors.
  • Artificial Intelligence Disruption: automated systems eliminate the need for manual translation layers by directly interpreting natural language and design documentation.
  • Intent Based Orchestration: future infrastructure management replaces restrictive languages with iterative refinement and explicit constraint definitions.
  • Tooling Irrelevance: Terraform functions as a redundant abstraction that adds complexity while failing to integrate seamlessly with programmatic execution.

The more I look at how we actually build systems now, the more it looks like Terraform is dead.

Not “declining.” Not “evolving.” Dead. What’s left is just inertia.

What Terraform Actually Solved

Terraform solved a very specific problem: how do we make infrastructure deterministic, reviewable, and repeatable?

The answer was a DSL, a plan step, and a state file. It worked, and it still works.
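To make the plan step concrete, here is a minimal Python sketch of the idea. It is not Terraform's actual implementation. It assumes the desired configuration and the recorded state are both plain dictionaries keyed by resource name; the function name `plan` and the resource names are hypothetical.

```python
def plan(desired: dict, state: dict) -> dict:
    """Diff desired config against recorded state (illustrative only)."""
    return {
        # In the config but not yet in state: needs to be created.
        "create": sorted(k for k in desired if k not in state),
        # In both, but the attributes differ: needs to be updated.
        "update": sorted(
            k for k in desired if k in state and desired[k] != state[k]
        ),
        # In state but no longer in the config: needs to be deleted.
        "delete": sorted(k for k in state if k not in desired),
    }


# Hypothetical resources, purely for illustration.
desired = {"web": {"size": "large"}, "db": {"size": "small"}}
state = {"web": {"size": "small"}, "cache": {"size": "small"}}
changes = plan(desired, state)
```

The point of the sketch is that the reviewable artifact is the diff, not the config itself: the same inputs always give the same set of changes, which is what makes the result deterministic and repeatable.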

But it also forced an awkward compromise. Humans ended up describing intent in a language that was never designed to express it, and HCL is not how anyone actually thinks about systems.

How We Actually Design Systems

Think about how systems actually get designed.

Put a group of engineers in a room with a whiteboard and you don’t get HCL. You get boxes and arrows.

Someone sketches a service here, a database there, arrows showing flows, circles around “this must stay private,” and notes like “auth happens here” or “this needs to scale.”

Then the context gets filled in with words:

“This is the public edge.”
“This path needs stronger auth.”
“This data can’t leave the region.”

That combination of diagrams and natural language is the real design. It’s how we think, how we communicate, and how we reason about tradeoffs.

The design doc just formalizes it: diagrams plus explanation, intent plus constraints.

The Translation Problem

Terraform is not that. It’s the translation of that.

We take something that makes sense to humans and rewrite it into something a tool can execute. That translation step has always been the real work, even if we’ve treated the abstraction itself as the hard part.

The Hidden Cost: Fragmentation

Terraform didn’t just give us a DSL. It forced us to split a single system across multiple representations.

  • infrastructure lives in HCL
  • application logic lives in real code
  • policies are scattered across IAM, config, and application layers
  • diagrams exist as a rough approximation

All describing the same system, none of them truly in sync.

Those boundaries aren’t real. They’re artifacts of the tooling, and they show up as drift, duplication, and things that only exist in someone’s head.

AI Removes the Translation Layer

AI removes the need for that translation layer.

You can now start where we already start: a diagram, a paragraph, a set of constraints. Instead of expressing that through a DSL, the system works with you to turn it into something concrete.

If something is missing, it asks:

  • “Is this database public?”
  • “What availability do you need?”
  • “Should this be multi-region?”
  • “What are your retention requirements?”

Instead of encoding decisions indirectly in a DSL, you make them explicitly.
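One way to picture that interrogation step is a record of intent where every undecided field turns into a question. The Python sketch below is only an illustration under that assumption: the `DatabaseIntent` class, its field names, and the `open_questions` helper are all hypothetical, and a real system would presumably generate its questions rather than look them up in a table.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DatabaseIntent:
    """Hypothetical intent record; None means 'not decided yet'."""
    name: str
    public: Optional[bool] = None
    availability: Optional[str] = None
    multi_region: Optional[bool] = None
    retention_days: Optional[int] = None


# The questions quoted in the text, keyed by the field each one resolves.
QUESTIONS = {
    "public": "Is this database public?",
    "availability": "What availability do you need?",
    "multi_region": "Should this be multi-region?",
    "retention_days": "What are your retention requirements?",
}


def open_questions(intent: DatabaseIntent) -> list:
    """Return one question for every decision that is still missing."""
    return [
        QUESTIONS[f.name]
        for f in fields(intent)
        if f.name in QUESTIONS and getattr(intent, f.name) is None
    ]


# Only the name and the visibility have been decided so far.
pending = open_questions(DatabaseIntent(name="orders", public=False))
```

Each answer fills in a field, so the decision ends up recorded explicitly instead of being implied by whatever defaults a resource block happened to use.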

Where the Model Breaks

This is where the old model starts to break down.

If the interface to infrastructure is now diagrams, natural language, and iterative refinement, then a static DSL in the middle stops making sense.

You’re no longer writing infrastructure. You’re describing it the way you always have, just with a system that can carry that intent all the way through.

What I Would Build Instead

At that point, Terraform becomes something I wouldn’t choose.

If I were starting again today, I’d build an intent layer over infrastructure: diagrams, natural language, and constraints, backed by a system that interrogates and refines that intent, produces a canonical representation, and executes it using real code.

No HCL. No DSL in the middle.

If there’s something underneath, it looks more like Pulumi: general-purpose languages, testable, composable, and able to sit naturally alongside the rest of the system.
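As a rough sketch of what "canonical representation" and "testable" could mean in a general-purpose language, the Python below checks a whiteboard constraint such as "this data can't leave the region" and serializes the result in a stable order. Everything here is an assumption made for illustration: the resource dictionaries, the region names, and the `violations` and `canonical` helpers are hypothetical and are not Pulumi's API.

```python
import json


def violations(resources: list, allowed_region: str) -> list:
    """Names of resources placed outside the allowed region."""
    return sorted(r["name"] for r in resources if r["region"] != allowed_region)


def canonical(resources: list) -> str:
    """Serialize resources in a stable order so equal intent compares equal."""
    ordered = sorted(resources, key=lambda r: r["name"])
    return json.dumps(ordered, sort_keys=True, separators=(",", ":"))


# Hypothetical resources derived from a design conversation.
resources = [
    {"name": "orders-db", "region": "eu-west-1"},
    {"name": "analytics", "region": "us-east-1"},
]
bad = violations(resources, allowed_region="eu-west-1")
same = canonical(resources) == canonical(list(reversed(resources)))
```

Because the constraint is an ordinary function, it can sit in the same test suite as the application code rather than in a separate policy layer.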

Conclusion

Terraform isn’t going away any time soon. Too much depends on it.

But the role it plays no longer makes sense.

It was designed as a human-readable abstraction over infrastructure, a way for us to describe systems in a structured, deterministic form that tools could execute.

That made sense when humans were responsible for bridging the gap between intent and implementation.

That constraint no longer exists.

We don’t need a better language to describe infrastructure. We need a system that can take intent and carry it all the way through to something that runs.

And once you have that, Terraform stops looking like a useful abstraction and starts looking like an extra layer you no longer need.


Justice Department Appeals Federal Judge’s Ruling That First Amendment Protections Apply to Sanctioned UN Special Rapporteur

LLM (google/gemini-3.1-flash-lite-20260507) summary:

  • Legal Dispute: the Department of Justice is fighting to reinstate sanctions against Francesca Albanese after a federal judge intervened to pause them.
  • Sanction Basis: the measures were initially applied due to Albanese’s active efforts to facilitate legal action against United States and Israeli nationals.
  • Constitutional Claim: Judge Leon bizarrely granted a foreign national living abroad the protection of the US Constitution despite clearly established contradictory case law.
  • Absurd Precedent: the ruling ludicrously claims that simply owning domestic property or having a child born in the country grants deep constitutional rights to foreign, state-aligned activists.
  • Dangerous Implications: this judicial overreach threatens to undermine national security by effectively bestowing legal immunity upon countless foreign figures holding real estate assets.
  • Emergency Appeal: the Justice Department is demanding a stay to stop this dangerous precedent which ignores the clear lack of substantial ties or legal merit in the rapporteur’s case.
  • Taxpayer Funding: American citizens are currently forced to subsidize twenty-two percent of the expenses for a political activist who actively works against national interests.
  • Accountability Demand: there is an urgent need to leverage withheld United Nations dues to secure the removal of a figure widely condemned for documented bias and lack of independence.

The sanctions imposed by the United States against Francesca Albanese, the UN Special Rapporteur for “human rights in the Palestinian Territory occupied since 1967,” are the focus of a legal battle in Washington.

The Justice Department has filed an emergency motion seeking to set aside a May 13 ruling by federal district court judge Richard Leon pausing U.S. sanctions on Albanese, an Italian citizen residing in Tunisia. The U.S. had sanctioned Albanese, whose antisemitic comments and biased conduct have been condemned by numerous countries, in July 2025 for having “directly engaged with the International Criminal Court in efforts to investigate, arrest, detain, or prosecute nationals of the United States or Israel, without the consent of those two countries.”

Judge Asserts That Albanese Possesses Rights Under U.S. Constitution

Judge Leon ruled that the sanctions imposed on Albanese “violate[d] the First Amendment” because they “unnecessarily circumscribe[d]” her “protected” speech. 

This was surprising given that a Supreme Court decision in 2020 held that “foreign citizens outside U.S. territory do not possess rights under the U.S. Constitution.” Albanese is “a foreign national who lives abroad, has not lived in the United States for more than ten years, and . . . engaged in all relevant expression abroad,” as the government’s filing noted.

However, Judge Leon cited an earlier Supreme Court case for the proposition that foreign nationals located abroad do possess Constitutional rights if they can demonstrate “substantial connections” with the U.S. He determined that Albanese meets the substantial connections test and is therefore eligible for First Amendment protections, principally because “she bought – and she still owns – property in the United States,” and also because her daughter was born in the U.S. and is therefore a U.S. citizen.

Ruling Sets Worrying Precedent

Judge Leon’s reasoning would significantly hinder U.S. military and law enforcement actions overseas by creating a large class of foreign persons overseas who enjoy Constitutional protections. In the 12 months prior to March 2025 alone, foreign buyers who lived abroad purchased 34,400 homes in the U.S.

Several of the world’s most corrupt foreign officials and oligarchs have owned real estate in the United States. In addition, the daughter of Russian Foreign Minister Sergey Lavrov was born in New York while he served at the United Nations.

In Albanese’s case, the Justice Department filed an emergency motion on May 21 for an administrative stay and stay pending appeal in the U.S. Court of Appeals for the District of Columbia. The motion calls for the court to set aside Judge Leon’s preliminary injunction, thereby reinstating sanctions on Albanese while the government undertakes a full appeal of his ruling. 

The emergency motion provided two major grounds for setting aside Judge Leon’s ruling. The first ground is that “substantial connections” to the United States do not qualify a non-citizen residing and speaking abroad for First Amendment protection — yet even if they did, Albanese lacks such connections. The second ground is that Judge Leon erred in enjoining the sanctions in their entirety, even though the only plaintiffs were Albanese’s husband and child, whose complaints could easily be resolved by exempting them from the sanctions while retaining them against Albanese herself.

The U.S. Should Broaden Its Efforts To Counter Albanese

Albanese and other UN special rapporteurs do not receive UN salaries. However, the United Nations pays for rapporteurs’ official expenses including support staff, security, and travel. The United States is billed for 22 percent of the United Nations’s regular budget, meaning that U.S. taxpayers effectively fund 22 percent of Albanese’s official expenses.

Albanese clearly violated UN rules requiring impartiality. French Foreign Minister Jean-Noel Barrot said that Albanese “presents herself as a UN independent expert, yet she is neither an expert nor independent — she is a political activist who stirs up hate.” The United Kingdom’s Foreign Office has separately urged that Albanese be “urgently investigated” for violating the code of conduct for her post.

The administration currently possesses leverage by withholding over $4 billion in UN dues. The United States has three times previously used budgetary leverage to extract significant UN reforms. Ensuring that the United Nations undertakes reforms to hold Albanese accountable should be a top priority for the United States.

Orde F. Kittrie is a senior fellow at the Foundation for Defense of Democracies (FDD) and a law professor at Arizona State University. He previously served for over a decade in legal and policy positions at the U.S. State Department. FDD is a Washington, DC-based, nonpartisan research institute focused on national security and foreign policy.


AddyOsmani.com - Don't Outsource the Learning

LLM (google/gemini-3.1-flash-lite-20260507) summary:

  • Automation Risks: reliance on artificial intelligence for coding can lead to an erosion of personal technical comprehension.
  • Skill Degradation: passive task completion using generative tools often results in lower retention and reduced critical thinking capabilities.
  • Research Evidence: multiple scientific studies demonstrate that heavy reliance on external assistance diminishes human cognitive engagement during complex work.
  • Workflow Defaults: current software tools prioritize rapid output over pedagogical engagement which discourages deep understanding.
  • Strategic Delegation: tasks involving routine boilerplate code may be delegated whereas structural architectural knowledge remains essential for long-term effectiveness.
  • Labor Market Impacts: software developers unable to function without AI support face significant risks regarding professional relevance and employability.
  • Active Learning: integrating intentional methodologies such as Socratic questioning or self-initiated problem testing can mitigate cognitive decline.
  • Dual Objectives: maintaining professional growth requires balancing immediate productivity targets with independent individual learning milestones.

Right now, it’s too easy to let AI write the code while you skip the learning. The bug gets fixed. Your mental model doesn’t move. We are silently trading future capability for present-day speed, and the tools won’t force us to do otherwise. That part has to come from you.


There’s a default loop most of us have settled into. You paste in a spec or error message. The model hands you a fix. The symptom vanishes. You ship. Somewhere in that loop, the messy struggle between problem and solution stops happening at all.

I’ve written before about cognitive surrender, the moment an AI reviewer’s verdict quietly replaces your own. This is the solo version of that same loop. It’s just you and the model. The model is faster, so you stop trying to compete on comprehension. Across thousands of these small interactions, what you can actually build without an AI looking over your shoulder gets a little weaker every week. None of these moments feel like a problem on the day they happen.

I’m not anti-AI. I use these tools daily and have shipped more with them in the last year than in the five years before it. But the default way we use them is optimized for one thing: closing tasks. That is a completely different goal from staying sharp enough to steer them over a career that spans decades.


The studies are converging on the same point

Several pieces of research over the last year have landed in roughly the same place.

Anthropic ran a randomized trial in early 2026 where engineers learned a new Python library, half with AI assistance and half without. Both groups finished the tasks at the same speed. But the AI group bombed the follow-up comprehension quiz: 50% versus 67% for the manual group, with the gap widening on debugging. The interesting cut was inside the AI group itself. Engineers who used AI to ask conceptual questions scored above 65%. Engineers who copy-pasted the generated code scored under 40%. The tool didn’t determine the outcome. The posture did.

MIT’s Your Brain on ChatGPT study compared essay writing across LLM, search-engine, and brain-only groups. EEG measurements showed brain connectivity scaling down with every layer of external support. The LLM group showed the weakest coupling. After writing the essay, 83% of LLM users couldn’t quote a single line of what they had just produced. The researchers called this cognitive debt: saving mental effort today, paying for it in critical thinking tomorrow.

A CHI 2026 study added a related finding. When people had LLM access at the start of a task, the LLM framed the entire problem. Even when the human did the rest of the work themselves, that initial anchoring produced measurably worse decisions. The order of operations mattered more than the total amount of AI used.

Different methodologies, same conclusion. Using AI without an active intent to learn quietly degrades the skill you’re being paid for.


The tools default to shipping, not teaching

If you fire up a coding agent and stick to the defaults, everything is tuned for one metric: getting the task done. The model writes the code. You accept it. The loop repeats. At no point does the tool pause and ask “what do you think the problem is?” or “try writing the first five lines yourself.”

That isn’t a conspiracy. It’s UX gravity. Product teams get rewarded for merged changes and shorter cycle times, not for making you a sharper engineer. We all want fewer keystrokes, so the tools have sanded the friction away. The trouble is that friction was where the learning lived.

A few companies have started pushing back. Anthropic shipped Learning Mode for Claude, which uses Socratic questioning and stops to ask you to write code before continuing. OpenAI and Google have shipped similar features. Almost nobody uses them for real production work. We’ve quietly filed them under “for students” and that’s a mistake. The same feature that helps a sophomore learn React works for a senior engineer learning Rust. You just have to be willing to feel like a beginner again.


“If the AI can do it, why do I need to understand it?”

A fair question. For some work, the answer is: you don’t. If it’s boilerplate, glue code, or a throwaway CI script you’ll never look at again, delegate it. The opportunity cost of memorizing YAML syntax is too high.

For real software, pure delegation breaks down in a few specific places.

When something breaks. AI-generated code crashes the same way human code does. “The agent wrote it” doesn’t help you debug problems. Somebody on the team has to understand the architecture.

When it’s confidently wrong. LLMs hallucinate. The only defense against a plausible-looking incorrect answer is enough expertise to spot it.

When the foundation changes. Code is temporary; systems are permanent. When frameworks update or a security review flags a structural issue, you can’t re-prompt your way out. You need engineers who understand the system well enough to migrate it.

When you leave the median. AI is brilliant at problems that have been solved a million times on GitHub. The further you stray from the median, the worse it gets. The hard, undocumented problems, the ones that justify a senior engineer’s salary, still require deep understanding.

When the market adjusts. That 20% drop in junior developer employment since 2022 isn’t a fluke. Engineers who can only ship with AI, and not without it, are entering a labor pool that is already re-pricing what expertise is worth.

If you use AI to skip learning, you’re trading future relevance for a slightly easier Tuesday.


The fix is in how you prompt, not whether you do

The good news is that the same tools that produce cognitive debt can produce sharper engineers. The difference is in what you ask of them.

Form a hypothesis before you ask. Before requesting a fix, write down two or three sentences on what you think the problem is. Use the model’s answer to test your theory, not to replace it.

Ask for the explanation before the code. In unfamiliar territory, your first prompt should be something like “explain how this works, what the alternatives are, and what the tradeoffs are.” Ask for the code only after you’ve grasped the concepts.

Turn on Learning Mode when you’re out of your depth. Claude has it. ChatGPT has Study Mode. Gemini has Guided Learning. Yes, it feels slower. That’s the point.

Treat AI output like a PR from a junior engineer. Read it. Critique it. Push back on it. Would you merge it just because the tests passed? If not, don’t merge it here either.

Re-derive things by hand once in a while. Take a piece of code the model wrote for you and try to recreate it from scratch. It’s the calibration check that tells you how much you’ve quietly lost.

Ask the model to teach you what it just did. After it writes a clever function, ask what concepts it used and what you’d need to read to understand the design choice. One extra prompt changes what you take away from the session.

None of these are dramatic. They’re small posture shifts inside the same tools you’re already using.


Two metrics, not one

I’ve started ending coding sessions with a simple question: did I learn anything today, or did I just close tickets?

Sometimes the honest answer is “I just closed issues” and that’s fine. If it becomes the answer for months in a row, cognitive debt is accumulating in the background.

Ship and learn are two separate metrics. Your manager and your customers will only ever ask about the first one. The second is on you.

I’d rather ship 80% of what I could have and learn 100% of what I needed to, than the reverse. Over years, those two strategies produce very different engineers.

You don’t have to choose between using AI and learning. You do have to choose a workflow that does both, because the defaults won’t choose it for you. The tools are ready whenever you are. The next boring task you were about to delegate is a good place to start.


Further reading: Anthropic’s skill-formation study, MIT’s Your Brain on ChatGPT (arXiv 2506.08872), the CHI 2026 paper on LLM use under time constraints, Stack Overflow’s AI vs Gen Z report, and my earlier posts on comprehension debt and cognitive surrender.


SpaceX’s Ambitions Are Intergalactic. Its Business Is Selling You Internet. - WSJ

LLM (google/gemini-3.1-flash-lite-20260507) summary:

  • Grandiose Mission Statement: the company claims its primary objective is to make life multiplanetary and preserve human consciousness beyond Earth.
  • Financial Reality Check: despite utopian rhetoric, the venture relies on generating massive capital from mundane internet services to fund its expensive experiments.
  • Revenue Foundation: Starlink connectivity serves as the primary cash cow, contributing over sixty percent of total company sales and providing the only profitable segment.
  • Corporate Flywheel Model: internal operations function by utilizing rockets to deploy satellites which in turn generate the revenue necessary to launch more rockets.
  • Regulatory Filing Bravado: the IPO documentation features eccentric claims including a twenty-nine trillion dollar addressable market and existential warnings about dinosaur extinction.
  • Optimized Manufacturing Process: the company employs a five-step philosophy called The Algorithm to aggressively cut costs and automate the production of satellite hardware.
  • Rapid Scaling Efforts: the satellite network has expanded from an initial projection of four thousand units to nearly ten thousand currently in orbit.
  • Market Expansion Goals: management aims to leverage its terrestrial internet success to dominate theoretical trillion-dollar markets on the Moon and Mars.

SpaceX launched the first Starlink satellites in 2019. Joe Marino-Bill Cantrell/UPI/Alamy
Ben Cohen

May 21, 2026 9:00 pm ET

The mission of SpaceX is to “make life multiplanetary, to understand the true nature of the universe and to extend the light of consciousness to the stars.” 

Starlink was born of more earthly concerns. 

“What’s needed to create a city on Mars? Well, one thing’s for sure: a lot of money,” Elon Musk said in 2015 at the launch event for his latest moonshot. “So we need things that will generate a lot of money.” 

Musk laid out his radical vision of the future long before SpaceX had its sights on the biggest IPO in history. Even then, he knew that rockets alone wouldn’t get his company to Mars. His only hope of establishing society on a freezing rock far, far away was blasting thousands of satellites into orbit.

A decade later, Starlink is now generating enough money to bankroll a company that burns through it at warp speed.

[Chart: SpaceX annual revenue by division (AI, Starlink, Space), 2023 to 2025, in billions of dollars. Source: the company. Note: Starlink is part of the Connectivity division.]

SpaceX may be known for building majestic rockets, firing those giant beasts of marvelous engineering into the skies and catching them with chopsticks. What’s less known is that the company’s extraordinary ambitions are fueled by an incredibly ordinary product: SpaceX has become an internet-service provider that also explores space.  

The prospectus that it filed as SpaceX prepares for a massive initial public offering makes it clear that colonizing Mars and cracking AI is going to depend on selling Wi-Fi. 

SpaceX consists of three segments: space, AI and connectivity, which is primarily driven by Starlink. Last year, the Starlink division was responsible for $11 billion of revenue, which amounted to more than 60% of the company’s total sales. It was the most valuable part of the business—and the only profitable one. And for years, it has been absolutely essential to the success of SpaceX. As it turns out, even companies that defy the laws of gravity are bound by the laws of economics. 

The mysterious finances of Musk’s company were detailed this week in SpaceX’s IPO filing, which is far more bonkers than financial paperwork has any right to be.

The highlights include the company describing itself as “the most ambitious, vertically integrated innovation engine on (and off) Earth,” claiming a total addressable market of $29 trillion, revealing that Musk’s pay package is tied to “the establishment of a permanent human colony on Mars with at least one million inhabitants” and declaring: “We do not want humans to have the same fate as dinosaurs.” 

But when it’s not discussing existential perils of the universe, the document also happens to explain the business model of SpaceX. 

It shows that the whole company is built on a powerful flywheel: SpaceX rockets launch Starlink satellites, and those Starlink satellites are the reason SpaceX can launch more rockets.  

That virtuous cycle was Musk’s vision from the earliest days of Starlink, as he told a room full of engineers that night more than a decade ago. 

Before it provided a lifeline in dead zones, before it restored communications after natural disasters, before it beamed Wi-Fi into the middle of nowhere and metal tubes at 35,000 feet, Starlink was the solution to SpaceX’s money problems. 

At the time, SpaceX was basically a trucking business that charged governments and private companies to haul stuff into orbit. But that wasn’t going to pay for civilization on another planet, so Musk went exploring for other spaces. There was nothing glamorous about selling internet access. The market was so large, though, that grabbing even a small percentage of it would produce revenue that exceeded NASA’s entire budget, Musk told biographer Walter Isaacson. 

But if it were easy to do, others would have done it. In fact, getting into the business of shooting satellites into low-Earth-orbit had always been a good way to wind up in bankruptcy. To him, the mission of Starlink was simple. 

“We want to be in the not-bankrupt category,” Musk said in 2020. “That’s our goal.” 

[Chart: Starlink total subscribers by year, 2023 to 2026, in millions. Source: the company. Note: 2026 data through March 31.]

To achieve this audacious goal, Musk’s top engineers had to make satellites faster and cheaper—so they applied “The Algorithm.” 

In the IPO paperwork, the company formally defines “The Algorithm” as a five-step process with the guiding principles of SpaceX: make less dumb, delete, optimize, accelerate, automate. 

When he found out that Starlink satellites were being released individually, for example, Musk wondered why they couldn’t be released at once. “I was too chicken to propose that,” said SpaceX rocket engineer Mark Juncosa, according to Isaacson’s 2023 book. “Elon made us try it.” And it worked. SpaceX says it has reduced the average manufacturing cost of a Starlink kit by 59% since 2022.

The rest of the business has been through its own dramatic transformation in that time. 

Starlink had about 2 million subscribers back in 2023. Now there are more than 10 million. 

In the early days, Musk dreamed of a network with 4,000 satellites, which was more than the total number of satellites in known existence. Now it has almost 10,000. 

It took roughly five years of development before Starlink launched its first satellites and many of them failed. Now they’re launching every three days and they always work. 

All of which added up to one of the many eye-popping disclosures in the company’s prospectus. SpaceX has plans to discover “trillion-dollar markets on the Moon, Mars and beyond,” it said. But it didn’t have to look that far to find the first one. 

“We founded Starlink,” the company bragged. 

Starlink has already carried SpaceX farther than even Musk predicted. Now that it’s landing on Wall Street, there’s only another 140 million miles to Mars.

A time exposure of SpaceX's Falcon 9 rocket as it launched the first Starlink satellites. Joe Marino-Bill Cantrell/UPI/Alamy

Copyright ©2026 Dow Jones & Company, Inc. All Rights Reserved.

Ben Cohen writes the Science of Success column for The Wall Street Journal. In his column, Ben reports across a wide variety of topics in business, tech and culture, from the world's most valuable companies to people you've never heard of. His work has won Feature Writing prizes from the New York Press Club and a Best in Business award from the Society for Advancing Business Editing and Writing. Ben is also a regular contributor to WSJ. Magazine.

Before founding his column in 2022, Ben was a sports reporter at the Journal for more than a decade. He specialized in the NBA, focusing on strategies, oddities, the 3-point revolution, LeBron James and Stephen Curry. He also wrote about college football and has covered almost every sport, including five Olympics.

Ben's first book, "The Hot Hand," was an investigation into the mystery, science, magic, fascinating psychology and real-world consequences of streaks. Andre Agassi called it "a feast for anyone interested in the secrets of excellence." Ben is now working on his next book, which is based on his Science of Success columns.

He joined the Journal in 2010 as an intern after graduating from Duke University and lives in New York with his family.


I Hiked a Mountain Wearing $2,000 Robotic Legs. It Was a Walk in the Park. - WSJ

LLM (google/gemini-3.1-flash-lite-20260507) summary:

  • Device Overview: the Hypershell X Ultra S is a $1,999 hip-mounted motorized exoskeleton intended for hikers and cyclists to simulate superhuman physical ability.
  • Mechanical Operation: the apparatus utilizes twin motors and carbon-fiber arms to deliver up to 1,000 watts of power directly to the user's legs.
  • Software Integration: internal AI identifies movement patterns to calibrate force, though it notably fails to recognize downhill terrain without manual intervention via a smartphone app.
  • Physical Sensation: the user experience is described as being akin to a puppet under the control of machinery which creates jerky movements during non-repetitive motion.
  • Safety Concerns: the hardware possesses enough force to snap back violently when improperly handled posing a tangible risk of injury to the wearer.
  • Dependency Issues: reliance on the device leads to a sense of heaviness and sluggishness once removed, as the user's natural muscles attempt to re-adapt to unassisted movement.
  • Practical Limitations: battery life is quickly depleted by high-intensity settings leaving the user to manually carry five pounds of inactive robotic weight.
  • Market Viability: the product remains a niche toy for wealthy hobbyists rather than a perfected tool, often leading to secondary muscle soreness from overexertion while pretending to have infinite endurance.

Hypershell’s new X Ultra S motorized exoskeleton is aimed at outdoor enthusiasts who want to cover more distance with less effort.
By Nicole Nguyen | Photography and Video by Poppy Lynch

May 20, 2026 9:00 pm ET


  • A columnist tested Hypershell’s X Ultra S, a $1,999 AI-powered bionic leg booster for hiking and cycling.
  • The hip-based exoskeleton uses AI software to interpret movements, providing mechanical force that aids uphill climbs and walking in sand.
  • The device, which can feel like being under the control of a puppeteer, requires manual downhill mode activation and poses risks if handled improperly.
This summary was generated with AI and reviewed by an editor. Read more about how we use artificial intelligence in our journalism.
AI is entering the physical realm in a big way. Case in point: I spent Mother’s Day e-hiking with bionic leg boosters.
I strapped my two-year-old in a hiking carrier, and went for a walk up a winding dirt trail. The robotic hip motors whirred and a mechanical force tugged at my quads.
As the incline got steeper, I opened the companion app on my phone and pressed “Boost.”
My stride quickened. The whirring got louder. As the AI marched me toward the summit, I enjoyed the view. Thirty seconds later, the surge of power was over, and the legs returned to the gentler Eco mode.
These motorized supports were once reserved for military, heavy industry and mobility rehabilitation. Now, they are light and affordable enough for regular folks—regular folks who want to feel superhuman, that is. These hip-based systems start at $900, and I tried the latest, Hypershell’s $1,999 X Ultra S.
Two weeks of testing didn’t transform me into Iron Man. But the rig did make me want to sprint everywhere—and I hate running.

A bionic puppet master

Like its competitors, Hypershell’s exoskeleton consists of a waistband and a pair of hinged thigh braces. Twin hip motors draw up to 1,000 watts to power the carbon-fiber arms that apply force to your legs. Theoretically, at its top supported speed, the X Ultra S could help you run an elite four-minute mile.
Long nature walks are more my speed, so I took the bionic legs on San Francisco’s hilliest trails. Passersby stared at the unsubtle contraption. If I looked like a dork, at least I could zoom quickly past the judgment.
The first time I wore them, the contraption felt like a puppeteer controlling my legs. The motors can jerk you around, especially if you start, stop or change direction suddenly. Once you get into a constant, repetitive motion, the push-and-pull sensation fades.
I used the mobile app to calibrate the power—25% on Eco mode was just right to start.
AI software the company calls Hyperintuition interprets your movements to deliver the right force. Going uphill, with the torque notched up, I really felt the exoskeleton at work. Climbing up stairs was like walking up an escalator. The e-assistance really shines on sand, where you normally feel your energy sapped away.
But as I crossed a rocky section with some loose boulders, I worried one wrong jerk could send me tumbling, so I dialed down the power. A Hypershell spokesman said that a snug, proper fit and a lower level of assistance can help on unstable terrain.
I was disappointed with my downhill experience. Descents can be exhausting but the AI isn’t smart enough to detect them. You have to dig into the app, and activate Downhill mode yourself.
A couple of power-walking hours later, I removed the legs. My body felt slow and heavy. I was like an astronaut returning to Earth, getting reacquainted with my own muscles.

The cyborg cyclist

On a bike ride, the puppet effect was even more dramatic. At one point I was barely moving my legs myself. On certain climbs, I topped out panting and exhausted, then realized my bike was in its hardest gear.
The Spandex-clad cyclists who tackle San Francisco’s iconic Hawk Hill have a saying: “The climb doesn’t get easier, you just get faster.”
That’s also true of riding with battery-boosted legs. I was out of breath because I was pedaling at my regular cadence, but each stroke had a lot more power.
[Photo captions: Hypershell’s companion app lets you set the assistance intensity of the exoskeleton, which can output a max of 1,000 watts of power. The exoskeleton’s AI-enabled software can detect movement and deliver the appropriate amount of power: more for running, less for walking.]
The device is mostly intuitive but comes with some dangers. After my ride, I unstrapped the unit thinking it was off, but an active arm snapped back with full force. I wasn’t hurt, but it was a stark reminder of the risks of robotics in our everyday lives. Hypershell’s app includes reminders for responsible use, including the proper way to disengage.
Minutes after I returned home, the battery died. I was relieved I wasn’t stranded on a mountain, hauling 5 pounds of dead robot weight. I think in my testing I overtaxed the battery by relying on the Boost too much. The company says you can generally walk 18 miles in the X Ultra S, twice that with the included extra battery pack.

Who needs magic legs?

I’m not sure exoskeletons are ready for prime time, though early-adopter gearhead types with a couple thousand dollars to burn will have fun with them. I do look forward to taking them backcountry skiing come winter. Those long uphill climbs (not to mention keeping up with my much fitter husband) can be grueling.

Many of us could explore motorized legs as they become lighter, cheaper and more discreet: especially people who are just getting active, or older folks who want support while hiking. And there’s a case for serious athletes, who could use the Fitness setting to actually add resistance during workouts.
One side effect: My calves were sore for days. The exoskeleton made me feel like I had infinite endurance so I kept going, and my other muscles paid the price. Fortunately, another company makes a bionic system for the lower leg. Maybe I’ll wear them together for my next review—and let the bots do all the work.



Project Glasswing: what Mythos showed us


LLM (google/gemini-3.1-flash-lite-20260507) summary:

  • Technological Claims: the model is marketed as a superior tool for linking disparate system vulnerabilities to develop sophisticated attack chains.
  • Automated Verification: the software attempts to validate its own findings by writing and executing code, though this process is fundamentally probabilistic and prone to inconsistency.
  • Inconsistent Guardrails: the lack of standardized safety protocols results in unpredictable behavior where identical security requests are handled differently based on contextual framing.
  • Signal Noise: reliance on language models increases the volume of speculative, low-quality findings which shift the burden of verification onto human workers.
  • Architectural Limitations: simple agents fail to perform thorough security analysis because their narrow window of awareness and single-threaded operations prevent comprehensive system coverage.
  • Bureaucratic Complexity: addressing systemic flaws requires the construction of an elaborate, multi-stage management harness to forcefully constrain and direct the underlying model.
  • Resource Intensity: the process prioritizes high-compute, parallelized agent workflows as an expensive remedy for the inherent unreliability of the AI components.
  • Operational Naivety: the promise of accelerated patch cycles ignores the reality of broken software deployments and the inevitable failure points created by rushing updates through automated pipelines.

For the last few months, we've been testing a range of security-focused LLMs on our own infrastructure. These LLMs help identify potential vulnerabilities in our own systems, so we can fix them – and they also show us what attackers are going to be able to do with the latest models.

None of these LLMs has captured more attention than Mythos Preview, from Anthropic. A few weeks ago, we were invited to use Mythos Preview as part of Project Glasswing. We soon pointed it at more than fifty of our own repositories – to see what it would find, and to see how it works.

This post shares what we observed, what the models did well and what they didn't, and how the architecture and process around them need to change so they can be used at scale.

What changed with Mythos Preview

Mythos Preview is a real step forward, and it's worth saying that plainly before getting into anything else. We've been running models against our code for a while now, and the jump from what was possible with previous general-purpose frontier models to what Mythos Preview does today is not just a refinement of what came before.

It's a different kind of tool doing a different kind of work, and that makes a clean apples-to-apples comparison to earlier models difficult. So rather than trying to benchmark Mythos Preview against general-purpose frontier models, it's more useful to describe what it can actually do. Two features stood out across the work we did with Mythos Preview:

  • Exploit chain construction - A real attack rarely uses one bug. It chains several small attack primitives together into a working exploit. For instance, it might turn a use-after-free bug into an arbitrary read and write primitive, hijack the control flow, and use return-oriented programming (ROP) chains to take full control over a system. Mythos Preview can take several of these primitives and reason about how to combine them into a working proof. The reasoning it shows along the way looks like the work of a senior researcher rather than the output of an automated scanner.

  • Proof generation - Finding a bug and proving it's exploitable are two different things, and Mythos Preview can do both. It writes code that would trigger the suspected bug, compiles that code in a scratch environment, and runs it. If the program does what the model expected, that's the proof. If it doesn't, the model reads the failure, adjusts its hypothesis, and tries again. The loop matters as much as the bugs it finds, because a suspected flaw without a working proof is speculation, and Mythos Preview closes that gap on its own.
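The proof-generation loop described above (write a trigger, run it, read the failure, adjust, retry) can be sketched roughly as follows. This is only an illustration of the loop's shape, not how Mythos Preview is implemented: `prove`, `Attempt`, `toy_run`, and `toy_revise` are hypothetical names, and the toy functions stand in for compiling and running real proof-of-concept code and for the model revising its hypothesis.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    hypothesis: str
    output: str
    confirmed: bool


def prove(
    hypothesis: str,
    run_poc: Callable[[str], str],
    is_expected: Callable[[str], bool],
    revise: Callable[[str, str], Optional[str]],
    max_attempts: int = 5,
) -> list[Attempt]:
    """Hypothesize -> execute -> adjust, until a PoC confirms or attempts run out."""
    history: list[Attempt] = []
    current: Optional[str] = hypothesis
    while current is not None and len(history) < max_attempts:
        output = run_poc(current)          # stand-in for compile + run in a scratch dir
        confirmed = is_expected(output)    # did the program do what was predicted?
        history.append(Attempt(current, output, confirmed))
        if confirmed:
            break
        current = revise(current, output)  # read the failure, adjust the hypothesis
    return history


# Toy stand-ins (assumptions for illustration only).
def toy_run(hypothesis: str) -> str:
    return "crash" if hypothesis == "offset=16" else "no crash"


def toy_revise(hypothesis: str, output: str) -> Optional[str]:
    # Returns None when there is nothing left to try.
    return {"offset=8": "offset=16"}.get(hypothesis)


history = prove("offset=8", toy_run, lambda out: out == "crash", toy_revise)
# history holds two attempts; only the second one is confirmed.
```

A finding whose history ends without a confirmed attempt stays a suspicion, which matches the post's point that a flaw without a working proof is speculation.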

Some of what we describe above is not entirely unique to Mythos Preview. When we ran other frontier models through the same harness, they found a fair number of the same underlying bugs, and in some cases they got further than we expected on the reasoning side too. Where they fell short was at the point of stitching the pieces together. A model would identify an interesting bug, write a thoughtful description of why it mattered, and then stop, leaving the actual chain unfinished and the question of exploitability open. What changed with Mythos Preview is that a model can now take those low-severity bugs (which would traditionally sit invisible in a backlog) and chain them into a single, more severe exploit. 

Model refusals in legitimate vulnerability research

The Mythos Preview model provided by Anthropic, as part of Project Glasswing, did not have the additional safeguards that are present in generally available models (like Opus 4.7 or GPT-5.5).

Despite this, the model organically pushes back on certain requests. Much like the cyber capabilities that made it useful for vulnerability hunting, its guardrails are emergent, and they sometimes cause it to push back on legitimate security research requests. But as we found, these organic refusals aren’t consistent: the same task, framed differently or presented in a different context, could produce completely different outcomes, as illustrated in the examples below.

Example of Mythos Preview pushing back on building a working proof of concept 

For example, the model initially refused to do vulnerability research on a project, then agreed to perform the same research on the same code after an unrelated change to the project’s environment. Nothing about the code being analyzed had changed. In another case, the model found and confirmed several serious memory bugs in a codebase, and then refused to write a demonstration exploit. The same request, framed differently, got a different answer, and even the same request can produce different outcomes across runs due to the probabilistic nature of the model. Semantically equivalent tasks can produce opposite outcomes depending on how and when they’re presented to the model.

This matters because while the model’s organic refusals/guardrails are real, they aren’t consistent enough to serve as a complete safety boundary on their own. That’s precisely why any capable cyber frontier model made generally available in the future must include additional safeguards on top of this baseline behavior - making it appropriate for broader use outside of a controlled research context like Project Glasswing.

The signal-to-noise problem

One of the hardest parts of triaging security vulnerabilities is deciding which bugs are real, which are exploitable, and which need fixing now. This was a hard problem even in the pre-AI world. AI vulnerability scanners and AI-generated code have made it worse, and at Cloudflare we've built multiple post-validation stages to deal with it.

Two factors dominate the noise rate:

  • Programming language - C and C++ give you direct memory control and, with it, bug classes - buffer overflows, out-of-bounds reads and writes - that memory-safe languages like Rust eliminate at compile time. We saw consistently more false positives from projects written in memory-unsafe languages.

  • Model bias - A good human researcher tells you what they found and how confident they are. Models don't. Ask a model to find bugs, and it will find them, whether the code has any or not. Findings come back hedged with "possibly," "potentially," "could in theory," and the hedged findings vastly outnumber the solid ones. That's a reasonable bias for an exploratory tool. It's a ruinous one for a triage queue, where every speculative finding spends human attention and tokens to dismiss, and that cost compounds across thousands of findings.

Mythos Preview represents a clear improvement here, particularly in its ability to chain primitives - combining multiple vulnerabilities into a working proof of concept rather than reporting them in isolation. A finding that arrives with a PoC is a finding you can act on, and it means far less time spent asking "is this even real?"

Our harnesses are deliberately tuned to over-report, so we see more (and miss less), which comes with a lot more noise. But at triage time, Mythos Preview's output has noticeably higher quality: fewer hedged findings, clearer reproduction steps, and less work to reach a fix-or-dismiss decision.

Why pointing a generic coding agent at a repo doesn't work

When we first started AI-assisted vulnerability research last year, our instinct was the obvious one: point a generic coding agent at an arbitrary repository and ask it to discover vulnerabilities. This approach works in the sense that the model will produce findings, but it fails to produce meaningful coverage of a real codebase or to identify findings of value. There are two main reasons for this:

  • Context - Coding agents are tuned for one focused stream of work: building a feature, fixing a bug, writing a refactor. They ingest a lot of source code, hold a single hypothesis at a time, and iterate against it. That's exactly the wrong shape for vulnerability research, which is narrow and parallel by nature. A human researcher picks one specific thing to look at and investigates it thoroughly. That one thing might be a single complex feature, transitions across security boundaries, or a specific vulnerability class like command injections, where attacker input ends up being run as a shell command. Then they do it again, for a different feature, security boundary, or vulnerability class, several thousand times across the codebase. A single agent session (even with subagents) against a hundred-thousand-line repository can cover maybe a tenth of a percent of the surface in a useful way before the model's context window fills up and compaction kicks in - potentially discarding earlier findings that would have mattered.

  • Throughput - A single-stream agent does one thing at a time, but real codebases need many hypotheses against many components at once, with the ability to fan out further when something interesting turns up. You can drive a single agent harder, but at some point you stop being limited by the model and start being limited by the shape of the interaction itself. Using the model directly in a coding agent turns out to be fine for manual investigation when a researcher already has a lead and wants a second pair of eyes. However, it's the wrong tool for achieving high coverage. Once we accepted that, we stopped trying to make Mythos Preview do the wrong job and started building the harness around it instead.
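To make the throughput point concrete, here is a minimal sketch of fanning out many narrow tasks with a concurrency cap, assuming each task is one attack class paired with one scope. The names `run_hunts` and `toy_hunt` are hypothetical, and the toy hunter stands in for a real agent call; the limit of 50 echoes the "around fifty at once" figure mentioned later in the post.

```python
import asyncio
from itertools import product


async def run_hunts(attack_classes, scopes, hunt, limit=50):
    """Run one narrow hunt per (attack class, scope) pair, at most `limit` at a time."""
    gate = asyncio.Semaphore(limit)

    async def bounded(attack_class, scope):
        async with gate:  # blocks while `limit` hunts are already in flight
            return await hunt(attack_class, scope)

    tasks = [bounded(a, s) for a, s in product(attack_classes, scopes)]
    # gather returns results in the same order the tasks were listed.
    return await asyncio.gather(*tasks)


# Toy stand-in for an agent session (assumption for illustration only).
async def toy_hunt(attack_class, scope):
    await asyncio.sleep(0)
    return {"attack_class": attack_class, "scope": scope, "findings": []}


results = asyncio.run(
    run_hunts(["command injection", "path traversal"], ["parser/", "auth/"], toy_hunt, limit=2)
)
```

The design choice is that breadth comes from the number of tasks, not from asking any single session to hold more context.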

What a harness actually fixes

Four lessons came out of running the work at scale, and each one pointed to the need for a harness that manages the overall execution:

  • Narrow scope produces better findings - Telling the model "Find vulnerabilities in this repository" makes it wander. Telling it "Look for command injection in this specific function, with this trust boundary above it, here's the architecture document and here's prior coverage of this area" makes it do something much closer to what a researcher would actually do.

  • Adversarial review reduces noise - Adding a second agent between the initial finding and the queue - one with a different prompt, a different model, and no ability to generate its own findings - catches a lot of the noise that the first agent would miss if it just checked its own work. It turns out that putting two agents in deliberate disagreement is way more effective than just telling one agent to be careful.

  • Splitting the chain across agents produces better reasoning - Asking "Is this code buggy?" and "Can an attacker actually reach this bug from outside the system?" are two different questions, and the model is better at each one when you ask them separately, because each question is narrower than the combined version.

  • Parallel narrow tasks beat one exhaustive agent - Coverage improves when many agents work on tightly scoped questions and we deduplicate the results afterward, rather than asking one agent to be exhaustive.
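The adversarial-review lesson above can be sketched as a gate between the hunter and the triage queue. This is an assumption-laden illustration: `Finding`, `Verdict`, `review`, and `toy_validator` are hypothetical names, and the toy validator only checks for hedge words, whereas the validator described in the post re-reads the code with a different prompt. The point of the sketch is the interface: the validator can return a verdict but has no way to emit findings of its own.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable


class Verdict(Enum):
    CONFIRMED = "confirmed"
    REJECTED = "rejected"


@dataclass(frozen=True)
class Finding:
    location: str
    attack_class: str
    claim: str


def review(
    findings: Iterable[Finding],
    validator: Callable[[Finding], Verdict],
) -> list[Finding]:
    """Keep only the findings the independent validator failed to disprove."""
    return [f for f in findings if validator(f) is Verdict.CONFIRMED]


# Toy stand-in for the second agent (assumption for illustration only).
HEDGES = ("possibly", "potentially", "could in theory")


def toy_validator(finding: Finding) -> Verdict:
    hedged = any(word in finding.claim.lower() for word in HEDGES)
    return Verdict.REJECTED if hedged else Verdict.CONFIRMED


queue = review(
    [
        Finding("parse.c:120", "buffer overflow", "memcpy length is attacker-controlled; PoC crashes"),
        Finding("util.c:44", "integer overflow", "Could in theory wrap on 32-bit targets"),
    ],
    toy_validator,
)
# Only the first finding reaches the queue.
```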

Each of those observations is about model behavior, and put together they describe something that isn't a chat interface anymore. It's a harness that helps you achieve the final outcomes. The first steps to building a harness are simple, as you can ask the model to help, which is what we did. We used Mythos Preview to build on, tailor, and improve our original harnesses to suit its strengths. An example of what a harness looks like in practice is described below.

Our vulnerability discovery harness

Here's what our vulnerability discovery harness looks like, stage by stage. It was used to scan live code across our runtime, edge data path, protocol stack, control plane, and the open-source projects we depend on.

Stage: Recon
What it does: An agent reads the repository from the top down, fans out to subagents responsible for each subsystem, and produces an architecture document covering build commands, trust boundaries, entry points, and likely attack surface. It also generates the initial queue of tasks for the next stage.
Why it matters: Gives every downstream agent shared context. Cuts the wander problem.

Stage: Hunt
What it does: Each task is one attack class paired with a scope hint. Hunters (the agents that actually look for bugs) run concurrently, typically around fifty at once, each fanning out to a handful of exploration subagents. Each hunter has access to tools that compile and run proof-of-concept code in a per-task scratch directory.
Why it matters: This is where most of the work happens. Many narrow tasks in parallel, not one exhaustive agent.

Stage: Validate
What it does: An independent agent re-reads the code and tries to disprove the original finding. It uses a different prompt and has no ability to emit new findings of its own.
Why it matters: Catches a meaningful fraction of the noise the hunter wouldn't catch when reviewing its own work.

Stage: Gapfill
What it does: Hunters flag areas they touched but didn't cover thoroughly. Those areas get re-queued for another pass.
Why it matters: Counteracts the model's tendency to drift toward attack classes it has already had success with.

Stage: Dedupe
What it does: Findings that share the same root cause collapse into a single record.
Why it matters: Variant analysis is a feature, not a way to inflate the queue with duplicates.

Stage: Trace
What it does: For each confirmed finding in a shared library, a tracer agent fans out (one instance per consumer repository), uses a cross-repo symbol index, and decides whether attacker-controlled input actually reaches the bug from outside the system.
Why it matters: Turns "there is a flaw" into "there is a reachable vulnerability." This is the stage that matters most.

Stage: Feedback
What it does: Reachable traces become new hunt tasks in the consumer repositories where the bug is actually exposed.
Why it matters: Closes the loop. The pipeline gets better as it runs.

Stage: Report
What it does: An agent writes a structured report against a predefined schema, fixes any validation errors against that schema itself, and submits the report to an ingest API.
Why it matters: Output is queryable data, not free-form prose.
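Two of the stages above, Dedupe and Trace, map onto simple data-structure operations, sketched here under stated assumptions. The function names, the finding fields (`root_cause`, `location`), and the toy call graph are all hypothetical. The reachability check is a plain breadth-first search over a call graph, which is only a coarse approximation of deciding whether attacker-controlled input reaches a bug; the post does not describe how the tracer agent makes that decision internally.

```python
from collections import deque


def dedupe(findings):
    """Collapse findings that share a root cause into one record listing every variant."""
    merged = {}
    for f in findings:
        record = merged.setdefault(
            f["root_cause"], {"root_cause": f["root_cause"], "variants": []}
        )
        record["variants"].append(f["location"])
    return list(merged.values())


def reachable_path(call_graph, entry_points, target):
    """Breadth-first search from external entry points to the flawed symbol.

    Returns the first (shortest) call path found, or None if the symbol is unreachable.
    """
    queue = deque([entry] for entry in entry_points)
    seen = set(entry_points)
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for callee in call_graph.get(path[-1], ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None


# Toy data (assumptions for illustration only).
records = dedupe([
    {"root_cause": "unchecked length in copy_field", "location": "lib/field.c:88"},
    {"root_cause": "unchecked length in copy_field", "location": "lib/field.c:141"},
    {"root_cause": "missing null check in debug_dump", "location": "tools/dump.c:12"},
])
graph = {
    "http_handler": ["parse_header"],
    "parse_header": ["copy_field"],
    "cli_tool": ["debug_dump"],
}
exposed = reachable_path(graph, ["http_handler"], "copy_field")
internal = reachable_path(graph, ["http_handler"], "debug_dump")
# exposed == ["http_handler", "parse_header", "copy_field"]; internal is None.
```

In this toy setup the first root cause would be reported as a reachable vulnerability with two variants, while the second remains a flaw with no external path to it.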

What this means for security teams

The loudest reaction to Mythos Preview from other security leaders has been about speed - scan faster, patch faster, compress the response cycle. More than one team we have spoken with is now operating under a two-hour SLA from CVE release to patch in production. The instinct is understandable: when the attacker timeline shortens, the defender timeline has to shorten with it. Faster is not going to be enough, and we think a lot of teams are about to spend a lot of time, effort, and money learning that the hard way.

Patching faster does not change the shape of the pipeline that produces the patch. If regression testing takes a day, you cannot get to a two-hour SLA without skipping it, and the bugs you ship when you skip regression testing tend to be worse than the bugs you were trying to patch. We learned a version of this when we tried letting the model write its own patches and watched a few go out that fixed the original bug while quietly breaking something else the code depended on.

The harder question is what the architecture around the vulnerability should look like. The principle is to make exploitation harder for an attacker even when a bug exists, so that the gap between when a vulnerability is disclosed and when it is patched matters less. That means defenses that sit in front of the application and block the bug from being reached. It means designing the application so that a flaw in one part of the code cannot give an attacker access to other parts. It means being able to roll out a fix to every place the code is running at the same moment, rather than waiting on individual teams to deploy it. 

We also recognize this topic cuts both ways. The same capabilities that helped us find bugs in our own code will, in the wrong hands, accelerate the attack side against every application on the Internet. Cloudflare sits in front of millions of those applications, and the architectural principles described above are exactly the ones our products are built to apply on behalf of customers. We will share more on what that means for customers in the weeks ahead.

If your team is doing similar work and would like to compare notes, reach out to us at security-ai-research@cloudflare.com.

Our research with Mythos Preview was conducted in a controlled environment against our own code; every vulnerability surfaced through this work was triaged, validated, and remediated where action was needed under Cloudflare's formal vulnerability management process.

This work was a team effort. Thanks to Albert Pedersen, Craig Strubhart, Dan Jones, Irtefa Fairuz, Martin Schwarzl, and Rohit Chenna Reddy for their contributions to the research, engineering, and analysis behind this blog post.
