Strategic Initiatives

AddyOsmani.com - Don't Outsource the Learning


LLM (google/gemini-3.1-flash-lite-20260507) summary:

  • Automation Risks: reliance on artificial intelligence for coding can lead to an erosion of personal technical comprehension.
  • Skill Degradation: passive task completion using generative tools often results in lower retention and reduced critical thinking capabilities.
  • Research Evidence: multiple scientific studies demonstrate that heavy reliance on external assistance diminishes human cognitive engagement during complex work.
  • Workflow Defaults: current software tools prioritize rapid output over pedagogical engagement which discourages deep understanding.
  • Strategic Delegation: tasks involving routine boilerplate code may be delegated, whereas architectural knowledge remains essential for long-term effectiveness.
  • Labor Market Impacts: software developers unable to function without AI support face significant risks regarding professional relevance and employability.
  • Active Learning: integrating intentional methodologies such as Socratic questioning or self-initiated problem testing can mitigate cognitive decline.
  • Dual Objectives: maintaining professional growth requires balancing immediate productivity targets with independent individual learning milestones.

Right now, it’s too easy to let AI write the code while you skip the learning. The bug gets fixed. Your mental model doesn’t move. We are silently trading future capability for present-day speed, and the tools won’t force us to do otherwise. That part has to come from you.


There’s a default loop most of us have settled into. You paste in a spec or error message. The model hands you a fix. The symptom vanishes. You ship. Somewhere in that loop, the messy struggle between problem and solution stops happening at all.

I’ve written before about cognitive surrender, the moment an AI reviewer’s verdict quietly replaces your own. This is the solo version of that same loop. It’s just you and the model. The model is faster, so you stop trying to compete on comprehension. Across thousands of these small interactions, what you can actually build without an AI looking over your shoulder gets a little weaker every week. None of these moments feel like a problem on the day they happen.

I’m not anti-AI. I use these tools daily and have shipped more with them in the last year than in the five years before it. But the default way we use them is optimized for one thing: closing tasks. That is a completely different goal from staying sharp enough to steer them over a career that spans decades.


The studies are converging on the same point

Several pieces of research over the last year have landed in roughly the same place.

Anthropic ran a randomized trial in early 2026 where engineers learned a new Python library, half with AI assistance and half without. Both groups finished the tasks at the same speed. But the AI group bombed the follow-up comprehension quiz: 50% versus 67% for the manual group, with the gap widening on debugging. The interesting cut was inside the AI group itself. Engineers who used AI to ask conceptual questions scored above 65%. Engineers who copy-pasted the generated code scored under 40%. The tool didn’t determine the outcome. The posture did.

MIT’s Your Brain on ChatGPT study compared essay writing across LLM, search-engine, and brain-only groups. EEG measurements showed brain connectivity scaling down with every layer of external support. The LLM group showed the weakest coupling. After writing the essay, 83% of LLM users couldn’t quote a single line of what they had just produced. The researchers called this cognitive debt: saving mental effort today, paying for it in critical thinking tomorrow.

A CHI 2026 study added a related finding. When people had LLM access at the start of a task, the LLM framed the entire problem. Even when the human did the rest of the work themselves, that initial anchoring produced measurably worse decisions. The order of operations mattered more than the total amount of AI used.

Different methodologies, same conclusion. Using AI without an active intent to learn quietly degrades the skill you’re being paid for.


The tools default to shipping, not teaching

If you fire up a coding agent and stick to the defaults, everything is tuned for one metric: getting the task done. The model writes the code. You accept it. The loop repeats. At no point does the tool pause and ask “what do you think the problem is?” or “try writing the first five lines yourself.”

That isn’t a conspiracy. It’s UX gravity. Product teams get rewarded for merged changes and shorter cycle times, not for making you a sharper engineer. We all want fewer keystrokes, so the tools have sanded the friction away. The trouble is that friction was where the learning lived.

A few companies have started pushing back. Anthropic shipped Learning Mode for Claude, which uses Socratic questioning and stops to ask you to write code before continuing. OpenAI and Google have shipped similar features. Almost nobody uses them for real production work. We’ve quietly filed them under “for students” and that’s a mistake. The same feature that helps a sophomore learn React works for a senior engineer learning Rust. You just have to be willing to feel like a beginner again.


“If the AI can do it, why do I need to understand it?”

A fair question. For some work, the answer is: you don’t. If it’s boilerplate, glue code, or a throwaway CI script you’ll never look at again, delegate it. The opportunity cost of memorizing YAML syntax is too high.

For real software, pure delegation breaks down in a few specific places.

When something breaks. AI-generated code crashes the same way human code does. “The agent wrote it” doesn’t help you debug problems. Somebody on the team has to understand the architecture.

When it’s confidently wrong. LLMs hallucinate. The only defense against a plausible-looking incorrect answer is enough expertise to spot it.

When the foundation changes. Code is temporary; systems are permanent. When frameworks update or a security review flags a structural issue, you can’t re-prompt your way out. You need engineers who understand the system well enough to migrate it.

When you leave the median. AI is brilliant at problems that have been solved a million times on GitHub. The further you stray from the median, the worse it gets. The hard, undocumented problems, the ones that justify a senior engineer’s salary, still require deep understanding.

When the market adjusts. That 20% drop in junior developer employment since 2022 isn’t a fluke. Engineers who can only ship with AI, and not without it, are entering a labor pool that is already re-pricing what expertise is worth.

If you use AI to skip learning, you’re trading future relevance for a slightly easier Tuesday.


The fix is in how you prompt, not whether you do

The good news is that the same tools that produce cognitive debt can produce sharper engineers. The difference is in what you ask of them.

Form a hypothesis before you ask. Before requesting a fix, write down two or three sentences on what you think the problem is. Use the model’s answer to test your theory, not to replace it.

Ask for the explanation before the code. In unfamiliar territory, your first prompt should be something like “explain how this works, what the alternatives are, and what the tradeoffs are.” Ask for the code only after you’ve grasped the concepts.

Turn on Learning Mode when you’re out of your depth. Claude has it. ChatGPT has Study Mode. Gemini has Guided Learning. Yes, it feels slower. That’s the point.

Treat AI output like a PR from a junior engineer. Read it. Critique it. Push back on it. Would you merge it just because the tests passed? If not, don’t merge it here either.

Re-derive things by hand once in a while. Take a piece of code the model wrote for you and try to recreate it from scratch. It’s the calibration check that tells you how much you’ve quietly lost.

Ask the model to teach you what it just did. After it writes a clever function, ask what concepts it used and what you’d need to read to understand the design choice. One extra prompt changes what you take away from the session.

None of these are dramatic. They’re small posture shifts inside the same tools you’re already using.
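
One way to make the first two habits hard to skip is to bake them into whatever thin wrapper you already use to talk to a model. Here is a minimal sketch, assuming a hypothetical helper called `build_prompt` (it is not part of any vendor's SDK, and the ten-word minimum is an arbitrary choice of mine, not a number from any study). It refuses to produce a prompt until you have written your own hypothesis, and it asks for the explanation before the code.

```python
def build_prompt(problem: str, hypothesis: str, min_words: int = 10) -> str:
    """Build a hypothesis-first prompt.

    Hypothetical helper for illustration only. The min_words threshold
    is an arbitrary assumption, not a researched value.
    """
    # Force the human to commit to a theory before the model answers.
    if len(hypothesis.split()) < min_words:
        raise ValueError("Write down what you think the problem is first.")
    return (
        f"Problem:\n{problem.strip()}\n\n"
        f"My hypothesis:\n{hypothesis.strip()}\n\n"
        "First, tell me whether my hypothesis holds and why. "
        "Then explain the alternatives and their tradeoffs. "
        "Do not write any code until I ask for it."
    )


if __name__ == "__main__":
    print(build_prompt(
        "Checkout requests time out intermittently under load.",
        "I think the connection pool is exhausted because handlers "
        "hold a connection while waiting on the payment API.",
    ))
```

The resulting string can be pasted into any chat tool. The point is the forced pause, not the exact wording.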


Two metrics, not one

I’ve started ending coding sessions with a simple question: did I learn anything today, or did I just close tickets?

Sometimes the honest answer is “I just closed issues” and that’s fine. If it becomes the answer for months in a row, cognitive debt is accumulating in the background.

Ship and learn are two separate metrics. Your manager and your customers will only ever ask about the first one. The second is on you.

I’d rather ship 80% of what I could have and learn 100% of what I needed to, than the reverse. Over years, those two strategies produce very different engineers.

You don’t have to choose between using AI and learning. You do have to choose a workflow that does both, because the defaults won’t choose it for you. The tools are ready whenever you are. The next boring task you were about to delegate is a good place to start.


Further reading: Anthropic’s skill-formation study, MIT’s Your Brain on ChatGPT (arXiv 2506.08872), the CHI 2026 paper on LLM use under time constraints, Stack Overflow’s AI vs Gen Z report, and my earlier posts on comprehension debt and cognitive surrender.


SpaceX’s Ambitions Are Intergalactic. Its Business Is Selling You Internet. - WSJ


LLM (google/gemini-3.1-flash-lite-20260507) summary:

  • Grandiose Mission Statement: the company claims its primary objective is to make life multiplanetary and preserve human consciousness beyond earth.
  • Financial Reality Check: despite utopian rhetoric, the venture relies on generating massive capital from mundane internet services to fund its expensive experiments.
  • Revenue Foundation: starlink connectivity serves as the primary cash cow, contributing over sixty percent of total company sales and providing the only profitable segment.
  • Corporate Flywheel Model: internal operations function by utilizing rockets to deploy satellites which in turn generate the revenue necessary to launch more rockets.
  • Regulatory Filing Bravado: the ipo documentation features eccentric claims including a twenty-nine trillion dollar addressable market and existential warnings about dinosaur extinction.
  • Optimized Manufacturing Process: the company employs a five-step philosophy called the algorithm to aggressively cut costs and automate the production of satellite hardware.
  • Rapid Scaling Efforts: the satellite network has expanded from an initial projection of four thousand units to nearly ten thousand currently in orbit.
  • Market Expansion Goals: management aims to leverage its terrestrial internet success to dominate theoretical trillion-dollar markets on the moon and mars.

SpaceX launched the first Starlink satellites in 2019. Joe Marino-Bill Cantrell/UPI/Alamy
Ben Cohen

May 21, 2026 9:00 pm ET

The mission of SpaceX is to “make life multiplanetary, to understand the true nature of the universe and to extend the light of consciousness to the stars.” 

Starlink was born of more earthly concerns. 

“What’s needed to create a city on Mars? Well, one thing’s for sure: a lot of money,” Elon Musk said in 2015 at the launch event for his latest moonshot. “So we need things that will generate a lot of money.” 

Musk laid out his radical vision of the future long before SpaceX had its sights on the biggest IPO in history. Even then, he knew that rockets alone wouldn’t get his company to Mars. His only hope of establishing society on a freezing rock far, far away was blasting thousands of satellites into orbit.

A decade later, Starlink is now generating enough money to bankroll a company that burns through it at warp speed.

[Chart: SpaceX annual revenue by division (AI, Starlink, Space), 2023 to 2025, in billions of dollars. Source: the company. Note: Starlink is part of the Connectivity division.]

SpaceX may be known for building majestic rockets, firing those giant beasts of marvelous engineering into the skies and catching them with chopsticks. What’s less known is that the company’s extraordinary ambitions are fueled by an incredibly ordinary product: SpaceX has become an internet-service provider that also explores space.  

The prospectus that it filed as SpaceX prepares for a massive initial public offering makes it clear that colonizing Mars and cracking AI is going to depend on selling Wi-Fi. 

SpaceX consists of three segments: space, AI and connectivity, which is primarily driven by Starlink. Last year, the Starlink division was responsible for $11 billion of revenue, which amounted to more than 60% of the company’s total sales. It was the most valuable part of the business—and the only profitable one. And for years, it has been absolutely essential to the success of SpaceX. As it turns out, even companies that defy the laws of gravity are bound by the laws of economics. 

The mysterious finances of Musk’s company were detailed this week in SpaceX’s IPO filing, which is far more bonkers than financial paperwork has any right to be.

The highlights include the company describing itself as “the most ambitious, vertically integrated innovation engine on (and off) Earth,” claiming a total addressable market of $29 trillion, revealing that Musk’s pay package is tied to “the establishment of a permanent human colony on Mars with at least one million inhabitants” and declaring: “We do not want humans to have the same fate as dinosaurs.” 

But when it’s not discussing existential perils of the universe, the document also happens to explain the business model of SpaceX. 

It shows that the whole company is built on a powerful flywheel: SpaceX rockets launch Starlink satellites, and those Starlink satellites are the reason SpaceX can launch more rockets.  

That virtuous cycle was Musk’s vision from the earliest days of Starlink, as he told a room full of engineers that night more than a decade ago. 

Before it provided a lifeline in dead zones, before it restored communications after natural disasters, before it beamed Wi-Fi into the middle of nowhere and metal tubes at 35,000 feet, Starlink was the solution to SpaceX’s money problems. 

At the time, SpaceX was basically a trucking business that charged governments and private companies to haul stuff into orbit. But that wasn’t going to pay for civilization on another planet, so Musk went exploring for other spaces. There was nothing glamorous about selling internet access. The market was so large, though, that grabbing even a small percentage of it would produce revenue that exceeded NASA’s entire budget, Musk told biographer Walter Isaacson. 

But if it were easy to do, others would have done it. In fact, getting into the business of shooting satellites into low-Earth-orbit had always been a good way to wind up in bankruptcy. To him, the mission of Starlink was simple. 

“We want to be in the not-bankrupt category,” Musk said in 2020. “That’s our goal.” 

[Chart: Starlink total subscribers by year, 2023 to 2026, in millions. Source: the company. Note: 2026 data through March 31.]

To achieve this audacious goal, Musk’s top engineers had to make satellites faster and cheaper—so they applied “The Algorithm.” 

In the IPO paperwork, the company formally defines “The Algorithm” as a five-step process with the guiding principles of SpaceX: make less dumb, delete, optimize, accelerate, automate. 

When he found out that Starlink satellites were being released individually, for example, Musk wondered why they couldn’t be released at once. “I was too chicken to propose that,” said SpaceX rocket engineer Mark Juncosa, according to Isaacson’s 2023 book. “Elon made us try it.” And it worked. SpaceX says it has reduced the average manufacturing cost of a Starlink kit by 59% since 2022.

The rest of the business has been through its own dramatic transformation in that time. 

Starlink had about 2 million subscribers back in 2023. Now there are more than 10 million. 

In the early days, Musk dreamed of a network with 4,000 satellites, which was more than the total number of satellites in known existence. Now it has almost 10,000. 

It took roughly five years of development before Starlink launched its first satellites and many of them failed. Now they’re launching every three days and they always work. 

All of which added up to one of the many eye-popping disclosures in the company’s prospectus. SpaceX has plans to discover “trillion-dollar markets on the Moon, Mars and beyond,” it said. But it didn’t have to look that far to find the first one. 

“We founded Starlink,” the company bragged. 

Starlink has already carried SpaceX farther than even Musk predicted. Now that it’s landing on Wall Street, there’s only another 140 million miles to Mars.

A time exposure of SpaceX's Falcon 9 rocket as it launched the first Starlink satellites. Joe Marino-Bill Cantrell/UPI/Alamy

Copyright ©2026 Dow Jones & Company, Inc. All Rights Reserved.

Ben Cohen writes the Science of Success column for The Wall Street Journal. In his column, Ben reports across a wide variety of topics in business, tech and culture, from the world's most valuable companies to people you've never heard of. His work has won Feature Writing prizes from the New York Press Club and a Best in Business award from the Society for Advancing Business Editing and Writing. Ben is also a regular contributor to WSJ. Magazine.

Before founding his column in 2022, Ben was a sports reporter at the Journal for more than a decade. He specialized in the NBA, focusing on strategies, oddities, the 3-point revolution, LeBron James and Stephen Curry. He also wrote about college football and has covered almost every sport, including five Olympics.

Ben's first book, "The Hot Hand," was an investigation into the mystery, science, magic, fascinating psychology and real-world consequences of streaks. Andre Agassi called it "a feast for anyone interested in the secrets of excellence." Ben is now working on his next book, which is based on his Science of Success columns.

He joined the Journal in 2010 as an intern after graduating from Duke University and lives in New York with his family.


I Hiked a Mountain Wearing $2,000 Robotic Legs. It Was a Walk in the Park. - WSJ


LLM (google/gemini-3.1-flash-lite-20260507) summary:

  • Device Overview: the Hypershell X Ultra S is a $1,999 hip-mounted motorized exoskeleton intended for hikers and cyclists to simulate superhuman physical ability.
  • Mechanical Operation: twin hip motors draw up to 1,000 watts to power carbon-fiber arms that apply force to the user's legs.
  • Software Integration: internal AI identifies movement patterns to calibrate force, though it notably fails to recognize downhill terrain without manual intervention via a smartphone app.
  • Physical Sensation: the user experience is described as being akin to a puppet under the control of machinery which creates jerky movements during non-repetitive motion.
  • Safety Concerns: the hardware possesses enough force to snap back violently when improperly handled posing a tangible risk of injury to the wearer.
  • Dependency Issues: reliance on the device leads to a sense of heaviness and sluggishness once removed as the user's natural muscles attempt to re-adapt to unassisted movement.
  • Practical Limitations: battery life is quickly depleted by high-intensity settings leaving the user to manually carry five pounds of inactive robotic weight.
  • Market Viability: the product remains a niche toy for wealthy hobbyists rather than a perfected tool, often leading to secondary muscle soreness from overexertion while pretending to have infinite endurance.

Hypershell’s new X Ultra S motorized exoskeleton is aimed at outdoor enthusiasts who want to cover more distance with less effort.
By Nicole Nguyen | Photography and Video by Poppy Lynch

May 20, 2026 9:00 pm ET


  • A columnist tested Hypershell’s X Ultra S, a $1,999 AI-powered bionic leg booster for hiking and cycling.
  • The hip-based exoskeleton uses AI software to interpret movements, providing mechanical force that aids uphill climbs and walking in sand.
  • The device, which can feel like being under the control of a puppeteer, requires manual downhill mode activation and poses risks if handled improperly.
This summary was generated with AI and reviewed by an editor. Read more about how we use artificial intelligence in our journalism.
AI is entering the physical realm in a big way. Case in point: I spent Mother’s Day e-hiking with bionic leg boosters.
I strapped my two-year-old in a hiking carrier, and went for a walk up a winding dirt trail. The robotic hip motors whirred and a mechanical force tugged at my quads.
As the incline got steeper, I opened the companion app on my phone and pressed “Boost.” My stride quickened. The whirring got louder. As the AI marched me toward the summit, I enjoyed the view. Thirty seconds later, the surge of power was over, and the legs returned to the gentler Eco mode.
These motorized supports were once reserved for military, heavy industry and mobility rehabilitation. Now, they are light and affordable enough for regular folks—regular folks who want to feel superhuman, that is. These hip-based systems start at $900, and I tried the latest, Hypershell’s $1,999 X Ultra S.
Two weeks of testing didn’t transform me into Iron Man. But the rig did make me want to sprint everywhere—and I hate running.

A bionic puppet master

Like its competitors, Hypershell’s exoskeleton consists of a waist band and a pair of hinged thigh braces. Twin hip motors draw up to 1,000 watts to power the carbon-fiber arms that apply force to your leg. Theoretically, at its top supported speed, the X Ultra S could help you run an elite four-minute mile.
[Video: See how the Hypershell X Ultra S works. Photo: Poppy Lynch for WSJ]
Long nature walks are more my speed, so I took the bionic legs on San Francisco’s hilliest trails. Passersby stared at the unsubtle contraption. If I looked like a dork, at least I could zoom quickly past the judgment.
The first time I wore them, the contraption felt like a puppeteer controlling my legs. The motors can jerk you around, especially if you start, stop or change direction suddenly. Once you get into a constant, repetitive motion, the push-and-pull sensation fades.
I used the mobile app to calibrate the power—25% on Eco mode was just right to start.
AI software the company calls Hyperintuition interprets your movements to deliver the right force. Going uphill, with the torque notched up, I really felt the exoskeleton at work. Climbing up stairs was like walking up an escalator. The e-assistance really shines on sand, where you normally feel your energy sapped away.
But as I crossed a rocky section with some loose boulders, I worried one wrong jerk could send me tumbling, so I dialed down the power. A Hypershell spokesman said that a snug, proper fit and a lower level of assistance can help on unstable terrain.
I was disappointed with my downhill experience. Descents can be exhausting but the AI isn’t smart enough to detect them. You have to dig into the app, and activate Downhill mode yourself.
A couple of power-walking hours later, I removed the legs. My body felt slow and heavy. I was like an astronaut returning to Earth, getting reacquainted with my own muscles.

The cyborg cyclist

On a bike ride, the puppet effect was even more dramatic. At one point I was barely moving my legs myself. On certain climbs, I topped out panting and exhausted, then realized my bike was in its hardest gear.
The Spandex-clad cyclists who tackle San Francisco’s iconic Hawk Hill have a saying: “The climb doesn’t get easier, you just get faster.”
That’s also true of riding with battery-boosted legs. I was out of breath because I was pedaling at my regular cadence, but each stroke had a lot more power.
The exoskeleton’s AI-enabled software can detect movement and deliver the appropriate amount of power—more for running, less for walking.
Hypershell’s companion app, left, lets you set the assistance intensity of the exoskeleton, which can output a max of 1,000 watts of power.
The device is mostly intuitive but comes with some dangers. After my ride, I unstrapped the unit thinking it was off, but an active arm snapped back with full force. I wasn’t hurt, but it was a stark reminder of the risks of robotics in our everyday lives. Hypershell’s app includes reminders for responsible use, including the proper way to disengage.
Minutes after I returned home, the battery died. I was relieved I wasn’t stranded on a mountain, hauling 5 pounds of dead robot weight. I think in my testing I overtaxed the battery by relying on the Boost too much. The company says you can generally walk 18 miles in the X Ultra S, twice that with the included extra battery pack.

Who needs magic legs?

I’m not sure exoskeletons are ready for prime time, though early-adopter gearhead types with a couple thousand dollars to burn will have fun with them. I do look forward to taking them backcountry skiing come winter. Those long uphill climbs (not to mention keeping up with my much fitter husband) can be grueling.

Many of us could explore motorized legs as they become lighter, cheaper and more discreet: especially people who are just getting active, or older folks who want support while hiking. And there’s a case for serious athletes, who could use the Fitness setting to actually add resistance during workouts.
One side effect: My calves were sore for days. The exoskeleton made me feel like I had infinite endurance so I kept going, and my other muscles paid the price. Fortunately, another company makes a bionic system for the lower leg. Maybe I’ll wear them together for my next review—and let the bots do all the work.

Copyright ©2026 Dow Jones & Company, Inc. All Rights Reserved.


Project Glasswing: what Mythos showed us


LLM (google/gemini-3.1-flash-lite-20260507) summary:

  • Technological Claims: the model is marketed as a superior tool for linking disparate system vulnerabilities to develop sophisticated attack chains.
  • Automated Verification: the software attempts to validate its own findings by writing and executing code, though this process is fundamentally probabilistic and prone to inconsistency.
  • Inconsistent Guardrails: the lack of standardized safety protocols results in unpredictable behavior where identical security requests are handled differently based on contextual framing.
  • Signal Noise: reliance on language models increases the volume of speculative, low-quality findings which shift the burden of verification onto human workers.
  • Architectural Limitations: simple agents fail to perform thorough security analysis because their narrow window of awareness and single-threaded operations prevent comprehensive system coverage.
  • Bureaucratic Complexity: addressing systemic flaws requires the construction of an elaborate, multi-stage management harness to forcefully constrain and direct the underlying model.
  • Resource Intensivity: the process prioritizes high-compute, parallelized agent workflows as an expensive remedy for the inherent unreliability of the AI components.
  • Operational Naivety: the promise of accelerated patch cycles ignores the reality of broken software deployments and the inevitable failure points created by rushing updates through automated pipelines.

For the last few months, we've been testing a range of security-focused LLMs on our own infrastructure. These LLMs help identify potential vulnerabilities in our own systems, so we can fix them – and they also show us what attackers are going to be able to do with the latest models.

None of these LLMs has captured more attention than Mythos Preview, from Anthropic. A few weeks ago, we were invited to use Mythos Preview as part of Project Glasswing. We soon pointed it at more than fifty of our own repositories – to see what it would find, and to see how it works.

This post shares what we observed, what the models did well and what they didn't, and how the architecture and process around them need to change so they can be used at scale.

What changed with Mythos Preview

Mythos Preview is a real step forward, and it's worth saying that plainly before getting into anything else. We've been running models against our code for a while now, and the jump from what was possible with previous general-purpose frontier models to what Mythos Preview does today is not just a refinement of what came before.

It's a different kind of tool doing a different kind of work, and that makes a clean apples-to-apples comparison to earlier models difficult. So rather than trying to benchmark Mythos Preview against general-purpose frontier models, it's more useful to describe what it can actually do. Two features stood out across the work we did with Mythos Preview:

  • Exploit chain construction - A real attack rarely uses one bug. It chains several small attack primitives together into a working exploit. For instance, it might turn a use-after-free bug into an arbitrary read and write primitive, hijack the control flow, and use return-oriented programming (ROP) chains to take full control over a system. Mythos Preview can take several of these primitives and reason about how to combine them into a working proof. The reasoning it shows along the way looks like the work of a senior researcher rather than the output of an automated scanner.

  • Proof generation - Finding a bug and proving it's exploitable are two different things, and Mythos Preview can do both. It writes code that would trigger the suspected bug, compiles that code in a scratch environment, and runs it. If the program does what the model expected, that's the proof. If it doesn't, the model reads the failure, adjusts its hypothesis, and tries again. The loop matters as much as the bugs it finds, because a suspected flaw without a working proof is speculation, and Mythos Preview closes that gap on its own.
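
The proof-generation loop described above (hypothesize, run, read the failure, adjust, retry) can be sketched in a few lines. This is an illustrative toy, not how Mythos Preview or the harness around it is implemented. `toy_target`, `run_candidate`, `prove`, and `bump_length` are all hypothetical names, the "bug" is a deliberately planted unchecked length byte in a pure-Python parser, and the refinement step is a fixed rule standing in for where a model would reason about the failure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    candidate: bytes
    outcome: str  # "confirmed", "no fault", or a description of the failure


def toy_target(data: bytes) -> int:
    """Hypothetical buggy parser: trusts the length byte without checking it."""
    length = data[0]
    payload = data[1:]
    return payload[length - 1]  # IndexError when length > len(payload)


def run_candidate(target: Callable[[bytes], int], candidate: bytes) -> str:
    """Run one candidate input and report what happened."""
    try:
        target(candidate)
    except IndexError:
        return "confirmed"  # the suspected out-of-bounds read triggered
    except Exception as exc:  # any other failure is feedback for the next try
        return f"unexpected {type(exc).__name__}"
    return "no fault"


def prove(target, first_guess: bytes, refine, max_tries: int = 5):
    """Try a candidate, read the outcome, adjust, and try again."""
    history: list[Attempt] = []
    candidate = first_guess
    for _ in range(max_tries):
        outcome = run_candidate(target, candidate)
        history.append(Attempt(candidate, outcome))
        if outcome == "confirmed":
            return candidate, history  # a working proof, not speculation
        candidate = refine(candidate, outcome)
    return None, history  # still unproven after max_tries


def bump_length(candidate: bytes, outcome: str) -> bytes:
    """Toy refinement: claim one more payload byte than last time."""
    return bytes([candidate[0] + 1]) + candidate[1:]


if __name__ == "__main__":
    proof, history = prove(toy_target, b"\x01AB", bump_length)
    for attempt in history:
        print(attempt.candidate, attempt.outcome)
    print("proof:", proof)
```

Returning the full `history` alongside the result is a deliberate choice in this sketch: a run that ends without a proof still leaves a record of what was tried, which is the difference between "unproven" and "never checked".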

Some of what we describe above is not entirely unique to Mythos Preview. When we ran other frontier models through the same harness, they found a fair number of the same underlying bugs, and in some cases they got further than we expected on the reasoning side too. Where they fell short was at the point of stitching the pieces together. A model would identify an interesting bug, write a thoughtful description of why it mattered, and then stop, leaving the actual chain unfinished and the question of exploitability open. What changed with Mythos Preview is that a model can now take those low-severity bugs (which would traditionally sit invisible in a backlog) and chain them into a single, more severe exploit. 

Model refusals in legitimate vulnerability research

The Mythos Preview model provided by Anthropic, as part of Project Glasswing, did not have the additional safeguards that are present in generally available models (like Opus 4.7 or GPT-5.5).

Despite this, the model organically pushes back on certain requests. Much like the cyber capabilities that made it useful for vulnerability hunting, these guardrails are emergent, and they sometimes cause it to push back on legitimate security research requests. But as we found, these organic refusals aren’t consistent: the same task, framed differently or presented in a different context, could produce completely different outcomes, as illustrated in the examples below.

Example of Mythos Preview pushing back on building a working proof of concept 

For example, the model initially refused to do vulnerability research on a project, then agreed to perform the same research on the same code after an unrelated change to the project’s environment. Nothing about the code being analyzed had changed. In another case, the model found and confirmed several serious memory bugs in a codebase, and then refused to write a demonstration exploit. The same request, framed differently, got a different answer, and even the same request can produce different outcomes across runs due to the probabilistic nature of the model. Semantically equivalent tasks can produce opposite outcomes depending on how and when they’re presented to the model.

This matters because while the model’s organic refusals/guardrails are real, they aren’t consistent enough to serve as a complete safety boundary on their own. That’s precisely why any capable cyber frontier model made generally available in the future must include additional safeguards on top of this baseline behavior - making it appropriate for broader use outside of a controlled research context like Project Glasswing.

The signal-to-noise problem

One of the hardest parts of triaging security vulnerabilities is deciding which bugs are real, which are exploitable, and which need fixing now. This was a hard problem even in the pre-AI world. AI vulnerability scanners and AI-generated code have made it worse, and at Cloudflare we've built multiple post-validation stages to deal with it.

Two factors dominate the noise rate:

  • Programming language - C and C++ give you direct memory control and, with it, bug classes - buffer overflows, out-of-bounds reads and writes - that memory-safe languages like Rust eliminate at compile time. We saw consistently more false positives from projects written in memory-unsafe languages.

  • Model bias - A good human researcher tells you what they found and how confident they are. Models don't. Ask a model to find bugs, and it will find them, whether the code has any or not. Findings come back hedged with "possibly," "potentially," "could in theory," and the hedged findings vastly outnumber the solid ones. That's a reasonable bias for an exploratory tool. It's a ruinous one for a triage queue, where every speculative finding spends human attention and tokens to dismiss, and that cost compounds across thousands of findings.

Mythos Preview represents a clear improvement here, particularly in its ability to chain primitives - combining multiple vulnerabilities into a working proof of concept rather than reporting them in isolation. A finding that arrives with a PoC is a finding you can act on, and it means far less time spent asking "is this even real?"

Our harnesses are deliberately tuned to over-report, so we see more (and miss less), which comes with a lot more noise. But at triage time, Mythos Preview's output has noticeably higher quality: fewer hedged findings, clearer reproduction steps, and less work to reach a fix-or-dismiss decision.
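To make the triage cost concrete, here is a rough sketch of ordering a queue so that findings with a proof of concept come first and heavily hedged findings come last. The `Finding` type, the `HEDGE_TERMS` list (taken from the phrases quoted above), and the ordering rule are all assumptions for illustration; the post does not describe how Cloudflare's post-validation stages score findings.

```python
import re
from dataclasses import dataclass

# Hedge phrases quoted in the text above; a real list would be longer.
HEDGE_TERMS = ("possibly", "potentially", "could in theory")


@dataclass
class Finding:
    title: str
    description: str
    has_poc: bool  # True if a working proof of concept is attached


def hedge_count(text: str) -> int:
    """Count occurrences of hedge phrases, case-insensitively."""
    lowered = text.lower()
    return sum(len(re.findall(re.escape(term), lowered)) for term in HEDGE_TERMS)


def triage_order(findings):
    """Findings with a PoC first, then fewest hedges first."""
    return sorted(findings, key=lambda f: (not f.has_poc, hedge_count(f.description)))
```

Sorting rather than filtering keeps the speculative findings in the queue, which matches a harness tuned to over-report: nothing is dropped, but human attention goes to the actionable items first.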

Why pointing a generic coding agent at a repo doesn't work

When we first started AI-assisted vulnerability research last year, our instinct was the obvious one: point a generic coding agent at an arbitrary repository and ask it to discover vulnerabilities. This approach works in the sense that the model will produce findings, but it fails to produce meaningful coverage of a real codebase or to surface findings of value. There are two main reasons for this:

  • Context - Coding agents are tuned for one focused stream of work: building a feature, fixing a bug, writing a refactor. They ingest a lot of source code, hold a single hypothesis at a time, and iterate against it. That's exactly the wrong shape for vulnerability research, which is narrow and parallel by nature. A human researcher picks one specific thing to look at and investigates it thoroughly. That one thing might be a single complex feature, transitions across security boundaries, or a specific vulnerability class like command injections, where attacker input ends up being run as a shell command. Then they do it again, for a different feature, security boundary, or vulnerability class, several thousand times across the codebase. A single agent session (even with subagents) against a hundred-thousand-line repository can cover maybe a tenth of a percent of the surface in a useful way before the model's context window fills up and compaction kicks in - potentially discarding earlier findings that would have mattered.

  • Throughput - A single-stream agent does one thing at a time, but real codebases need many hypotheses against many components at once, with the ability to fan out further when something interesting turns up. You can drive a single agent harder, but at some point you stop being limited by the model and start being limited by the shape of the interaction itself. Using the model directly in a coding agent turns out to be fine for manual investigation when a researcher already has a lead and wants a second pair of eyes. However, it's the wrong tool for achieving high coverage. Once we accepted that, we stopped trying to make Mythos Preview do the wrong job and started building the harness around it instead.
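The throughput point, many hypotheses at once with the ability to fan out further, maps onto a bounded worker pool fed by a queue that workers can add to. The sketch below assumes a hypothetical `investigate` coroutine standing in for one narrow agent run; the task names and the default concurrency limit are arbitrary.

```python
import asyncio


async def investigate(task: str) -> list[str]:
    """Hypothetical stand-in for one narrow agent run.

    Returns follow-up tasks when something interesting turns up.
    """
    await asyncio.sleep(0)  # placeholder for the real model call
    if task == "parser:overflow":
        return ["parser:overflow:header", "parser:overflow:body"]
    return []


async def fan_out(initial_tasks, max_concurrent=8):
    """Run tasks concurrently, at most `max_concurrent` at a time."""
    queue: asyncio.Queue = asyncio.Queue()
    for task in initial_tasks:
        queue.put_nowait(task)
    completed = []

    async def worker():
        while True:
            task = await queue.get()
            try:
                # Follow-ups are queued before the parent is marked done,
                # so queue.join() cannot return while work is still pending.
                for follow_up in await investigate(task):
                    queue.put_nowait(follow_up)
                completed.append(task)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(max_concurrent)]
    await queue.join()  # waits for every queued task, including follow-ups
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return completed
```

Calling `asyncio.run(fan_out(["parser:overflow", "auth:injection"]))` processes four tasks in total, because the first task enqueues two follow-ups.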

What a harness actually fixes

Four lessons came out of running the work at scale, and each one pointed to the need for a harness that manages the overall execution:

  • Narrow scope produces better findings - Telling the model "Find vulnerabilities in this repository" makes it wander. Telling it "Look for command injection in this specific function, with this trust boundary above it, here's the architecture document and here's prior coverage of this area" makes it do something much closer to what a researcher would actually do.

  • Adversarial review reduces noise - Adding a second agent between the initial finding and the queue - one with a different prompt, a different model, and no ability to generate its own findings - catches a lot of the noise that the first agent would miss if it just checked its own work. It turns out that putting two agents in deliberate disagreement is way more effective than just telling one agent to be careful.

  • Splitting the chain across agents produces better reasoning - Asking "Is this code buggy?" and "Can an attacker actually reach this bug from outside the system?" are two different questions, and the model is better at each one when you ask them separately, because each question is narrower than the combined version.

  • Parallel narrow tasks beat one exhaustive agent - Coverage improves when many agents work on tightly scoped questions and we deduplicate the results afterward, rather than asking one agent to be exhaustive.
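Three of these lessons (narrow scope, splitting the chain, adversarial review) can be expressed as data plus two small functions. The names below (`HuntTask`, `bug_prompt`, `reachability_prompt`, `accept`) and the prompt wording are hypothetical; this is a sketch of the shape, not the prompts Cloudflare used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HuntTask:
    attack_class: str    # e.g. "command injection"
    function: str        # the specific function to examine
    trust_boundary: str  # where untrusted input enters

    def bug_prompt(self) -> str:
        """First question: is this code buggy?"""
        return (
            f"Look for {self.attack_class} in {self.function}. "
            f"Trust boundary: {self.trust_boundary}. "
            "Report only flaws in this function."
        )

    def reachability_prompt(self, finding: str) -> str:
        """Second, separate question: can an attacker reach the bug?"""
        return (
            f"Finding: {finding}. Starting from {self.trust_boundary}, "
            f"can attacker-controlled input reach {self.function}?"
        )


def accept(finding: str, validator) -> bool:
    """Adversarial gate: keep a finding only if the validator fails to disprove it.

    `validator` returns True when it has disproved the finding. It returns
    a verdict only, so it has no way to add findings of its own.
    """
    return not validator(finding)
```

Keeping the two prompts as separate methods makes it hard to ask the combined question by accident, and giving the validator a boolean-only return type enforces the "no new findings" constraint structurally rather than by instruction.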

Each of those observations is about model behavior, and put together they describe something that isn't a chat interface anymore. It's a harness that helps you achieve the final outcomes. The first steps to building a harness are simple, as you can ask the model to help, which is what we did. We used Mythos Preview to build on, tailor, and improve our original harnesses to suit its strengths. An example of what a harness looks like in practice is described below.

Our vulnerability discovery harness

Here's what our vulnerability discovery harness looks like, stage by stage. It was used to scan live code across our runtime, edge data path, protocol stack, control plane, and the open-source projects we depend on.

Recon

  • What it does: An agent reads the repository from the top down, fans out to subagents responsible for each subsystem, and produces an architecture document covering build commands, trust boundaries, entry points, and likely attack surface. It also generates the initial queue of tasks for the next stage.

  • Why it matters: Gives every downstream agent shared context. Cuts the wander problem.

Hunt

  • What it does: Each task is one attack class paired with a scope hint. Hunters (the agents that actually look for bugs) run concurrently, typically around fifty at once, each fanning out to a handful of exploration subagents. Each hunter has access to tools that compile and run proof-of-concept code in a per-task scratch directory.

  • Why it matters: This is where most of the work happens. Many narrow tasks in parallel, not one exhaustive agent.

Validate

  • What it does: An independent agent re-reads the code and tries to disprove the original finding. It uses a different prompt and has no ability to emit new findings of its own.

  • Why it matters: Catches a meaningful fraction of the noise the hunter wouldn't catch when reviewing its own work.

Gapfill

  • What it does: Hunters flag areas they touched but didn't cover thoroughly. Those areas get re-queued for another pass.

  • Why it matters: Counteracts the model's tendency to drift toward attack classes it has already had success with.

Dedupe

  • What it does: Findings that share the same root cause collapse into a single record.

  • Why it matters: Variant analysis is a feature, not a way to inflate the queue with duplicates.

Trace

  • What it does: For each confirmed finding in a shared library, a tracer agent fans out (one instance per consumer repository), uses a cross-repo symbol index, and decides whether attacker-controlled input actually reaches the bug from outside the system.

  • Why it matters: Turns "there is a flaw" into "there is a reachable vulnerability." This is the stage that matters most.

Feedback

  • What it does: Reachable traces become new hunt tasks in the consumer repositories where the bug is actually exposed.

  • Why it matters: Closes the loop. The pipeline gets better as it runs.

Report

  • What it does: An agent writes a structured report against a predefined schema, fixes any validation errors against that schema itself, and submits the report to an ingest API.

  • Why it matters: Output is queryable data, not free-form prose.
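Two of these stages, Dedupe and Trace, reduce to familiar operations: grouping by a key and searching a graph. The sketch below is a simplified illustration with hypothetical field names (`root_cause`, `location`) and a plain call graph in place of the cross-repo symbol index. A call path from an entry point is a necessary condition for reachability, not a sufficient one, since it says nothing about whether attacker-controlled data survives along that path.

```python
from collections import deque


def dedupe(findings):
    """Collapse findings that share a root cause into one record each.

    `findings` is a list of dicts with hypothetical keys "root_cause" and
    "location"; variants are kept as extra locations on the merged record.
    """
    merged = {}
    for f in findings:
        record = merged.setdefault(
            f["root_cause"], {"root_cause": f["root_cause"], "locations": []}
        )
        record["locations"].append(f["location"])
    return list(merged.values())


def reachable(call_graph, entry_points, target):
    """Breadth-first search: can any external entry point reach `target`?

    `call_graph` maps a function name to the functions it calls. Returns the
    call path as a list, or None if the target is unreachable.
    """
    queue = deque([entry] for entry in entry_points)
    seen = set(entry_points)
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for callee in call_graph.get(path[-1], ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None
```

Returning the path rather than a boolean gives the report stage something concrete to record, and gives a human reviewer a starting point for confirming the trace.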

What this means for security teams

The loudest reaction to Mythos Preview from other security leaders has been about speed - scan faster, patch faster, compress the response cycle. More than one team we have spoken with is now operating under a two-hour SLA from CVE release to patch in production. The instinct is understandable: when the attacker timeline shortens, the defender timeline has to shorten with it. Faster is not going to be enough, and we think a lot of teams are about to spend a lot of time, effort, and money learning that the hard way.

Patching faster does not change the shape of the pipeline that produces the patch. If regression testing takes a day, you cannot get to a two-hour SLA without skipping it, and the bugs you ship when you skip regression testing tend to be worse than the bugs you were trying to patch. We learned a version of this when we tried letting the model write its own patches and watched a few go out that fixed the original bug while quietly breaking something else the code depended on.

The harder question is what the architecture around the vulnerability should look like. The principle is to make exploitation harder for an attacker even when a bug exists, so that the gap between when a vulnerability is disclosed and when it is patched matters less. That means defenses that sit in front of the application and block the bug from being reached. It means designing the application so that a flaw in one part of the code cannot give an attacker access to other parts. It means being able to roll out a fix to every place the code is running at the same moment, rather than waiting on individual teams to deploy it. 

We also recognize this topic cuts both ways. The same capabilities that helped us find bugs in our own code will, in the wrong hands, accelerate the attack side against every application on the Internet. Cloudflare sits in front of millions of those applications, and the architectural principles described above are exactly the ones our products are built to apply on behalf of customers. We will share more on what that means for customers in the weeks ahead.

If your team is doing similar work and would like to compare notes, reach out to us at security-ai-research@cloudflare.com.

Our research with Mythos Preview was conducted in a controlled environment against our own code; every vulnerability surfaced through this work was triaged, validated, and remediated where action was needed under Cloudflare's formal vulnerability management process.

This work was a team effort. Thanks to Albert Pedersen, Craig Strubhart, Dan Jones, Irtefa Fairuz, Martin Schwarzl, and Rohit Chenna Reddy for their contributions to the research, engineering, and analysis behind this blog post.


Here Comes (Forward Deployed) Everybody - by Scott Werner

1 Share

LLM (google/gemini-3.1-flash-lite-20260507) summary:

  • Business Model Obsolescence: major software vendors are offloading their product configuration tasks directly onto paying customers under the guise of headless infrastructure.
  • Externalized Labor Costs: corporate entities are effectively forcing employees to absorb the technical burden of assembly while maintaining the same subscription fees.
  • Administrative Role Inflation: typical operational roles are being forcibly rebranded into technical integration positions to compensate for missing vendor features.
  • Customer Self Service Myth: historical shifts like grocery self checkout serve as the blueprint for replacing paid professional service roles with unpaid consumer labor.
  • Atomization Of Coordination: current software trends encourage isolated internal workflows that prioritize redundant individual builds over shared institutional knowledge.
  • Pit Crew Implementation: business functions are being coerced into hiring internal technical specialists solely to maintain and glue together fragmented third party tools.
  • Generic Model Limitations: machine learning capabilities require expensive company specific calibration because off the shelf software is inherently incapable of handling unique operational data.
  • Delusional Scaling Arguments: corporate expansion via increased hiring ratios is framed as a benefit of automation rather than a symptom of ballooning overhead and operational complexity.


Note: I just enabled paid subscriptions for $8/month. Most of these essays will still be free, but I’m working on adding premium features to Artifact Land and launching a hosted version of Conjure among other upcoming products that will come with the subscription. I’m planning on increasing the price to $20/month once these features are launched. So upgrade your subscription now to lock in the intro price.


Ok… picture this… you’re standing at a self-checkout at a grocery store.

The screen is yelling at you about an unexpected item in the bagging area. You look down, is the unexpected item the banana? Is it your reusable totes? The machine doesn’t seem to want to give you any hints either way.

Behind you, a child is negotiating, in the loudest possible terms, for one of the pouches in their parent’s cart. A barcode is failing to scan for the eleventh time. And there’s one employee overseeing six of these machines like a shepherd whose sheep have all started doing their own taxes.

How did we get here?

Automated Salesforce Machine

In April 2026, Salesforce announced Headless 360.

The pitch, from Marc Benioff: No browser required. The API is the UI.

You can basically translate this to:

we’re no longer going to ship you software. we’re going to ship you the raw materials of software. you can figure out the rest.

If you heard this and shrugged, I don’t blame you. It’s an API. APIs are old. What’s the big deal?

The big deal is that Salesforce is the largest enterprise software vendor on earth, and they just told their entire customer base that the part of the product they use most is no longer Salesforce’s job.

It is the customer’s job.

I don’t think Salesforce will be the last vendor to do this. I see it as a direction-of-travel announcement. Every major enterprise vendor is going to do some version of this in the next eighteen months. They’re going to call it different things or dress it up in different words. But the shape will be the same: the vendor ships the substrate, and somebody at your company assembles the substrate into something that does work.

That somebody is probably going to be you.

Unbundling Implementation

Now I know that “the vendor used to do this for you” isn’t the whole story.

Implementation labor was always layered across an ecosystem with a thin slice of vendor-paid solutions engineers at the top, a much bigger slice of customer-paid integrators and agencies in the middle, and underneath all of it, a job category. The Salesforce admin. The Design Ops or Marketing Ops Manager. People whose entire role inside your company was to configure another company’s products for you.

The customer was always expected to pay for most of the cake.

Headless 360 just significantly changes the scope of what the people the customer was already paying are now expected to do.

The Salesforce admin role gets re-scoped. What used to be “click through the configuration screens that Salesforce designed for you” now has no screens, and the admin is wiring together workflows that didn’t exist as a product feature an hour ago using agents, MCP, custom integrations, things that don’t have a Trailhead course yet. And that’s just the people who already had the role.

But to me what this hints at is that every other function in your company is about to need its own version of that role. Marketing needs one. Finance needs one. Legal, ops, support, recruiting, even engineering. Each function uses different software and lives in a different corner of the business, but each one now needs somebody whose job is to translate generic AI capability into something that does work here, specifically.

It was easy to name this person’s role when they only existed inside one product. But what do you name the version of them about to exist in every department of every company at once?

I don’t know, but I think it means there’s about to be a lot of those people.

Enter Colonel Saunders

In 1917, in Memphis, a man named Clarence Saunders opened a store called Piggly Wiggly. (We used to live in much more whimsical times…)

Clarence had the wild idea to let you, the customer, walk around and pick your own groceries off the shelves.

Before Piggly Wiggly, you had to give a list to a clerk who fetched the things for you. That was the clerk’s entire job. They had everything memorized. They knew where the flour was, intimately, like family.

Saunders looked at that beautiful, dignified, fairly-paid clerk and said: “what if the customer just did that part, for free?”

And we said: “ok!”

We’ve been saying ok for over a hundred years.

  • 1917 — customer picks items off the shelf (clerk loses one job)

  • 1970s — barcodes price and inventory the items (clerk loses another)

  • 2000s — customer scans the items themselves (clerk mostly stops existing)

Each wave needed a capability unlock. Open-shelf store layouts. UPC codes. Cheap touchscreens that could yell about bagging areas without needing a human supervisor for every single machine.

And each wave was sold as convenience.

Yes, I understand that the clerk job mostly disappeared, but the point is that the labor didn't. The consumer now has to do it. For enterprise software it's a bit different: you're not choosing to enter the Salesforce store; that decision is made for you.

Here comes everybody (again)

Fast forward to 2008. A guy named Clay Shirky wrote a book called Here Comes Everybody.

The book’s argument was essentially that institutions exist because coordinating people is expensive. You need bosses, processes, headquarters, payroll, and a building with the company name on it because otherwise nothing gets done. The firm exists to absorb coordination costs.

Shirky’s bet was that the internet collapsed those costs to near zero, which caused institutional functions to start leaking out into ad-hoc groups. Wikipedia over Britannica. Flash mobs happened. Coordination got cheap enough to organize without organizations.

Eighteen years later, almost exactly, I think we are watching the same trick get pulled with a different cost curve.

That was the coordination story. We are now living through the building story.

Building complex software used to require a software company. You needed engineers, and a build process, and a UI design phase, and someone whose entire job was figuring out what to do with the JIRA tickets. Building was institutionally expensive in the same way coordinating used to be.

Agentic coding tools, MCP, headless platforms, and so on are already starting to do to building what the internet did to coordinating. Building is cheap now and people everywhere are waking up to it. A finance lead can spin up a reconciliation agent on a Tuesday afternoon. A recruiter can wire up a candidate-research workflow over coffee and a chocolate croissant.

Coordination got cheap enough to organize without organizations.
Implementation got cheap enough to implement without implementers.

Shirky’s everybody came together. Ours comes apart.

His version produced Wikipedia where a million people work together to build one thing. The 2026 version produces a million people each building their own separate thing in their own separate corner of their own separate company. A million reconciliation agents. A million candidate-research workflows. None of them shareable. None of them composable. The disintermediation is the same; the sociology is the opposite.

The old everybody convened. The new everybody atomizes. Coordination was a tax we paid because software was scarce, and we don’t have to pay it anymore. This is what software finally being abundant looks like.

Pit Crew

So what do we call this person? The one in marketing or finance or legal who’s now expected to translate generic AI capability into something that does work in their corner of the business?

I’ve been using Pit Crew over in Near Zero, but I’m sure we’ll call it something else. Though I’m not convinced we’ll use Stripe’s Forward Deployed AI Accelerator, Marketing either.

Your marketer has the taste. They know your brand voice, what’s been tried, when a subject line is going to land and when it won’t. The marketer is the driver. The car they’re now driving is AI. It is powerful, fast, finicky, capable of going off the track in genuinely surprising ways if it isn’t tuned correctly. The Pit Crew tunes the car.

You can’t expect every marketer to know how to configure an MCP server or stitch six APIs together with an agent. Similarly the Pit Crew doesn’t need to write a brand voice guide. Neither of them wins the race alone. The marketer brings what to build and why. The Pit Crew brings how to build it and how to keep it running at speed.

Every domain expert in your company is about to need their Pit Crew counterpart. Or be one. Or both.

There are two reasons every function is going to need this person, and they push in the same direction.

The first is what we’ve been talking about all post. Headless platforms externalize implementation labor onto your team. You’re forward-deployed for the vendor, just billed to your own employer.

The second is bigger and more permanent than any one vendor. Models are generic. The model doesn’t know your customers, your data, your weird Q3 reporting requirement, the fact that one specific salesperson refuses to use the new CRM no matter how many times you ask. AI capability only becomes useful at the point of contact with a specific workflow, dataset, or person. It doesn’t make sense to have a central “AI team” any more than it would to have a central “Excel team.” Every function gets its own.

I personally like “Pit Crew.” But I’m sure the industry is going to come up with something else (maybe better? I don’t know…). But the role is real before its vocabulary is, and I’d rather pick an imperfect name than wait around for a good one.

Empowerment and Extraction

I do truly believe this is empowerment. Pit Crew is a real career path with real leverage, and the people who get good at it early are going to eat extremely well. People can genuinely do things they couldn’t do a year ago, and I’m blown away by the things I’ve been seeing.

Which is different from what you see in most headlines these days. The current consensus is that all of this means the need for fewer jobs. One Pit Crew member, they say, can do the work that used to take twenty marketers. So you keep the Pit Crew, you lay off the twenty, and you write yourself a thank-you note in the form of an EBITDA improvement.

I think this is wrong about the direction of the change.

The marketing team doesn’t shrink. It grows. So does the Pit Crew supporting it. Both numbers go up. Once a marketer paired with a Pit Crew is dramatically more productive, that pair is dramatically more valuable to the business. Valuable functions don’t shrink. They get more budget. They hire. The output expands, and demand expands with it, because there turn out to be enormous amounts of marketing work that nobody could previously imagine doing because nobody could previously afford to do it.

So you don’t go from 20 marketers to 1 marketer plus 1 Pit Crew. You go from 20 marketers to 25 marketers plus 5 Pit Crew. Then 30 plus 10. The Pit Crew ratio rises. The marketing team rises with it. The whole org chart gets taller. The labor multiplies. Every other time in history that software engineering became cheaper, demand skyrocketed. Why would this time be any different?

Call one bet “Substitution”: the company sees Pit Crew as a way to do the same work cheaper. Call the other “Multiplication”: the company sees Pit Crew as a way to do much more work, period.

Both are happening in different companies right now. Only one of them is right about the future. The companies betting on substitution are going to wonder, in about eighteen months, where their competitive advantage went. The answer will be that it went to the company that hired more people and more Pit Crew for them, not fewer.


Frictionless Security and Supersonic Flights: What Travel Might Look Like in 20 Years - WSJ

1 Share

LLM (google/gemini-3.1-flash-lite-20260507) summary:

  • Artificial Intelligence Dominance: personal ai agents assume complete control over mundane travel logistics to eliminate consumer decision making.
  • Fragmented Infrastructure: traditional airports are subdivided into neighborhood terminals to force integration into high-density urban real estate.
  • Biometric Surveillance: privacy dissolves as security systems track gait and heart rate while sniffing passengers for total frictionless processing.
  • Bureaucratic Overtourism: governments implement permit schemes and quotas to ration access to popular destinations for elite middle-class travelers.
  • Connected Infrastructure: vehicles and roads constantly share data to automate traffic control and remove human autonomy from transit.
  • Supersonic Commercialization: extreme speed flight returns as a luxury commodity to facilitate rapid movement between global hub cities for the wealthy.
  • Extraterrestrial Hospitality: space stations are marketed as exclusive hotels for those with sufficient capital to escape terrestrial limitations.
  • Technocratic Control: professional futurists project total systemic management to maximize efficiency for high-end consumption at the expense of individual agency.


Illustration of a complex technical drawing depicting an eye with the earth as the iris, surrounded by interconnected images related to travel, technology, and data visualization. Christian Gralingen for WSJ

If you’re a child of the 1970s, you probably looked to the whimsical optimism of the cartoon show “The Jetsons” for an idea of what the future of travel might look like. We would hop into flying “aero sedans” that folded into briefcases and take holidays on an asteroid.

Obviously, those things haven’t materialized, but some big changes in travel are expected by 2046.

We spoke with industry thinkers and researchers about what travel might look like 20 years in the future. Here is what they told us.


Your AI agent handles everything

The era of search-and-click travel, where consumers spend hours on booking sites comparing flight prices and room options, will be over. Scott Fleming, president of the travel practice at Aon, a global professional-services firm, describes a future where your personal AI agent handles the entire choreography of a trip, from that first search to the final taxi home.

“My agent will know the places I like, it will have insight into my finances, my budget, my risk tolerances, all my preferences from the kind of room I like to my pillow type,” Fleming says. That personal agent will interface with travel suppliers’ AI agents—not a human to be heard or seen—and book trips for people from their front door and back, according to their known likes and dislikes.

If a health risk emerges on a route or a flight is disrupted, AI agents will negotiate a solution in real time. The system will monitor conditions continuously, rerouting, rebooking and adjusting everything so that the traveler never has to make a call or chase a refund. “It will take a lot of that stress out of the process,” Fleming says.


The distributed airport

The modern airport, let’s face it, is a time suck. Ty Osbaugh, global practice leader for aviation at the architectural firm Gensler, believes that’s going to change.

He envisions a solution whereby the airport is deconstructed and scattered across its nearest city. In 20 years, New Yorkers won’t have to go to JFK airport two hours ahead of their flight. Instead, they will walk or take a driverless taxi to a neighborhood terminal, drop their bags and clear security biometrically, simply by walking in. No passport queues, no conveyor belts. Then they’ll board a small, quiet electric air taxi that transports them and five other passengers from a building rooftop to the airport.

Since passengers have already been processed and completed security screening in town, airports will consist of lean, airside-only gate areas: runways, tarmac and jet bridges that you simply walk onto. If you have an important meeting you can’t reschedule, no worries: your AI assistant will have reserved one of a handful of phone-booth-size private lounges adjacent to your gate, located where currently there are rows and rows of seats. Your oat cappuccino will be waiting on the conference table. Your wearable device will alert you when it’s time to walk onto the plane.

The system as he sees it will work like a subway network: Rather than all passengers converging on the same congested highway corridor to JFK, they can choose the neighborhood entry point nearest to them.

“The idea is to break the airport into different functions—security processing and boarding—and putting each where people want them,” Osbaugh says. “Now all your time wasted at the terminal is completely cleared.”

The key to the successful execution of this distributed airport is penetration into the city itself. It will require terminals to be integrated into the vertical fabric of urban buildings, Osbaugh says. “Imagine if the terminal was part of a skyscraper that had apartments on the lower floors and the convenience that would provide,” he says. The more access points embedded throughout a city—and let’s not forget its suburbs—the more the single biggest source of travel stress disappears: the unpredictable slog from home to gate.


Frictionless security

Getting through security, meanwhile, is in for major changes. Aon’s Fleming sees biometrics replacing document checks across the entire travel experience—not just at airports but woven continuously throughout the journey, including at international borders. Security systems will read your face, as well as your gait, heart rate and physiology while allowing you to keep moving. “These systems will even smell you,” he predicts. “We use dogs now, but I think the level of security will be automated and be a benefit to all.” The queues, the bins, the removing of shoes will have totally disappeared, and you’ll be able to board a plane or ship without any friction.

“For comparison purposes, consider the old toll booth approach at the tollway or turnpike 30 years ago versus the Zip Cash or Toll Tag systems we see today,” Fleming says.


Demand-controlled destinations

Countries like India and China that together account for around a third of the world’s population are moving enormous numbers of people into the middle class. That could lead to even bigger crowds in Rome, Paris and many of the other places that have defined tourism for generations.

Richie Karaburun, a clinical associate professor at New York University’s Jonathan M. Tisch Center of Hospitality, believes “overtourism demand control” will reshape how the world’s most iconic destinations operate. To keep sites from being “loved to death,” cities may set visitor caps, requiring permits during peak seasons and compelling visitors to get timestamped reservations to enter popular sights, as many museums do now. “What’s coming next is a shift from managing individual sites to managing entire destinations as controlled systems,” Karaburun says. “So instead of just needing a ticket for the Colosseum, visitors may increasingly need to plan and secure access to Rome itself in advance during high-demand windows.”

The pressure will ultimately redirect travelers toward places that are extraordinary but currently overlooked. “There will be new stars, new destinations added to the tourist’s list,” Karaburun says. “You’re already seeing this shift with Porto and Valencia relative to Lisbon and Barcelona, or Ljubljana and Palermo relative to Venice and Florence,” he says. In Asia, secondary cities like Kanazawa in Japan are gaining traction beyond Tokyo and Kyoto.


Smarter roads

The future of road travel is less about flying cars than about eliminating the tensions and anxieties that make driving so exhausting. Roads, signs, traffic lights and vehicles will increasingly talk to each other, sharing information in real time.

“When a car suddenly slams on the brakes in front of you, it will send out a message to roadway devices and to the cars behind it,” says Philip Plotch, a principal researcher and senior fellow at the Eno Center for Transportation. “You’ll know instantly what happened, giving you more time to react. Or the car might even slow down or stop on its own.”

Even before fully driverless cars arrive, this growing communication between vehicles and infrastructure will make driving safer and less stressful, reducing surprises and smoothing traffic flow. As more advanced automation takes hold, the experience of being in a car will start to feel fundamentally different.

Once you don’t have to keep your eyes on the road, a long drive begins to resemble a train trip, giving passengers time to read, watch something or rest instead of constantly focusing. That shift will change how and how far people are willing to travel, Plotch says.


Faster flight

The physics of travel itself will change by 2046. Supersonic flight—flying from New York to London for dinner in under 90 minutes at Mach 3 (three times the speed of sound)—could become routine for the affluent. Aon’s Fleming points to Boom Supersonic’s planned Overture jet, which is currently running successful supersonic tests at Mach 1.7 and could be in service as soon as the end of this decade. “It’s hard to see us not having supersonic travel in the 2030s at this point,” Fleming says, “but it remains to be seen if it’s at scale or limited to the upper end of the market.” Boom says it expects future versions of its aircraft to become faster and more affordable over time.

Commercial supersonic travel isn’t new, of course. The Concorde cut trans-Atlantic flight times in half beginning in 1976, but the flights were expensive to operate, carried relatively few passengers, consumed large amounts of fuel and faced strict noise limits after sonic booms triggered public backlash, confining most flights to ocean routes. A fatal crash in 2000 and falling demand after 9/11 helped lead to the planes’ discontinuation in 2003.

According to Fleming, the new generation of supersonic startups will be able to leverage advanced technology such as lighter and active-cooling composite materials, more-efficient engines and aerodynamic shapes, sustainable aviation fuel and quieter boom technology, allowing high-speed air travel to finally be commercially viable, especially for premium travelers willing to pay for time savings on long-haul routes.

“Supersonic travel will compress the world in a way we haven’t seen since the Jet Age,” says NYU’s Karaburun. If long-haul flights shrink to a few hours, cities like New York, London and Dubai begin to function less like distant hubs and more like a connected corridor.

Beyond supersonic travel lies hypersonic travel, which involves flying at Mach 5 or above and comes with intense thermal challenges that have yet to be resolved. Fleming notes that some aerospace companies are working to develop such aircraft, though he predicts passenger service won’t be available until “2035–2040 at the earliest.”


Space hotels

This may be a bit farther out, but it’s possible that the true “space hotels”—commercial space stations with hospitality amenities—could emerge as early as the 2030s, says Karaburun.

Like hypersonic travel, Fleming says, these trips at first will be accessible only to the ultrawealthy, but by the 2040s, as launch costs fall, that market could expand modestly. “I expect the first space hotels to be in orbit, much like the [International Space Station] today, with a few nice hotel rooms with a remarkable view, probably combined with a research facility,” he says.

Karaburun sees a similar future. “These will be small, expensive and tightly controlled, more akin to early Antarctic expeditions than traditional tourism,” he says.

Write to reports@wsj.com

Copyright ©2026 Dow Jones & Company, Inc. All Rights Reserved.

Heidi Mitchell is a contributor to The Wall Street Journal.
