Strategic Initiatives

Dfinity launches Caffeine, an AI platform that builds production apps from natural language prompts | VentureBeat

  • Dfinity Foundation releases Caffeine AI platform: On Wednesday, the Dfinity Foundation publicly launched Caffeine, an AI system allowing users to create and deploy web applications via natural language on the decentralized Internet Computer Protocol (ICP), aiming to replace human technical teams and ensure data safety in AI updates.
  • Caffeine differs from existing AI tools: Unlike GitHub Copilot or Cursor, which assist human coders, Caffeine enables full AI autonomy in writing, deploying, and updating production apps without human codebase intervention.
  • Early user adoption and engagement: Over 15,000 alpha testers participated, with 26% daily active users, leading to high engagement where some users built apps all day, prompting potential usage limits due to AI costs.
  • Motoko language ensures data integrity: Applications use Motoko, a Dfinity-developed language providing mathematical guarantees against data loss during AI-driven updates, preventing failed upgrades from deleting information.
  • Orthogonal persistence simplifies development: This feature merges application logic and data storage, eliminating database management code and enabling AI to focus on functionality, with verified data migrations for structural changes.
  • Enterprise transformation potential: Targets corporations and governments for building portals, CRMs, and ERPs conversationally, reducing costs to 1% of traditional methods and avoiding SaaS lock-in on decentralized ICP.
  • Distinctions from vibe coding platforms: Addresses limitations in tools like Replit and Lovable, such as app breakage, security issues, and data loss, through ICP's tamper-proof execution and Byzantine fault tolerance.
  • Vision of self-writing internet: Envisions a web where AI handles all development on ICP, empowering non-technical users to build apps, drawing from Dfinity's 2013 roots and demonstrated in hackathons with diverse creations like legal tools and monitoring systems.

The Dfinity Foundation on Wednesday released Caffeine, an artificial intelligence platform that allows users to build and deploy web applications through natural language conversation alone, bypassing traditional coding entirely. The system, which became publicly available today, represents a fundamental departure from existing AI coding assistants by building applications on a specialized decentralized infrastructure designed specifically for autonomous AI development.

Unlike GitHub Copilot, Cursor, or other "vibe coding" tools that help human developers write code faster, Caffeine positions itself as a complete replacement for technical teams. Users describe what they want in plain language, and an ensemble of AI models writes, deploys, and continually updates production-grade applications — with no human intervention in the codebase itself.

"In the future, you as a prospective app owner or service owner… will talk to AI. AI will give you what you want on a URL," said Dominic Williams, founder and chief scientist at the Dfinity Foundation, in an exclusive interview with VentureBeat. "You will use that, completely interact productively, and you'll just keep talking to AI to evolve what that does. The AI, or an ensemble of AIs, will be your tech team."

The platform has attracted significant early interest: more than 15,000 alpha users tested Caffeine before its public release, with daily active users representing 26% of those who received access codes — "early Facebook kind of levels," according to Williams. The foundation reports some users spending entire days building applications on the platform, forcing Dfinity to consider usage limits due to underlying AI infrastructure costs.

Why Caffeine's custom programming language guarantees your data won't disappear

Caffeine's most significant technical claim addresses a problem that has plagued AI-generated code: data loss during application updates. The platform builds applications using Motoko, a programming language developed by Dfinity specifically for AI use, which provides mathematical guarantees that upgrades cannot accidentally delete user data.

"When AI is updating apps and services in production, a mistake cannot lose data. That's a guarantee," Williams said. "It's not like there are some safeguards to try and stop it losing data. This language framework gives it rails that guarantee if an upgrade, an update to its app's underlying logic, would cause data loss, the upgrade fails and the AI just tries again."

This addresses what Williams characterizes as critical failures in competing platforms. User forums for tools like Lovable and Replit, he notes, frequently report three major problems: applications that become irreparably broken as complexity increases, security vulnerabilities that allow unauthorized access, and mysterious data loss during updates.

Traditional tech stacks evolved to meet human developer needs — familiarity with SQL databases, preference for known programming languages, existing skill investments. "That's how the traditional tech stacks evolved. It's really evolved to meet human needs," Williams explained. "But in the future, it's going to be different. You're not going to care how the AI did it. Instead, for you, AI is the tech stack."

Caffeine's architecture reflects this philosophy. Applications run entirely on the Internet Computer Protocol (ICP), a blockchain-based network that Dfinity launched in May 2021 after raising over $100 million from investors including Andreessen Horowitz and Polychain Capital. The ICP uses what Dfinity calls "chain-key cryptography" to create what Williams describes as "tamper-proof" code — applications that are mathematically guaranteed to execute their written logic without interference from traditional cyberattacks.

"The code can't be affected by ransomware, so you don't have to worry about malware in the same way you do," Williams said. "Configuration errors don't result in traditional cyber attacks. That passive traditional cyber attacks isn't something you need to worry about."

How 'orthogonal persistence' lets AI build apps without managing databases

At the heart of Caffeine's technical approach is a concept called "orthogonal persistence," which fundamentally reimagines how applications store and manage data. In traditional development, programmers must write extensive code to move data between application logic and separate database systems — marshaling data in and out of SQL servers, managing connections, handling synchronization.

Motoko eliminates this entirely. Williams demonstrated with a simple example: defining a blog post data type and declaring a variable to store an array of posts requires just two lines of code. "This declaration is all that's necessary to have the blog maintain its list of posts," he explained during a presentation on the technology. "Compare that to traditional IT where in order to persist the blog posts, you'd have to marshal them in and out of a database server. This is quite literally orders of magnitude more simple."

This abstraction allows AI to work at a higher conceptual level, focusing on application logic rather than infrastructure plumbing. "Logic and data are kind of the same," Williams said. "This is one of the things that enables AI to build far more complicated functionality than it could otherwise do."

The system also employs what Dfinity calls "loss-safe data migration." When AI needs to modify an application's data structure — adding a "likes" field to blog posts, for example — it must write migration logic in two passes. The framework automatically verifies that the transformation won't result in data loss, refusing to compile or deploy code that could delete information unless explicitly instructed.
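The article doesn’t spell out Motoko’s mechanism, but the described behavior is easy to sketch. Below is a rough conceptual illustration in Python (not Motoko, and not Dfinity’s implementation): apply the migration, detect any dropped fields, and refuse to commit unless loss is explicitly allowed.

# Conceptual sketch only: Motoko enforces this check at the language level;
# this Python merely illustrates the "refuse lossy migrations" idea.
def migrate(records, transform, allow_loss=False):
    migrated = [transform(r) for r in records]
    for old, new in zip(records, migrated):
        lost = set(old) - set(new)
        if lost and not allow_loss:
            # Mirrors the described behavior: the upgrade fails and
            # the original data is left untouched.
            raise ValueError(f"migration would drop fields: {lost}")
    return migrated

posts = [{"title": "Hello", "body": "First post"}]
posts = migrate(posts, lambda p: {**p, "likes": 0})   # adding 'likes': allowed
# migrate(posts, lambda p: {"title": p["title"]})     # dropping 'body': rejected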

From million-dollar SaaS contracts to conversational app building in minutes

Williams positions Caffeine as particularly transformative for enterprise IT, where he claims costs could fall to "1% of what they were before" while time-to-market shrinks to similar fractions. The platform targets a spectrum from individual creators to large corporations, all of whom currently face either expensive development teams or constraining low-code templates.

"A corporation or government department might want to create a corporate portal or CRM, ERP functionality," Williams said, referring to customer relationship management and enterprise resource planning systems. "They will otherwise have to obtain this by signing up for some incredibly expensive SaaS service where they become locked in, their data gets stuck, and they still have to spend a lot of money on consultants customizing the functionality."

Applications built through Caffeine are owned entirely by their creators and cannot be shut down by centralized parties — a consequence of running on the decentralized Internet Computer network rather than traditional cloud providers like Amazon Web Services. "When someone says built on the internet computer, it actually means built on the internet computer," Williams emphasized, contrasting this with blockchain projects that merely host tokens while running actual applications on centralized infrastructure.

The platform demonstrated this versatility during a July 2025 hackathon in San Francisco, where participants created applications ranging from a "Will Maker" tool for generating legal documents, to "Blue Lens," a voice-AI water quality monitoring system, to "Road Patrol," a gamified community reporting app for infrastructure problems. Critically, many of these came from non-technical participants with no coding background.

"I'm from a non-technical background, I'm actually a quality assurance professional," said the creator of Blue Lens in a video testimonial. "Through Caffeine I can build something really intuitive and next-gen to the public." The application integrated multiple external services — Eleven Labs for voice AI, real-time government water data through retrieval-augmented generation, and Midjourney-generated visual assets — all coordinated through conversational prompts.

What separates Caffeine from GitHub Copilot, Cursor, and the 'vibe coding' wave

Caffeine enters a crowded market of AI-assisted development tools, but Williams argues the competition isn't truly comparable. GitHub Copilot, Cursor, and similar tools serve human developers working with traditional technology stacks. Platforms like Replit and Lovable occupy a middle ground, offering "vibe coding" that mixes AI generation with human editing.

"If you're a Node.js developer, you know you're working with the traditional stack, and you might want to do your coding with Copilot or using Claude or using Cursor," Williams said. "That's a very different thing to what Caffeine is offering. There'll always be cases where you probably wouldn't want to hand over the logic of the control system for a new nuclear missile silo to AI. But there's going to be these holdout areas, right? And there's all the legacy stuff that has to be maintained."

The key distinction, according to Williams, lies in production readiness. Existing AI coding tools excel at rapid prototyping but stumble when applications grow complex or require guaranteed reliability. Reddit forums for these platforms document users hitting insurmountable walls where applications break irreparably, or where AI-generated code introduces security vulnerabilities.

"As the demands and the requirements become more complicated, eventually you can hit a limit, and when you hit that limit, not only can you not go any further, but sometimes your app will get broken and there's no way of going back to where you were before," Williams said. "That can't happen with productive apps, and it also can't be the case that you're getting hacked and losing data, because once you go hands-free, if you like, and there's no tech team, there's no technical people involved, who's going to run the backups and restore your app?"

The Internet Computer's architecture addresses this through Byzantine fault tolerance — even if attackers gain physical control over some network hardware, they cannot corrupt applications or their data. "This is the beginning of a compute revolution and it's also the perfect platform for AI to build on," Williams said.

Inside the vision: A web that programs itself through natural language

Dfinity frames Caffeine within a broader vision it calls the "self-writing internet," where the web literally programs itself through natural language interaction. This represents what Williams describes as a "seismic shift coming to tech" — from human developers selecting technology stacks based on their existing skills, to AI selecting optimal implementations invisible to users.

"You don't care about whether some human being has learned all of the different platforms and Amazon Web Services or something like that. You don't care about that. You just care: Is it secure? Do you get security guarantees? Is it resilient? What's the level of resilience?" Williams said. "Those are the new parameters."

The platform demonstrated this during live demonstrations, including at the World Computer Summit 2025 in Zurich. Williams created a talent recruitment application from scratch in under two minutes, then modified it in real-time while the application ran with users already interacting with it. "You will continue talking to the AI and just keep on refreshing the URL to see the changes," he explained.

This capability extends to complex scenarios. During demonstrations, Williams showed building a tennis lesson booking system, an e-commerce platform, and an event registration system — all simultaneously, working on multiple applications in parallel. "We predict that as people get very proficient with Caffeine, they could be working on even 10 apps in parallel," he said.

The system writes substantial code: a simple personal blog generated 700 lines of code in a couple of minutes. More complex applications can involve thousands of lines across frontend and backend components, all abstracted away from the user who only describes desired functionality.

The economics of cloning: How Caffeine's app market challenges traditional stores

Caffeine's economic model differs fundamentally from traditional software-as-a-service platforms. Applications run on the Internet Computer Protocol, which uses a "reverse gas model" where developers pay for computation rather than users paying transaction fees. The platform includes an integrated App Market where creators can publish applications for others to clone and adapt — creating what Dfinity envisions as a new economic ecosystem.

"App stores today obviously operate on gatekeeping," said Pierre Samaties, chief business officer at Dfinity, during the World Computer Summit. "That's going to erode." Rather than purchasing applications, users can clone them and modify them for their own purposes — fundamentally different from Apple's App Store or Google Play models.

Williams acknowledges that Caffeine itself currently runs on centralized infrastructure, despite building applications on the decentralized Internet Computer. "Caffeine itself actually is centralized. It uses aspects of the Internet Computer. We want Caffeine itself to run on the Internet Computer in the future, but it's not there now," he said. The platform leverages commercially available foundation models from companies like Anthropic, whose Claude Sonnet model powers much of Caffeine's backend logic.

This pragmatic approach reflects Dfinity's strategy of using best-in-class AI models while focusing its own development on the specialized infrastructure and programming language designed for AI use. "These content models have been developed by companies with enormous budgets, absolutely enormous budgets," Williams said. "I don't think in the near future we'll run AI on the Internet Computer for that reason, unless there's a special case."

A decade in the making: From Ethereum roots to the self-writing internet

The Dfinity Foundation has pursued this vision since Williams began researching decentralized networks in late 2013. After involvement with Ethereum before its 2015 launch, Williams became fascinated with the concept of a "world computer"—a public blockchain network that could host not just tokens but entire applications and services.

"By 2015 I was talking about network-focused drivers, Dfinity back then, and that could really operate as an alternative tech stack, and eventually host even things like social networks and massive enterprise systems," Williams said. The foundation launched the Internet Computer Protocol in May 2021, initially focusing on Web3 developers. Despite not being among the highest-valued blockchain projects, ICP consistently ranks in the top 10 for developer numbers.

The pivot to AI-driven development came from recognizing that "in the future, the tech stack will be AI," according to Williams. This realization led to Caffeine's development, announced on Dfinity's public roadmap in March 2025 and demonstrated at the World Computer Summit in June 2025.

One successful example of the Dfinity vision running in production is OpenChat, a messaging application that runs entirely on the Internet Computer and is governed by a decentralized autonomous organization (DAO) with tens of thousands of participants voting on source code updates through algorithmic governance. "The community is actually controlling the source code updates," Williams explained. "Developers propose updates, community reads the updates, and if the community is happy, OpenChat updates itself."

The skeptics weigh in: Crypto baggage and real-world testing ahead

The platform faces several challenges. Dfinity's crypto industry roots may create perception problems in enterprise markets, Williams acknowledges. "The Web3 industry's reputation is a bit tarnished and probably rightfully so," he said during the World Computer Summit. "Now people can, for themselves, experience what a decentralized network is. We're going to see self-writing take over the enterprise space because the speed and efficiency are just incredible."

The foundation's history includes controversy: ICP's token launched in 2021 at over $100 per token with an all-time high around $700, then crashed below $3 in 2023 before recovering. The project has faced legal challenges, including class action lawsuits alleging misleading investors, and Dfinity filed defamation claims against industry critics.

Technical limitations also remain. Caffeine cannot yet compile React front-ends on the Internet Computer itself, requiring some off-chain processing. Complex integrations with traditional systems — payment processing through Stripe, for example — still require centralized components. "Your app is running end-to-end on the Internet Computer, then when it needs to actually accept payment, it's going to hand over to your Stripe account," Williams explained.

The platform's claims about data loss prevention and security guarantees, while technically grounded in the Motoko language design and Internet Computer architecture, remain to be tested at scale with diverse real-world applications. The 26% daily active user rate from alpha testing is impressive but comes from a self-selected group of early adopters.

When five billion smartphone users become developers

Williams rejects concerns that AI-driven development will eliminate software engineering jobs, arguing instead for market expansion. "The self-writing internet empowers eight billion non-technical people," he said. "Some of these people will enter roles in tech, becoming prompt engineers, tech entrepreneurs, or helping run online communities. Humanity will create millions of new custom apps and services, and a subset of those will require professional human assistance."

During his World Computer Summit demonstration, Williams was explicit about the scale of transformation Dfinity envisions. "Today there are about 35,000 Web3 engineers in the world. Worldwide there are about 15 million full-stack engineers," he said. "But tomorrow with the self-writing internet, everyone will be a builder. Today there are already about five billion people with internet-connected smartphones and they'll all be able to use Caffeine."

The hackathon results suggest this isn't pure hyperbole. A dentist built "Dental Tracks" to help patients manage their dental records. A transportation industry professional created "Road Patrol" for gamified infrastructure reporting. A frustrated knitting student built "Skill Sprout," a garden-themed app for learning new hobbies, complete with material checklists and step-by-step skill breakdowns—all without writing a single line of code.

"I was learning to knit. I got irritated because I had the wrong materials," the creator explained in a video interview. "I don't know how to do the stitches, so I have to individually search, and it's really intimidating when you're trying to learn something you don't—you don't even know what you don't know."

Whether Caffeine succeeds depends on factors still unknown: how production applications perform under real-world stress, whether the Internet Computer scales to millions of applications, whether enterprises can overcome their skepticism of blockchain-adjacent technology. But if Williams is right about the fundamental shift — that AI will be the tech stack, not just a tool for human developers — then someone will build what Caffeine promises.

The question isn't whether the future looks like this. It's who gets there first, and whether they can do it without losing everyone's data along the way.


Behind Closed Doors: How the Deal With Hamas Was Born - Amit Segal

  • Key figures and events in Israel-US strategy on Gaza: Involves Israeli leaders Benjamin Netanyahu, Ron Dermer, and Avi Dichter, alongside US President Donald Trump; focuses on negotiations for hostage release, military operations in Gaza, and a strike in Qatar on September 9; occurs in Israel, Gaza, Qatar, and New York; aims to eliminate Hamas, secure partial hostage deals, and establish post-war governance amid international pressures and Arab state involvement.
  • Trump's prioritization of military action: Views hostages as marginal compared to fully eliminating Hamas, leading to decisions for IDF operations in Gaza City despite initial underestimation of their strategic importance to Israel.
  • Netanyahu's internal assessments: Discusses societal scars from potential hostage losses during Gaza conquest; anticipates incomplete operations and pursues partial deals using post-Iran war momentum, rejected by Hamas.
  • Proposal for Gaza City conquest: Advocated by Avi Dichter as decisive against Hamas; pre-operation talks prompt Hamas to agree to a partial deal, but timing forces Israel into full entry.
  • Alternative governance options weighed: Considers full conquest with US-backed military government, deemed unfeasible without unity or Trump support; opts for a US-led plan supported by Arab states to neutralize purely Israeli initiatives.
  • Qatar strike and its repercussions: Involves Netanyahu, Katz, and Dermer approving attack on Hamas in Qatar without prior consultation on commitments; seen as successful in advancing deal by pressuring Qatar despite their perception of breached immunity.
  • Arab states' involvement in negotiations: Achieves commitments from Sunni countries against Hamas via US mediation, converting anger over strike into negotiation fuel; leads to pan-Arab framework excluding Palestinian Authority, with toned-down Al Jazeera and drafted apologies.
  • Post-deal Gaza reconstruction plan: Envisions two-state solution within Gaza Strip; reconstruction starts in Israeli-controlled half to leverage market forces against Hamas, with skepticism from IDF on multinational force's disarmament effectiveness.

If Israelis had heard how the President of the United States spoke about the hostages, it’s doubtful that he would have received such thunderous cheers at Hostages’ Square last Saturday night. To say they were a secondary concern for him would be an understatement — and even that understates it. Donald Trump favored eliminating Hamas the American way, and 20 living hostages (he was always confused about their number and minimized it — I wonder what Sigmund Freud would have said) seemed to him a marginal matter, collateral damage.

Only belatedly did he perceive how strategic the issue was for the Israelis, and therefore for their government as well. In the United States, presidents have usually not been criticized for meeting hostages’ families too little, but for doing so too often (for details, Google “Ronald Reagan”).

In one of the discussions before Operation Gideon’s Chariots B began, Benjamin Netanyahu spoke about the scar that would remain in Israeli society if the IDF conquered Gaza City at the cost of the hostages’ lives. Allow me to guess that he never really believed that moment would come.

Indeed, in recent months, Netanyahu and Ron Dermer’s assessment was that an operation to conquer Gaza City, if it happens, may well begin, but most certainly would not reach completion. Here is the inside story.

Following the successful war in Iran, Israel tried to use the momentum to reach a partial deal in Gaza. The idea was to release half the hostages and, during a 60-day ceasefire, arrive more or less at the conditions achieved this week. But Hamas, inspired by a Gaza starvation campaign that was gaining international traction, refused. President Trump, still in the shadow of Israel’s victory in Iran, thought the IDF could eliminate the remnants of Hamas as quickly as it smashed Tehran’s nuclear program. Ultimately, the combination of Hamas’ refusal and the president’s ambition led Israel to decide to enter Gaza City.

The idea was proposed by former Shin Bet chief and current Minister Avi Dichter: conquering the city is the end of Hamas, he said at one meeting. The magic happened almost immediately: “Even before our forces entered the city,” Dermer recounted, “three days of talk about the operation did what three months of negotiations failed to do. Hamas suddenly agreed to a partial deal. But by then time had already run out.”

Israel faced two options: one, to conquer the remainder of the strip and establish a military government with American support. Dermer and Netanyahu, however, believed that would require national unity and backing from Trump. The first component did not exist, and the second was highly unlikely.

The second option was a plan manufactured by Israel, led by the Americans, and supported by Arab states. President Reagan once told his people: you’ll write the plans, and I’ll be the presenter who markets them. This plan was no different, with Dermer filling the role of the writer. It was clear that any plan presented as purely Israeli would be pronounced dead before it was even born. That doesn’t mean every tweet was coordinated, Dermer said at a cabinet meeting this week, but on the big matters, Jerusalem and Washington moved together.

Thus began arduous negotiations with Middle Eastern countries. During a round of talks in New York, it seemed impossible to get all those elephants into the same private room. Nevertheless, Israel’s representatives returned from there with 17 substantive comments from the Sunni states and even an agreement in the offing.

Then came September 9. Early in the morning, a three-person phone call was held about the impending strike: Prime Minister Netanyahu, Defense Minister Katz, and Minister Dermer. All three supported the attack. Many issues came up in the consultation, but one particular issue did not: none of them believed there was an Israeli commitment to the Qataris not to strike Hamas personnel on their soil. Netanyahu called President Trump minutes earlier, but the president was groggy after a late night of discussions. It took time to reach him. The strike went ahead.

So far, it’s unclear how senior Hamas figures escaped the attack, but it’s obvious that it brought the deal closer. I recently wrote that it was the most successful failed assassination in history, in the sense that it signaled to the Qataris that the war would come to them if they did not stop their double game.

Dermer sees it differently. He links the strike to the agreement, but in a completely different way. The Qataris, it turns out, were convinced that by agreeing to host hostage release negotiations, they had obtained immunity from Israeli strikes on their soil. From their perspective, the strike was a blatant, offensive breach of that commitment.

Qatar hadn’t managed to help forge a deal for quite some time, but it’s not half bad at thwarting them. In Jerusalem, they called Qatar “the spoiler state” — one that can easily ruin any agreement, as it did to the Egyptian hostage release deal that was forming last spring behind its back.

Qatar is a complicated nation, Netanyahu recently said. What is it made of? In Jerusalem they describe two trains running behind the same engine. One, led by the ruler’s mother and brother, supports the Muslim Brotherhood and is an unmistakable hater of Israel. The other, led by the prime minister and several other senior figures, seeks rapprochement with the West.

Around April, a turning point was identified in Doha. Relations with the United States tightened significantly, and Hamas, an oddly patronized child, became a burden and a stain. The opportunity then presented itself following the strike in Doha, when the Arab states rushed to assemble at the emir’s conference, both in anger at Israel and fear of a blue-and-white domination of the Middle East.

The Americans’ genius was to convert that negative energy into fuel to propel negotiations to their goal. You want Israel to stop? Then let’s end the war, they told the Sunni countries, and thus enlisted them in a framework that seemed impossible: a pan-Arab, almost pan-Muslim commitment to the elimination of Hamas. Dermer drafted Netanyahu’s apology for the death of the Qatari security official; in Doha they reciprocated with a goodwill gesture by dramatically toning down Al Jazeera’s hostile tone.

More than enlisting the entire Arab world against Hamas, which had annoyed the whole region, the achievement was to enlist it for a framework that does not include the Palestinian Authority in the foreseeable future. That is, for example, what held the Emiratis back from entering Gaza a year and a half ago. In one sense, that is the great innovation: before the plan, Gaza belonged to the Palestinian Authority; now it is Arab-international until further notice. The PA, meanwhile, hates Hamas so much that it agreed.

Yes, there will be a two-state solution, Dermer said this week. But not between the river and the sea — within the Gaza Strip itself. The plan is that as long as Hamas does not disarm, reconstruction will begin — but only in the half of the strip under Israeli control. What two years of war did not accomplish will be done by market forces: where will the population feel it is better to live — amid the ruins under Hamas boots, or in a rehabilitated area with an Emirati-funded school and a trailer home for each family?

The Americans believe this is a temporary situation, and are convinced that Hamas will be disarmed soon. Israel, of course, is much more skeptical. In a recent meeting, IDF Chief of Staff Eyal Zamir made a request of the Americans: Explain to me please. Your multinational force, with a few battalions, enters a tunnel. Hamas operatives are armed there. How exactly does this disarm Hamas? Who exactly will hand over the weapons? And what if they don’t?

You didn’t believe the first phase would happen, the Americans said, believe that the second will happen too. Have a little faith, the Jews with an American flag on their lapel told the Jews with an Israeli flag.

The above is an excerpt from my Shabbat column in Israel Hayom. Read it on Israel Hayom’s website here.


Optimizing Coding Agent Rules (CLAUDE.md, agents.md, ./clinerules, .cursor/rules) for Improved Accuracy - Arize AI

  • Coding agents optimization: Arize team applies Prompt Learning to Cline, an open-source coding agent, optimizing its rules files to boost accuracy on SWE-bench lite benchmark with 300 Python GitHub issue pairs; conducted in 2025 via experimental servers and evals, aimed at enhancing developer control and agent performance without model fine-tuning.
  • System prompt reliance: Capable coding agents like Cursor, Claude Code, and Devin use single persistent system prompts defining tools, formats, and policies for stable state maintenance and continuous reasoning.
  • Rules files exposure: Agents provide user-defined rules appended to system prompts for configuring security, style, architecture, and testing behaviors.
  • Prompt Learning definition: Algorithm inspired by reinforcement learning uses meta prompting and English feedback to iteratively refine prompts based on dataset evaluations.
  • Optimization process: Involves initializing baseline prompt, generating outputs, evaluating with unit tests, creating detailed feedback via GPT-5, and meta-prompting for new rulesets over train-test split of SWE-bench.
  • Cline configuration: Runs in ACT mode on servers with GRPC requests to repositories, producing patches evaluated for pass/fail using unit tests and generating explanations for why approaches succeeded or failed.
  • Meta prompt instructions: Guides LLM to edit ruleset for robustness, focusing on general improvements like preserving correctness, handling edges, avoiding quick fixes, and ensuring compatibility without user input.
  • Results and takeaways: Optimization yields 10-15% accuracy gains, closing performance gaps between models like GPT-4.1 and Claude Sonnet 4-5 on SWE-bench via rules alone, tracked using Phoenix Experiments.

Coding agents have become the focal point of modern software development. Tools like Cursor, Claude Code, Codex, Cline, Windsurf, Devin, and many more are revolutionizing how engineers write and ship code.

A consistent pattern among the most capable agents is their reliance on a single, persistent system prompt rather than a chain of sub-prompts. Each session starts with a comprehensive base prompt defining available tools, I/O formats, and safety policies. From there, all messages build on the same context. This design lets the model maintain state through its context window, use tool outputs as implicit memory, and reason continuously without an external controller — favoring simplicity, stability, and tight feedback loops.

That single, often large, system prompt effectively governs the agent’s behavior. It’s a major determinant of performance, and its precision and scope directly influence how reliably the agent performs complex coding tasks.

To give developers more control, most coding agents expose rules files — user-defined instruction sets that are appended directly to the system prompt. These files let users configure behavior around security and compliance, programming style, architectural patterns, testing practices, and more by directly appending structured guidance to the base prompt.

Rules are a hot topic; it feels like I see people share their new coding agent rules on X every day. This is largely because writing effective rules is difficult. How do you know if a rule actually helps? How do you measure its impact or trust that it improves the agent’s reasoning, tool calling, data retrieval, etc.?

To answer this, we used Prompt Learning, our prompt-optimization algorithm, to automatically generate and refine rule files. We applied it to Cline, a powerful open-source coding agent, chosen for both its strong baseline performance and full transparency — including access to its prompts, tool descriptions, and system configuration — which make it ideal for optimization.

Keep reading to see how we improved Cline’s accuracy by 10-15%, just by optimizing Cline rules.

What Is Prompt Learning?

Prompt learning is an optimization algorithm designed to improve prompts, inspired by reinforcement learning. It employs a similar [action -> evaluation -> gradient] framework, modified and optimized for prompts. It’s a method of realizing improvements and gains for your LLM applications and agents by optimizing their prompts, rather than having to fine-tune or train the underlying LLMs.

[Figure: key differences between traditional RL and prompt learning]

Instead of gradient descent, the prompt optimization technique that prompt learning uses is called meta prompting: feeding a prompt into an LLM and asking it to improve that prompt. You can also provide the meta prompt LLM with context that shows how the original prompt performs. This often includes input/output pairs, along with labels or scores on those outputs, such as whether those outputs were correct or incorrect.

This approach has one major limitation: labels, scores, and scalar rewards carry very little information about the individual data points. The meta prompt LLM does not know WHY certain outputs were correct or incorrect, where improvement is needed, or any specific nuance about those data points.

Prompt learning completes the algorithm by using LLMs to generate rich English feedback for the meta prompt, which guides the meta prompt LLM toward producing better prompts. The loop looks like this (a code sketch follows the list):

  1. Initial Prompt: Start with a baseline prompt that defines your task
  2. Generate Outputs: Use the prompt to generate responses on your dataset
  3. Evaluate Results: Run evaluators to assess output quality
  4. Optimize Prompt: MetaPrompt(initial prompt, data, feedback) -> optimized prompt
  5. Iterate: Repeat until performance meets your criteria
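A minimal sketch of that loop in Python; generate, evaluate, and meta_prompt are hypothetical stand-ins for the underlying LLM calls, passed in as function arguments (these names are ours, not Arize’s API):

# Minimal sketch of the prompt learning loop. The three callables are
# hypothetical stand-ins for LLM calls; only the control flow is real.
def prompt_learning(initial_prompt, dataset, generate, evaluate, meta_prompt, n_iters=5):
    prompt = initial_prompt
    for _ in range(n_iters):
        # 2. Generate outputs on the dataset with the current prompt
        outputs = [(x, generate(prompt, x)) for x in dataset]
        # 3. Evaluate: score each output and explain WHY it passed or failed
        feedback = [(x, y, evaluate(x, y)) for x, y in outputs]
        # 4. Meta prompt: rewrite the prompt given the data and feedback
        prompt = meta_prompt(prompt, feedback)
    # 5. In practice, iterate until performance meets your criteria
    return prompt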

Evaluating Improvement with SWE Bench

For reference, see the SWE-bench paper and the SWE-bench leaderboard.
We used prompt learning to optimize Cline through its rules. To track Cline’s improvement across optimizations, we used SWE-bench, a benchmark designed to evaluate a system’s ability to automatically resolve real GitHub issues. It contains 2,294 issue–pull request pairs from 12 popular Python repositories. Each system’s output is evaluated through unit test verification, using the post–pull request repository state as the reference solution. It’s a popular and widely adopted benchmark for coding automation tasks.

We specifically used SWE-bench Lite, which includes 300 issue–pull request pairs.

The Optimization Loop

Initialize – Train/Test

We split SWE-bench Lite into a 50/50 train–test split with 150 examples each — enough data to drive meaningful optimization while leaving sufficient test coverage for reliable measurement of improvement.
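Assuming the Hugging Face datasets package and the public SWE-bench Lite dataset on the Hub, the split might look like:

from datasets import load_dataset

# SWE-bench Lite ships as a single 300-row "test" split on the Hugging Face Hub
swe_lite = load_dataset("princeton-nlp/SWE-bench_Lite", split="test")
swe_lite = swe_lite.shuffle(seed=42)
train = swe_lite.select(range(150))
test = swe_lite.select(range(150, 300))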

Training Phase

Running Cline

Cline was booted up on individual servers. We used Cline’s standalone server implementation, which let us send requests to Cline servers over gRPC, pointing them at the SWE-bench repositories.

Cline was set to ACT Mode – where it’s given full access to read and edit code files as it deems fit.

We configured each Cline run with the current ruleset by funneling it through .clinerules. The ruleset is initialized as empty and optimized after each training phase (a sketch of this step follows the directory layout below).

your-project/
├── .clinerules
├── src/
├── docs/
└── …
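A sketch of that configuration step (the .clinerules location follows Cline’s convention; the helper name is ours):

from pathlib import Path

def write_ruleset(repo_dir, rules):
    # Cline picks up .clinerules from the project root at run time
    Path(repo_dir, ".clinerules").write_text("\n".join(f"- {rule}" for rule in rules))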

After giving Cline ample time to run, we used git diff to capture cline_patch, the patch it produced.
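Capturing the patch is a plain git operation; a minimal sketch:

import subprocess

def collect_patch(repo_dir):
    # The working-tree diff after Cline's edits becomes cline_patch
    result = subprocess.run(["git", "diff"], cwd=repo_dir,
                            capture_output=True, text=True, check=True)
    return result.stdout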

Testing Cline

We then used the SWE-bench package to run the unit tests for each SWE-bench row after Cline’s edits, which told us whether Cline’s patch was correct or incorrect.
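A sketch of that scoring step, using the swebench package’s documented predictions format and evaluation entry point (treat the exact flags as an assumption and check them against the version you install):

import json

def build_predictions(rows, patches, path="predictions.json"):
    # Each prediction pairs a SWE-bench instance with the patch Cline produced
    preds = [
        {"instance_id": row["instance_id"],
         "model_name_or_path": "cline",
         "model_patch": patch}
        for row, patch in zip(rows, patches)
    ]
    with open(path, "w") as f:
        json.dump(preds, f)

# Then score with the harness, e.g.:
#   python -m swebench.harness.run_evaluation \
#       --dataset_name princeton-nlp/SWE-bench_Lite \
#       --predictions_path predictions.json --run_id cline_rules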

Evaluating Cline – Generating Feedback/Evals

Evals are a crucial part of the optimization loop – they serve as powerful feedback channels for our Meta Prompt LLM to generate an optimized prompt. The stronger the evals are, the stronger the optimization should be.

We used GPT-5 to compare cline_patch with patch (ground truth patch) from SWE bench, asking it to tell us WHY cline_patch was right/wrong, and WHY the model may have taken the direction it did.


You are an expert software engineer, tasked with reviewing a coding agent.

   You are given the following information:
   - problem_statement: the problem statement
   - cline_patch: a patch generated by the coding agent, which is supposed to fix the problem.
   - patch: a ground truth solution/patch to the problem
   - test_patch: a test patch that the coding agent's output should pass, which directly addresses the issue in the problem statement
   - pass_or_fail: either "pass" or "fail" indicating whether the coding agent's code changes passed the unit tests.

   Your task is to review the given information and determine if the coding agent's output is correct, and why.
   Evaluate correctness based on the following factors:
   - Whether cline_patch fixes the problem.
   - Whether test_patch would pass after applying cline_patch.
   - Whether the coding agent is taking the correct approach to solve the problem.

   You must synthesize why the coding agent's output is correct or incorrect. Try to reason about the coding agent's approach, and why the coding agent may have taken that approach.
  
   problem_statement: {problem_statement}
   ground truth patch: {patch}
   test patch: {test_patch}
   coding agent patch: {cline_patch}
   pass_or_fail: {pass_or_fail}

   Return in the following JSON format:
   "correctness": "correct" or "incorrect"
   "explanation": "brief explanation of your reasoning: why/why not the coding agent's output is correct, and why the coding agent may have taken that approach."
   """

Meta Prompt

To complete the training phase, we use everything we have generated as input into a Meta Prompt LLM, prompting it to generate a new ruleset.


You are an expert in coding agent prompt optimization. 
Your task is to improve the overall ruleset that guides the coding agent. 

Process:
1. Review the baseline prompt, the current ruleset, examples, and evaluation feedback. 
2. Identify high-level shortcomings in both the baseline prompt and the ruleset — look for missing guidance, unclear constraints, or opportunities to strengthen general behavior. 
3. Propose edits that make the ruleset more robust and broadly applicable, not just tailored to the given examples. 

BELOW IS THE ORIGINAL BASELINE PROMPT WITH STATIC RULESET
************* start prompt ************* 

{baseline_prompt} 
************* end prompt ************* 

BELOW IS THE CURRENT DYNAMIC RULESET (CHANGE THESE OR ADD NEW RULES)
************* start ruleset ************* 

{ruleset}
************* end ruleset ************* 

BELOW ARE THE EXAMPLES USING THE ABOVE PROMPT 
************* start example data ************* 

{examples} 
************* end example data ************* 

FINAL INSTRUCTIONS 
Iterate on the current ruleset. You may: 
- Add new rules 
- Remove rules
- Edit or strengthen existing rules
- Change the ruleset entirely, if need be 

The goal is to produce an optimized dynamic ruleset that generalizes, improves reliability, and makes the coding agent stronger across diverse cases — not rules that only patch the specific examples above. 
The rules in the baseline prompt are static, don't change them. Only work on the additional dynamic ruleset.

Please make sure to not add any rules that:
   - ask for user input
   - use ask_followup_question
Cline is to perform without user input at any step of the process.
  
Return just the final, revised, dynamic ruleset in bullet points. Do not include any other text.

New ruleset:

Even though we focused solely on ruleset optimization, we provided both Cline’s full prompt and its current ruleset to give the Meta Prompt LLM complete context. The {examples} dataset included input–output pairs, correctness labels, and detailed evals explaining why each case succeeded or failed. Crucially, the Meta Prompt LLM was instructed to generate broadly applicable rules that strengthen overall agent performance — not rules that overfit to specific training examples.

Test Phase

After the optimized ruleset is generated, we boot up Cline servers on the 150 test examples with the new ruleset and generate an accuracy metric (the % of test examples where Cline’s patch passed all unit tests).
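The metric itself is just the pass rate over the test split; a trivial sketch:

def test_accuracy(results):
    # results: one boolean per test instance (did Cline's patch pass all unit tests?)
    return sum(results) / len(results)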

Final Ruleset

Here are some examples of rules that got added to the ruleset. The final optimized ruleset tends to contain anywhere from 20-50 rules.

  • Ensure every code modification strictly preserves correctness, minimality of change, and robustly handles edge/corner cases related to the problem statement—even in complex, inherited, or nested code structures.
  • Avoid blanket or “quick fix” solutions that might hide errors or unintentionally discard critical information; always strive to diagnose and address root-causes, not merely symptoms or side-effects.
  • Where input normalization is necessary—for types, iterables, containers, or input shapes—do so only in a way that preserves API contracts, allows for extensibility, and maintains invariance across all supported data types, including Python built-ins and major library types.
  • All error/warning messages, exceptions, and documentation updates must be technically accurate, actionable, match the conventions of the host codebase, and be kept fully in sync with new or changed behavior.
  • Backwards and forwards compatibility: Changes must account for code used in diverse environments (e.g., different Python versions, framework/ORM versions, or platforms), and leverage feature detection where possible to avoid breaking downstream or legacy code.
  • Refactorings and bugfixes must never silently discard, mask, or change user data, hooks, plugin registrations, or extension points; if a migration or transformation is required, ensure it is invertible where possible and preserve optional hooks or entry points.

Results

We used Phoenix Experiments to track our Cline runs at each level of optimization. Phoenix Experiments are a great way to test out your LLM applications over datasets, run evals on those experiments, and track your experiments in one central location over time.

Claude Sonnet 4-5

Sonnet 4-5 saw a 6% boost in training accuracy and a 0.67% gain in test accuracy using prompt learning.

[Figure: Claude Sonnet 4-5 coding accuracy results]

GPT 4.1

[Figure: GPT-4.1 coding accuracy boost from prompt learning]

Final Takeaways

Sonnet 4-5 is already near the ceiling of coding-agent performance on SWE-bench — a state-of-the-art model that leaves little headroom for further gains. GPT-4.1, on the other hand, had more room to improve. Through ruleset optimization alone, Prompt Learning closed much of that gap, bringing GPT-4.1’s performance near Sonnet-level accuracy without any model retraining, architectural changes, or additional tools.


Gunshot-Detection Technology Is Neither a Racist Ploy nor a Panacea - Robert VerBruggen

  • Overview of Report: Robert VerBruggen of the Manhattan Institute analyzes gunshot-detection technology (GDT) like ShotSpotter in a new report, focusing on its use in about 170 U.S. cities including Chicago and Kansas City, to evaluate evidence on effectiveness and costs amid ongoing controversies and discontinuations.
  • Technology Function: ShotSpotter deploys acoustic sensors to detect loud noises identified as gunfire, providing audio and location alerts to police via mobile app for coordinated response.
  • Response Benefits: Alerts enable faster police arrival than 911 calls, aiding in locating victims, witnesses, suspects, and collecting evidence like shell casings even without reports.
  • Investigative Outcomes: Studies by Eric Piza show quicker responses and more evidence collection at ShotSpotter scenes but no significant improvements in case clearances or crime reductions in Chicago and Kansas City.
  • Implementation Factors: Dennis Mares' research indicates varied results, with weaker effects in cities like St. Louis due to slow responses and limited investigations, better in Winston-Salem with intense follow-up.
  • Resource Criticisms: Alerts often yield no physical evidence in over half of cases, possibly due to false positives, revolvers, or missed casings, straining officer time without guaranteed findings.
  • Racial Deployment Concerns: Sensors placed in high-gun-violence areas overlapping with minority neighborhoods show no added racial disparities in enforcement, per Piza's Chicago analysis and city maps.
  • Cost and Recommendations: Annual costs range from $65,000 to $90,000 per square mile, under 1% of police budgets, suitable for well-staffed departments integrating with crime centers, but not for those lacking capacity.

Facial recognition. Drones. Police have adopted a range of new technologies in recent years to help prevent and respond to crime.

Yet some of the most intense controversies still swirl around a product that’s been around for decades: gunshot-detection technology (GDT), most prominently ShotSpotter, which now operates in roughly 170 cities.


ShotSpotter uses an array of acoustic sensors to listen for loud noises, identify those likely to be gunfire, and alert law enforcement. Despite the system’s straightforward premise, several cities—most notably Chicago—have discontinued its use, citing concerns about effectiveness and racial disparities.

In a new report for the Manhattan Institute, I sift through the evidence regarding GDT’s use, effectiveness, and cost. It turns out that these systems are neither a racist ploy to surveil minority communities nor a panacea for gun violence. Rather, they improve gunshot investigations at the margins, while costing cities some money and officer time.

Cities should weigh these tradeoffs carefully and ask whether they have the capacity to respond to and investigate GDT alerts. But they shouldn’t dismiss the technology outright.

When ShotSpotter detects gunfire, it transmits the audio and estimated location to police. Officers—who can view the alert directly through a mobile app—and dispatchers then coordinate a response.

The premise is simple: with rapid notification and precise coordinates, police can reach the scene faster. This can help them locate witnesses or victims in need of aid, apprehend fleeing suspects, and collect evidence such as shell casings—even when no one calls 911 to report the incident.

So far, so good. “Cops tend to get out there quicker on ShotSpotter calls,” Eric Piza, a Northeastern University criminologist who has studied the technology’s implementation in Chicago and Kansas City, told me. “ShotSpotter calls come in sooner than 911 calls about gunfire. Police collect more evidence when they’re on ShotSpotter scenes.”

The big question is whether departments can translate these investigative benefits into case clearances and crime reductions. In both Chicago and Kansas City, Piza has found no measurable improvement in these longer-term goals. Other studies are mixed.

How the technology is used matters. In departments that are short-staffed or lack the infrastructure to process evidence quickly, it’s not surprising that the effects would be small and debatable. Chicago is experiencing a police staffing crisis, for instance. Data for 2024 published by the city’s police department showed quicker response time to ShotSpotter calls than to 911 shooting calls—but in both cases, responses took more than ten minutes on average.

Dennis Mares, a criminal-justice professor and director of the Center for Crime Science and Violence Prevention at Southern Illinois University Edwardsville, is another prominent GDT researcher. He’s found results both disappointing (in St. Louis) and more encouraging (in Winston-Salem). “I think a lot of the weaker results are seen in cities in which the response to ShotSpotter alert wasn’t as intense as it should be—where the response speed was low, where there’s not a lot of investigation or people picking up shell casings,” he said.


Jim Burch, president of the National Policing Institute, noted that departments vary widely in how they respond to alerts. “Obviously, everybody’s going to have officers respond when there’s gunfire detected,” Burch said. “But it branches out from there. It’s a question of whether officers are required to stop and get out and canvas the scene looking for evidence, victims, etc.—or whether they can simply go to an area and observe, and if they don’t see people, if they don’t see evidence, there’s no requirement for a canvassing of the scene. In some situations, we’ve learned that there are agencies where officers are simply too busy.”

At its best, ShotSpotter can be a part of a broader data-driven crime-fighting infrastructure. In recent years, many cities have implemented Real-Time Crime Centers, which provide intelligence to police based on cameras and other tech. Some have also launched Crime Gun Intelligence Centers (CGICs), which rapidly process ballistic evidence, such as shell casings, and can link rounds fired from the same gun across multiple crimes.

Research on Detroit’s CGIC from Alaina De Biasi, a Wayne State University criminologist, suggests that promptly analyzing shell casings can boost clearance rates. “Where gunshot detection comes into play here is that it increases the pool of evidence that we’re putting in,” De Biasi told me. She also stressed, however, that departments can’t see this benefit if they don’t have the staff to respond to new calls and the capacity to process ballistic evidence quickly.

That brings us to one of the biggest criticisms of ShotSpotter: it generates a large number of alerts for cops, many of which don’t lead to physical evidence. In a great many cases—typically more than half, across several cities with available data—officers fail to find physical evidence of gunfire when they arrive, especially if the incident involved few shots and didn’t generate a 911 call.

It’s hard to say how many such incidents are false positives—situations where something besides a gunshot triggered the system. In some cases, a perpetrator may have used a revolver or picked up his shell casings, or police may simply have failed to find the casings. Either way, departments should know that they won’t discover evidence at every ShotSpotter-alerted site.

The next major criticism of ShotSpotter involves race: the system is disproportionately deployed in minority neighborhoods. But no good evidence indicates that this reflects bias, as opposed to departments choosing their coverage areas based on concentrations of gun violence—which tend to overlap with minority, especially black, neighborhoods.

Piza has found that “Chicago’s GDT system did not create additional racial disparities in arrests and stops beyond those already present in standard police responses to gunfire.” And in the report, I provide maps of New York, Chicago, and Miami depicting the similarities across gun-violence concentrations, sensor placement, and black population share.

Of course, departments should have buy-in from the communities where they deploy GDT and respond to alerts in an appropriate manner. Ralph Clark—CEO of SoundThinking, the company behind ShotSpotter—told me departments should take a “guardian” rather than “warrior” approach. “We love to see a best practice implemented that after you get to a ShotSpotter alert, you might knock on a couple of doors and let people know that you’re there because of a ShotSpotter alert, because you’re prioritizing the care and support of that particular neighborhood or community,” he observed.

Finally, there is the price tag. ShotSpotter costs about $65,000 to $90,000 per square mile annually and consumes officer time and resources that could be spent on other tasks. These are serious tradeoffs, though bearable in the context of a big-city police department.

ShotSpotter contracts often cost less than 1 percent of a city’s police budget, or $1 to $3 per resident. Generally, a small percentage of officers’ overall time is spent responding to GDT alerts. These costs will be felt more acutely in departments facing budget or staffing problems that can’t handle even their existing workload.

When assessing GDT, Americans should be neither bowled over by fancy tech nor cowed by activist pressure. GDT can be a worthwhile investment for departments capable of using it well, and researchers should continue to study which aspects of implementation are most important. Departments that lack the staff to respond to calls or the infrastructure to process evidence, however, might want to address those problems first.

Robert VerBruggen is a fellow at the Manhattan Institute.

Top Photo: Jane Tyska/Digital First Media/East Bay Times via Getty Images

Inside New York’s Radical Protests on October 7 // I went behind enemy lines with “Behind Enemy Lines.”

  • Protest Coverage: On October 7, 2025, in New York City's South Bronx and Manhattan, investigative reporter Ryan Thorpe embedded with the anti-imperialist group Behind Enemy Lines (BEL) to expose its activities commemorating the Hamas-led attack on Israel.
  • Activist Profile: BEL organizer Adrian, a New York-born activist with Cuban roots who lived in Cuba and founded an anti-capitalist group in Chicago, rallied passersby in the South Bronx against U.S. imperialism and Israel's Gaza operations.
  • Group Actions: BEL members distributed flyers, posted stickers at the rundown Mitchel Houses public housing project, and linked a recent boiler explosion there to alleged Gaza destruction, while avoiding provocative slogans to evade FBI scrutiny.
  • Background Ties: BEL co-organized a violent protest at the 2024 Democratic National Convention with the U.S.-restricted Samidoun network, resulting in 56 arrests and injured police, and promotes Maoist literature like the Organization of Communist Revolutionaries' journal.
  • Bronx Event: A small BEL "speak-out" near Mitchel Houses featured accusations of Israeli genocide, red paint splattered on politicians' photos, and low public engagement, with one onlooker preferring to discuss UFOs.
  • Consulate Protest: About 25 masked BEL protesters clashed with NYPD outside the Israeli Consulate, chanted pro-Palestinian slogans, burned U.S. and Israeli flags, and saw two arrests after a brief 40-minute demonstration.
  • Larger March: Around 200 unpermitted protesters, waving Hamas and Hezbollah flags, blocked Manhattan traffic, chanted "Globalize the Intifada," harassed a Jewish woman holding an Israeli flag, and distributed a mock New York Times endorsing PFLP figures.
  • Urban Markings: The unsanctioned march ended near Trump Tower with street prayers and left graffiti declaring "DEATH TO THE IDF, DEATH TO AMERIKKKA" on public structures, highlighting escalating radical rhetoric.

Meet Adrian. With round wire-rim glasses, shaggy brown hair, and a beard, he looks like a throwback to the student radicals of the 1960s. His politics are a throwback, too. Standing on a sidewalk in the South Bronx, around the corner from the Mitchel Houses project on Alexander Avenue near East 135th Street, he speaks of war crimes and genocide, of the sins of capitalism and imperialism. He calls on those in the “belly of the beast” to become seditious and “side with the people of the world” against the United States of America.

It’s a hot Saturday afternoon, and Adrian wears a t-shirt featuring a photo of Joe Biden above the words “War Criminal.” In one hand he holds a microphone, in the other a stack of flyers for a protest at the Israeli Consulate in New York on October 7. He calls that date the second anniversary of the “Al-Aqsa flood” (Hamas’s term), when the “Palestinian resistance . . . broke through the apartheid wall and took the fight to their enemies.”

This afternoon, no one is listening to Adrian, but he remains undeterred, railing against Israel’s “genocide” in Gaza and denouncing the “complicity” of the U.S. government. Occasionally, as people walk down the sidewalk, he forces a flyer into their hands.

Adrian is an organizer with Behind Enemy Lines (BEL), a small but growing “anti-imperialist” group with chapters in Chicago and New York and members scattered across the country. This is how he has spent nearly every Saturday for the past ten months. BEL’s highest-profile action to date was a protest at the 2024 Democratic National Convention in Chicago, at which 56 people were arrested—including Adrian—and two police officers injured. BEL co-organized the protest with Samidoun, a “sham charity” that funnels funds to the Popular Front for the Liberation of Palestine (PFLP). Samidoun is banned in Canada and is under U.S. government restriction.

I first contacted BEL in mid-September after seeing its call for nationwide protests on the anniversary of October 7. I told them I was a university student in New York who was interested in joining but took no additional steps to conceal my identity. Days after reaching out, I spoke on the phone with a BEL organizer in Chicago named Michael, who told me the group was hoping to use October 7 to “spark s**t again.” A week later, I was on the phone with Adrian, who said: “For us, the bigger question is always: What are we doing to get people in this country to defect from their loyalty to this thing?”

On October 4, I met Adrian and his comrades in the South Bronx. That afternoon, we spent several hours at the Mitchel Houses project, a series of brick high-rise apartment buildings that opened in 1964, and where residents have long complained about “garbage strewn hallways” and infestations of rats, mice and roaches. There, we put up stickers and posters, handed out flyers, and tried to speak with residents.

One of the posters BEL had planned to distribute read “Up Against the Wall Motherfucker!” Before heading out, however, Adrian instructed his comrades to throw it in the trash. When I asked why, he mentioned the death of “a certain right-wing podcaster.”

“It could be construed as a threat,” another member said, adding that the group wanted to avoid a visit from the FBI.

A few days earlier, a boiler explosion at the Mitchel Houses caused a partial building collapse, leaving a 20-story gash in the side of an apartment tower. The BEL activists seized on the explosion to push their message: This is what’s been happening in Gaza every day for two years, and the people responsible for what’s happening there are the same people responsible for what’s happening here.

“The next time Ritchie Torres turns up here,” Adrian said of the Zionist Bronx congressman, “I hope the people run him out.”

Adrian told me that he was born in New York City to a Cuban mother and a father who worked for the United Nations. For a few years, his family had lived in Cuba. They eventually returned to the U.S., where he attended college in Chicago and helped found an anti-capitalist group (I’ve forgotten its name) that now has eight chapters across the country. More recently, after returning to New York, he began organizing with BEL.

Before leaving, I purchased a copy of Going Against the Tide, the Organization of Communist Revolutionaries’ theoretical journal, which BEL members had been hawking on the sidewalk. I was their lone sale of the day. The slim, 163-page volume retailed for $20, though Adrian had offered to sell me three issues for the price of two.

OCR is a self-styled Maoist vanguard party that bars its members from acknowledging their membership to outsiders without leadership’s approval. On the subway ride back to my hotel, I read two of the articles. One claimed ICE is composed of “fascist thugs”; the other argued that capitalism is perpetrating a “slow genocide against trans people.”

“You don’t have to be a communist to join Behind Enemy Lines,” Adrian told me at the end of the afternoon. “But I am one.”

On October 7, I returned to the South Bronx to attend the “speak-out” that BEL organized near the Mitchel Houses project. Aside from the BEL members themselves, turnout was small. As a speaker accused Israel of turning Gaza into a “dystopian, post-apocalyptic wasteland,” a BEL member approached a man who had stopped to watch.

“Would you like to take the mic and say something about the genocide in Gaza?” the member asked.

“If I take the mic, the only thing I wanna talk about is UFOs,” the man said.

To cap off the speak-out, BEL members splattered red paint, symbolizing blood, on photographs of various U.S. politicians, both Democratic and Republican, and on one of Israeli prime minister Benjamin Netanyahu.

Around 4 p.m., roughly 25 protesters affiliated with BEL, many with keffiyehs or surgical masks covering their faces, arrived at the Israeli Consulate in Manhattan. Instead of gathering at the front of the building as planned, they were forced to protest at the corner of the intersection after the NYPD blocked off the sidewalks. A scuffle soon broke out between demonstrators and police, with two BEL members led away in handcuffs.

For 30 minutes, fiery speeches and pro-Palestinian chants emanated from the crowd. Hundreds of New Yorkers moved down the sidewalks, gawking at the curiosity.

“Free Palestine! Free Palestine!” mimicked one passerby, mocking the chant in a high-pitched voice. “F**k off,” he added in a thick New York accent.

BEL members closed the protest by burning American and Israeli flags, symbolizing their “pledge of allegiance to the people of the world.” Then, it was over: After months of preparation and a public pledge to “[hit] the Israeli Consulate with all we’ve got,” after vowing to “escalate for Gaza” and do “whatever it takes to bring this genocide to an end,” the whole affair lasted roughly 40 minutes.

A few blocks away, about 200 pro-Palestinian protesters converged at the News Corp building in Manhattan. Many appeared to be college students; others were young men with their faces covered, either by keffiyehs or balaclavas. Some were openly hostile to outsiders, seemingly itching for a fight. Directly in front of the News Corp building, from the middle of the crowd of protesters, a Hamas flag waved in the wind.

As the sun began fading from the sky, the protesters pushed past the NYPD barricades and into the streets. They blocked traffic as they marched through Manhattan, holding banners calling for revolution and placards glorifying “martyrs.” They waved flags for Palestine, Hamas, and Hezbollah. They chanted “Globalize the Intifada!” They did not have a permit, but their unsanctioned march received an escort from the NYPD, as frustrated motorists waited out the passing crowd.

Not far from Radio City Music Hall, a middle-aged Jewish woman walked up a cross street and silently held an Israeli flag. Within seconds, she was swarmed by masked men and keffiyeh-clad women.

“You’re a baby killer! A baby killer!” one woman screamed into her face. “You starved our kids! You’re going to feel it! You’re going to feel it! That disgusting nation!” When the police intervened, they seemed to tell the Jewish woman, rather than those harassing her, to leave.

The marchers made their way to Trump Tower, where an Islamic prayer was held in the street. I stopped at a bus shelter to sit down. As the protesters walked by, a woman handed me a newspaper, which appeared to be a copy of The New York Times.

As I unfolded the paper, I realized the banner read The New York War Crimes. On the front page was a quote from Habib Qahwaji, a former executive committee member of the Palestine Liberation Organization. On page 11 was an address from Georges Abdallah, a former member of the PFLP, who served four decades in prison for the assassinations of a U.S. military attaché and an Israeli diplomat.

I folded the paper back up as chants of “Globalize the Intifada” echoed through the streets of Midtown Manhattan. That’s when I noticed fresh graffiti on the wall of the bus shelter. The graffiti read: “DEATH TO THE IDF, DEATH TO AMERIKKKA.”

Ryan Thorpe is an investigative reporter at the Manhattan Institute. Adam Lehodey and Stu Smith contributed reporting to this article.

Photos: Courtesy of the author

The Frothiest AI Bubble Is in Energy Stocks - WSJ

  • Article overview: Wall Street Journal piece by Jinjoo Lee examines the soaring valuations of zero-revenue energy companies driven by AI power demand, highlighting their risks compared with profitable tech firms.
  • Oklo's valuation surge: Sam Altman-backed nuclear startup Oklo's shares rose eightfold year-to-date, reaching a $26 billion market cap as the largest U.S. public company with no past 12-month revenue.
  • Oklo's technology and status: Develops small modular reactors using liquid metal sodium coolant and limited-supply enriched uranium; lacks U.S. Nuclear Regulatory Commission license and binding power purchase contracts, with revenue not expected until 2028.
  • Fermi's public debut: Zero-revenue energy firm Fermi valued at $19 billion upon listing this month, backed by former energy secretary Rick Perry and led by Toby Neugebauer of failed GloriFi; plans 11 gigawatts for data centers but secured only 5% capacity so far, no customer contracts.
  • Micro-modular reactor firms: Nano Nuclear Energy shares doubled this year to over $2 billion valuation; Terra Innovatum reached over $1 billion via SPAC merger last week, both lacking revenue.
  • Revenue-generating but unprofitable peers: NuScale Power shares up 155%, with engineering fees from a Romania project; Plug Power shares up 90% to a $4.8 billion valuation on AI hype; neither expected to turn a profit until 2030.
  • Investor motivations: Speculative energy plays attract funds partly because profitable firms already command lofty multiples amid the AI excitement, with Bloom Energy at 133 times forward earnings and Centrus Energy at 99 times.
  • Historical risks: Parallels to 2020 zero-revenue EV startups like Nikola and Fisker that faltered; warns energy firms could fall sharply if AI demand wanes, lacking revenue cushions.


Sam Altman, CEO of OpenAI, has backed zero-revenue energy company Oklo. Kyle Grillot/Bloomberg

Forget about the froth in tech valuations. The real excess might be building up in energy stocks. 

For all the fears about stretched technology shares, many of those companies are hugely profitable and will keep chugging along even if the artificial-intelligence boom doesn’t have legs. Not so in the energy sector. A group of non-revenue-generating energy companies has collectively ballooned in value to more than $45 billion in hopes that tech companies will one day pay for their yet-to-be-built power.

The biggest of these is the Sam Altman-backed nuclear startup Oklo, whose shares have risen about eightfold year to date. The company now has a market cap of roughly $26 billion, making it the biggest U.S.-incorporated public company that generated no revenue in the past 12 months, according to data from S&P Global Market Intelligence.

Oklo is developing small modular nuclear reactors that use a non-water coolant—liquid metal sodium—and an enriched type of uranium fuel that is in limited supply. It doesn’t yet have a license from the U.S. Nuclear Regulatory Commission or binding contracts with power purchasers. Wall Street analysts don’t expect the company to generate substantial revenue until 2028. 

Another zero-revenue company is Fermi, which was valued at roughly $19 billion upon its public debut earlier this month. Only two other no-revenue companies had larger market caps than Fermi on their first day of trading after an IPO, adjusted for inflation, according to Jay Ritter, finance professor at the University of Florida. These are EV-maker Rivian, which went public in 2021, and Corvis, an optical network equipment maker that went public during the dot-com bubble.

The company is backed by former energy secretary Rick Perry and helmed by Toby Neugebauer, the former chief executive of the failed anti-woke bank startup GloriFi. It has plans to build out 11 gigawatts’ worth of power for data centers, roughly the amount of capacity in New Mexico. Though its shares haven’t sustained their initial pop after listing, the company still commands a market capitalization of over $17 billion. That isn’t too far from the valuation of Talen Energy, a company that already owns an operating power fleet of about 11GW.

Fermi plans to meet that 11GW target using natural gas, nuclear, solar and battery power. It has a way to go: So far, it has secured natural-gas equipment that would cover just 5% of its total capacity goal. The company hasn’t lined up any binding customer contracts.

Companies developing even smaller “micro-modular” nuclear reactors are also commanding hefty market caps despite their lack of revenue. Shares of Nano Nuclear Energy, which made its debut on the public markets last year, have more than doubled so far this year. The company is valued at more than $2 billion. Terra Innovatum, which went public last week through a SPAC merger, is valued at over $1 billion.

Others swept up in the AI excitement generate revenue but aren’t expected to turn a profit for many years. One is small-modular-reactor company NuScale Power, which earns some engineering and licensing fees for an SMR project in Romania; its shares have surged 155% so far this year. Shares of hydrogen fuel-cell company Plug Power, which had been in the gutter for many years, surged 90% this year on AI excitement, lifting the company’s market value to $4.8 billion. Neither company is expected to turn a profit until 2030, according to Wall Street analysts polled by FactSet.

One reason investors are piling into more speculative energy companies may be that profit-generating ones already command lofty multiples. Fuel-cell company Bloom Energy’s shares have rallied more than 400% year to date and are now valued at 133 times forward earnings. The company added about $5.4 billion in market cap on Monday after Brookfield Asset Management said it would invest up to $5 billion to deploy Bloom’s technology. Nuclear-fuel company Centrus Energy is valued at 99 times forward earnings.

Arguably, more commercial interest might be just what is needed to help expensive or unproven technologies take off. But based on the track record of the zero- or minimal-revenue EV startups that went public in 2020 (remember Nikola, Fisker, and Lordstown?), it is likely that many such companies will fizzle rather than pop.

If the AI bubble ever deflates, these energy companies with no revenue have the farthest to fall and little in the way of a cushion.

Write to Jinjoo Lee at jinjoo.lee@wsj.com
