
Safe Is What We Call Things Later - by Scott Werner

  • Who/What/When/Where/Why: Observer at the edge of the Atlantic last weekend watched a seven-year-old angry at tide pools and used the scene to illustrate how software engineering repeatedly swings between mutable experimentation and formal reliability.
  • Two programmer types: A split between formalist (teach inheritance first, programs-as-proofs, compile-time safety) and informalist (teach polymorphism first, programs-as-conversations, runtime mutability).
  • Smalltalk / Alan Kay: Smalltalk at Xerox PARC treated code as a conversational, living system intended to train programmers to think in systems and runtime change.
  • Dijkstra / Formalism: Dijkstra promoted programming as applied mathematics requiring proofs, structured control flow, and predictability for safety-critical domains.
  • Translation to C++/Java: C++ and Java translated Smalltalk’s shapes into static, compiled constructs—methods, classes, heavy typing—trading dynamism for team-scale reliability.
  • Web rebellion: JavaScript and Ruby/Rails revived informal, rapid prototyping for the web, enabling creativity and speed but producing fragile, chaotic systems.
  • Tidying / Return swing: TypeScript, Go, Rust and similar efforts reintroduced constraints, static checks, and safety to tame the chaos of informal web-era systems.
  • AI & the pendulum: AI enables extreme informality (self-modifying agents, model-driven production code) with projects exploring those boundaries now; expectation that new formal tools, type systems, verification, and guardrails will follow, and both exploration and formalization are needed—illustrated by the child who returned at low tide excited to find new hermit crabs.

I was standing at the edge of the ocean last weekend watching the tide pools do their thing. This kid next to me, couldn't have been more than seven, was absolutely furious at the ocean.

"It keeps changing!" she yelled at the Atlantic, as if it might apologize. "Every time I figure out where everything lives, the water comes back and moves it all around!"

Her dad tried to explain about tides, about the moon, about gravitational pull. But she wasn't having it. She wanted the tide pool to pick a state and stick with it. Either be underwater or be exposed. Not this constant back and forth, back and forth.

"But then," her dad said, "nothing new would ever wash in."

"But then," she countered, "I could finally finish counting the hermit crabs."

And I stood there, salt air making my laptop bag feel slightly damp, thinking about how this kid had just described what it is like to be a software engineer. Except we don't have the moon to blame. We did this to ourselves.


The kid was fighting the tides, but we built ourselves a pendulum. Same back-and-forth, except we're both the clock and the clockmaker.

A Brief Philosophical Detour on the Two Kinds of Programmers


Avdi Grimm gave a talk called The Soul of Software about a decade ago and one particular thing in it has stuck with me: you can tell what type of programmer you were taught by based on which part of object-oriented programming they taught first.

Did your teacher start with inheritance? Class hierarchies, abstract base classes, the whole "a Dog is-a Mammal is-an Animal" taxonomy? Then you were taught by what we call a formalist. Someone from the Dijkstra school of thought, where programs are mathematical proofs that happen to execute. They showed you the blueprints before they showed you the building.

Or did they start with polymorphism? "Look, different things can respond to the same message in their own way!" Objects having conversations, duck typing, the magic of not caring what something is as long as it knows what you're asking? You had an informalist teacher. Someone from the Alan Kay school, where programs are living systems of communicating entities. They let you play with the clay before teaching you about kilns.
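
(If it helps to see the two first lessons side by side, here's a rough sketch in Python; the classes are mine, purely illustrative, not from Avdi's talk.)

# Lesson one, formalist style: the taxonomy comes before anything runs.
class Animal:
    def speak(self) -> str:
        raise NotImplementedError

class Mammal(Animal):
    pass

class Dog(Mammal):
    def speak(self) -> str:
        return "Woof"

# Lesson one, informalist style: nobody asks what you are,
# only whether you respond to the message.
class Duck:
    def speak(self) -> str:
        return "Quack"

class Robot:
    def speak(self) -> str:
        return "BEEP"

def converse(things):
    for thing in things:          # Dog, Duck, Robot -- it doesn't matter
        print(thing.speak())      # same message, different answers

converse([Dog(), Duck(), Robot()])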

This isn’t just a teaching preference; it's two completely different universes of what programming is.

The industry has been switching between these universes, back and forth, like a pendulum, since the beginning of computing. And every time we switch, we act like we've discovered something new.

The formalists see programming as applied mathematics. Proofs you can execute. They sleep better knowing their types check at compile time.

The informalists (or the hermeneutic crowd, if we're being fancy) see programming as writing. As conversation. They sleep better knowing they can change anything at runtime if they need to.

(I was taught inheritance first. It took me years to recover.)

The Smalltalk Séance


Once upon a time, there were these folks at Xerox PARC who talked to their computers. Not like we do now, with our typing and our clicking, but really talked to them. They had this thing called Smalltalk, and it was less a programming language and more a conversation with a very patient friend who happened to be made of electricity.


They invented everything, basically. The mouse (someone told me it was originally supposed to be called the turtle. I don’t think that’s right… but it would have fit really nicely with the ocean theme of this post…). Windows you could move around like pieces of paper on a desk. Menus that dropped down like theater curtains.

But the real invention was the philosophy. Alan Kay wasn't trying to build better programs. He was trying to build better programmers. People who could think in systems, in conversations, in living breathing code that could change itself while running.

Meanwhile, in the Netherlands, Edsger Dijkstra was having nightmares about this exact thing.

The Dijkstra Doctrine


Dijkstra looked at programming and saw chaos. Not the good kind of chaos, where things emerge and evolve. The bad kind, where nothing works and nobody knows why.

"Programming," he said, probably while wearing a very serious expression, "is one of the most difficult branches of applied mathematics."

Dijkstra wanted proofs. He wanted to know, to prove, that a program would work before it ran. He wanted structured programming, where goto statements were considered harmful and every function had one entrance and one exit, like a very orderly party.

And you know what? He wasn't wrong.

When your code controls nuclear reactors, or airplanes, or insulin pumps, you don't want it to be having an exploratory conversation with itself. You want it to be a proof. A proof that happens to execute, but a proof nonetheless.

The Great Translation


Now here's where it gets interesting. C++ and Java didn't just borrow from Smalltalk. They tried to translate Alan Kay's informal, living system into Dijkstra's formal, provable world.

Smalltalk said: "Objects send messages to each other."
C++ heard: "Objects have methods you can call."

Smalltalk said: "Everything happens at runtime."
Java heard: "Some things can happen at runtime, but let's check everything we can at compile time."

Smalltalk said: "The system is alive and you can change it while it runs."
C++ and Java heard: "...what? No. Absolutely not. Are you insane?"

They took the shapes of Smalltalk's ideas but filled them with concrete. Objects became structs with function pointers. Messages became method calls. The living system became a compiled binary.
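
(A small Python sketch of the difference, since Python sits close enough to the Smalltalk end of the spectrum to show both halves; purely illustrative.)

# Smalltalk's view: a message, sent by name, resolved while the program runs.
class Account:
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount

acct = Account(100.0)
message, args = "deposit", (25.0,)
getattr(acct, message)(*args)      # nothing is checked until this very moment

# The system is alive: teach the class a new message while it's running.
def withdraw(self, amount):
    self.balance -= amount

Account.withdraw = withdraw        # fine here; unthinkable in compiled C++/Java
acct.withdraw(30.0)
print(acct.balance)                # 95.0

# The C++/Java translation keeps the shape but pours in the concrete:
# a fixed method, resolved at compile time, no new messages once the binary exists.
acct.deposit(10.0)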


It worked. It was reliable. You could build big systems with teams of people and the compiler would catch your mistakes. But something was lost. The conversation became a monologue. The living system became a corpse that somehow still moved.

The Web's Rebellion


Fast forward. It's the late 90s. Java is trying to eat the web. "Applets!" it shouts. "Enterprise beans!" it insists. Everything must be an object, everything must be typed, everything must be correct.

But then JavaScript happened.

And by "happened" I mean "was created in 10 days by someone who understood both Scheme and Self but had to make it look like Java for marketing reasons."


And from this beautiful combination came everything. Every web app you use. Every framework you love or hate. All built on a language that Dijkstra would have considered a war crime.

Then Ruby joined the party, taking Smalltalk's philosophy and saying "what if we made it even MORE flexible?"


Ruby on Rails said "what if making a web app was actually fun?" and suddenly everyone could build Twitter (the original one that was always falling over, but in a charming way, with an adorable whale).

The Alan Kay disciples were winning. Systems were conversations again. Code was alive, mutable, dangerous, and fun.

The Inevitable Tidying


But then (you knew there was a "but then"), the pendulum started its return journey.

The Rails apps that changed the world started creaking under their own weight. The JavaScript that let you prototype anything in an afternoon also let you create bugs that make you say “Wat?”.

Enter TypeScript, stage left, wearing business casual and a badge that says "I'm JavaScript but I was made by Microsoft."


Enter Go, designed by people who looked at all the chaos and said "what if we just... didn't allow most of that?"


Enter Rust, which holds your hand so tightly while you program that you can't possibly hurt yourself (or anyone else).


(I'm only slightly exaggerating.)

Here We Go Again


But here we are again with AI, friends. We’re in the wildest informal moment yet.

People are doing things that would have gotten you laughed out of a code review five years ago. Vibe coding, where you don’t even worry about the code or the programming language and just let Claude figure out the details.

People are putting Claude Code directly on production servers. Not with guardrails or formal specifications. "Fix this bug," they say, and walk away. The code that results might work, might not. Who knows?

Last week, in The System Inside The System, I shared that I’ve kicked off some projects (vsm and airb) that lean all the way into this. Think about it, self-modifying Ruby agents that can rewrite their own capabilities while running. Systems that contain systems that contain systems. Code that writes code that writes code. It's either the future or a cautionary tale, and we won't know which until someone's production system becomes sentient.

We're back in Alan Kay territory, but turned up to eleven. These systems aren’t just alive and mutable, they’re writing themselves. Having conversations with themselves. Sometimes arguing with themselves in PR comments.

This is the informal approach at its most extreme. No proofs, just vibes. No types, just hope. No formal specifications, just "hey Claude, you know what would be cool?"

Someone asked me the other day, "Is it safe to let an AI modify its own code?"

And I said, "Define safe."

And they said, "You know, safe."

And I said, "No, I really don't."

Because safe is what we call things after we've formalized them. Before that, they're just experiments that haven't failed yet.

In five years, maybe six, we'll start building formal systems around the AI use cases the informalists discovered. We'll develop new languages that have guardrails specifically designed for AI chaos. We'll create type systems that can type-check vibes. We'll invent testing frameworks for code that writes itself. Proof systems for agent behavior. Formal verification for self-modifying code. Strict sandboxes with fine-grained authority. The Dijkstra disciples will arrive, and they'll make it safe.

The pendulum will swing back.

And then, inevitably, it will swing forward again.

Because that's what pendulums do.

The Pattern That Keeps Repeating


Look at any platform shift and you'll see it:

Desktop computing: Smalltalk wizards doing impossible things → C++/Java bureaucrats making it reliable (but losing the magic)

Web 1.0: Perl scripts held together with CGI and prayer → Java EE trying to enterprisify everything

Web 2.0: Ruby/JavaScript cowboys building and shipping in the same breath → TypeScript/Go/Rust bringing adult supervision

And now, AI: "What if we let the machine write itself?" → [PENDING: Whatever we'll invent in 3-5 years to make this safe]

The informalists, they explore the possible. They say "what if?" and "why not?" and occasionally "oops." They build things that shouldn't work but do.

The formalists, they make the possible reliable. They say "prove it" and "define it" and "what about edge cases?"

We need both. Not at the same time (that would be chaos (the bad kind)). But in sequence, like breathing. In like exploration, out like formalization. In like play, out like proof.

The Real Secret


You want to know the real secret? The thing that nobody admits in blog posts (except, I guess, this one)?

We need both types of people. We need the ones who see a cliff and think "I wonder what's at the bottom?" And we need the ones who see the same cliff and think "we should probably build a bridge."

The informalists and the formalists, they're not having different conversations. They're having the same conversation at different times.

The informalists are asking: "What's possible?"
The formalists are asking: "What's sustainable?"

Both questions matter. Neither is more important than the other.

(Okay, sometimes one is more important than the other, but only temporarily, and sometimes it depends on whether your ecommerce startup keeps crashing on Black Friday.)


And somewhere, in that eternal swing, we occasionally build something that actually matters. Something that changes how people think or work or live. Something that takes us one step closer to the computer revolution that hadn’t happened yet in 1997 but still hasn’t happened in 2025.


The girl never did finish counting her hermit crabs. The tide came in while she was still yelling at it. But I saw her there the next day, at low tide, starting her count all over again. This time she wasn’t angry. She was excited.

“They’re all in different places!” she told me. “New ones washed in!”

And that, my friends, is exactly my point.



Yes, I know you might think my self-modifying Ruby agent framework is part of the problem. But it also might be the solution. Depends which side of the pendulum we're on when you read this.


Google did more than just switch to TSMC for Tensor G5

  • Who/What/When/Where/Why: Google’s Pixel 10 series ships with the Tensor G5 SoC to deliver major on-device AI, performance, and camera improvements at the Pixel 10 launch.
  • Process and CPU gains: Tensor G5 is manufactured by TSMC on a 3nm node and delivers a CPU that is 34% faster on average with improved responsiveness for browsing, apps, OS rendering, and AI workloads.
  • Core layout and system I/O: Chip configuration is one large core, five mid-performance cores, and two efficiency cores, plus improved thermal controls, LPDDR5X memory, and UFS 4.0 storage.
  • TPU and Gemini Nano: TPU is up to 60% more powerful and runs DeepMind’s Gemini Nano, yielding about 2.6x speed and 2x efficiency improvements over Tensor G4 and supporting a 32K token context window (vs 12K previously).
  • Model architecture enhancements: G5 uses Matformer and Per Layer Embedding to run models more efficiently on-device and improve response quality under mobile RAM constraints, developed in collaboration with DeepMind.
  • On-device AI feature set: Tensor G5 powers over 20 on-device features at launch, including Magic Cue, Call Notes with actions, the new Journal app, Scam Detection, and Gboard Smart Edit.
  • Voice Translate and voice preservation: Voice Translate combines a generative model with a classical audio ML model for real-time translation; one-shot voice preservation reconstructs speaker characteristics from seconds of audio with no voice enrollment or stored audio.
  • Camera and ISP advances: Upgraded ISP improves low-light video, performs granular scene segmentation, enables default 10-bit recording at 1080p and 4K30, motion deblur, improved Real Tone, and powers camera features like Add Me, Auto Best Take, and a nearly 1‑billion‑parameter 100x Pro Res Zoom diffusion model in-app (with a separate 3‑billion‑parameter multilingual speech model for real-time use).

The Pixel 10 series is powered by the Tensor G5, Google’s biggest upgrade to its custom silicon. 

Google touts deeper customization, starting with how it’s manufactured by TSMC on the latest 3nm process node. This results in a CPU that is 34% faster on average, with “significant gains” in single and multi-threaded performance. Google says to expect improvements in responsiveness when browsing the web, launching apps, OS rendering, AI experiences, and other workloads. 

There’s one big performance core, five mid-performance cores, and two efficiency cores. Google has also upgraded hardware and software thermal controls to let the chip operate at higher frequencies without throttling. Tensor G5 has a high-speed memory interface thanks to LPDDR5X (higher memory bandwidth) and UFS 4.0 (faster flash storage).

The TPU (Tensor Processing Unit) is up to 60% more powerful, while Tensor G5 runs the “newest” Gemini Nano model from DeepMind. This translates to Gemini Nano running “2.6x faster and 2x more efficiently for use cases like Pixel Screenshots and Recorder” compared to Tensor G4. There’s also a 32K token context window on G5 (which can be a month’s worth of emails or a hundred screenshots) compared to 12K last year. 

The Tensor team touts collaboration with DeepMind on improving the quality of on-device models. G5 makes use of the Matformer Model Architecture to more efficiently run models, while Per Layer Embedding improves model response quality given the constraints of RAM on mobile devices. 

Tensor G5 is responsible for over 20 on-device AI features at launch, like Magic Cue, Call Notes with actions, the new Journal app, Scam Detection, and Gboard Smart Edit. In the case of Voice Translate, the translation of what the speaker is saying involves a generative model running in parallel with a classical audio ML model. One-shot voice preservation means Google is able to reconstruct the speaker’s vocal characteristics from just a few seconds of live audio. On the privacy front, there’s no voice enrollment or audio being stored. 

There’s also an upgraded custom Image Signal Processor that particularly helps video performance, especially in low-light conditions. The ISP performs more granular scene segmentation to understand regions and objects. That translates to the Pixel 10 recording 10-bit video by default at 1080p and 4K30. There’s also motion deblur and improved Real Tone.

On the camera front, Tensor G5 powers Add Me, Auto Best Take, and 100x Pro Res Zoom. The latter is Google’s largest Pixel Camera model at nearly 1 billion parameters. It’s also the first diffusion model running in the app, with the TPU playing a big role, though the camera pipeline uses every part of the chip (CPU, DSP, ISP). For comparison, zoom on the Pixel 8 Pro’s G3 was tens of thousands of parameters, while the Pixel 9 Pro’s G4 was in the low millions. (Another example of G5’s prowess is a 3-billion-parameter multilingual model for real-time speech use cases.)


"RAG is Dead, Context Engineering is King" — with Jeff Huber of Chroma

  • Who/What/When/Where/Why: Latent.Space podcast (Aug 19, 2025) with Jeff Huber, CEO of Chroma, recorded at Chroma’s studio, explaining why “RAG is dead” and promoting context engineering as the practical approach for modern AI retrieval and vector databases.
  • Core thesis: Context engineering—carefully selecting and structuring what goes into an LLM’s context window—is more important than generic RAG pipelines as models and use cases scale.
  • Five retrieval tips: Ship retrieval primitives not “RAG”; win first‑stage hybrid recall (~100–300 candidates); always re‑rank before assembly; prefer tight structured contexts to avoid context rot; create a small gold set and wire it into CI.
  • Ingest/Query/Outer loop: Ingest—domain‑aware chunking, enrich metadata, optional LLM summaries, embeddings; Query—hybrid retrieval, candidate pool, re‑rank, context assembly with dedupe/diversify and token caps; Outer loop—cache/cost guardrails, generative benchmarking, error analysis, memory/compaction.
  • Research outputs: Chroma’s technical reports on Context Rot (performance degrades with naive long contexts) and Generative Benchmarking (auto‑generate query/chunk gold sets for evaluation and fine‑tuning).
  • Chroma Cloud design: Focus on developer experience—pip‑installable single‑node UX, Chroma Distributed for serverless cloud with storage/compute separation, usage‑based billing, fast onboarding and cheap index forking/versioning.
  • Code retrieval & chunking: Regex and lexical search handle most code queries; augment with embeddings where needed; support fast reindex/forking and use chunk rewriting (NL glosses) at ingest to improve semantic retrieval.
  • Future directions & memory: Anticipate continual retrieval and staying in latent (embedding) space, offline compaction/summarization for memory, and high ROI from small, high‑quality labeled gold sets and iterative engineering.

What actually matters in vector databases in 2025, why “modern search for AI” is different, and how to ship systems that don’t rot as context grows.

Aug 19, 2025


In December 2023, we first covered The Four Wars of AI and the RAG/Ops War. After tens of millions of dollars poured into vector databases and ups and downs in the hype cycle, we finally have Jeff Huber from Chroma joining us today for the new hot take: “RAG” is dead…


and as context lengths increase, and more and more AI workloads shift from simple chatbots to IMPACTful agents, new work from thought leaders like Lance Martin and Dex Horthy is making genuine contributions of substance to the previously underrated context box.


https://rlancemartin.github.io/2025/06/23/context_engineering/

Chroma has been driving some of the most interesting research in the new context engineering space, including their Context Rot and Generative Benchmarking reports.

We spent most of our time talking about current state of retrieval, memory, retrieval benchmarking, etc.

The 5 Retrieval Tips


  1. Don’t ship “RAG.” Ship retrieval. Name the primitives (dense, lexical, filters, re‑rank, assembly, eval loop).

  2. Win the first stage with hybrid recall (200–300 candidates is fine—LLMs can read).

  3. Always re‑rank before you assemble context.

  4. Respect context rot: tight, structured contexts beat maximal windows.

  5. Invest one evening in buying some pizza and creating a small gold set; wire it into CI and dashboards.

[Ingest]
├─ Parse + chunk (domain-aware: headings, code blocks, tables)
├─ Enrich: titles, anchors, symbols, metadata
├─ Optional: LLM “chunk summaries” (NL glosses for code/API)
├─ Embeddings (dense) + optionally sparse signals
└─ Write to DB (text, vectors, metadata)

[Query]
├─ First-stage hybrid: vector + lexical/regex + metadata filters
├─ Candidate pool: ~100–300
├─ Re-rank (LLM or cross-encoder) → top ~20–40
└─ Context assembly:
- instructions/system prompt first
- dedupe/merge near-duplicates
- diversify sources
- hard cap on tokens

[Outer loop]
├─ Cache/cost guardrails
├─ Generative benchmarking on small gold sets
├─ Error analysis → re-chunk/retune filters/re-rank prompt
└─ Memory/compaction: summarize interaction traces → retrievable facts
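
A rough Python sketch of the [Query] stage above; the search and scoring helpers are placeholders for whatever retrieval stack you actually run, not any particular library's API.

def hybrid_first_stage(query, dense_search, lexical_search, pool_size=300):
    # First-stage recall: merge dense and lexical candidates, dedupe by id.
    candidates = {c["id"]: c for c in dense_search(query, pool_size)}
    for c in lexical_search(query, pool_size):
        candidates.setdefault(c["id"], c)
    return list(candidates.values())[:pool_size]

def rerank(query, candidates, score, keep=30):
    # Re-rank the pool (LLM or cross-encoder scores) and keep the top slice.
    ranked = sorted(candidates, key=lambda c: score(query, c["text"]), reverse=True)
    return ranked[:keep]

def assemble_context(chunks, token_cap=4000):
    # Context assembly: drop near-duplicates, favor unseen sources early,
    # and respect a hard token budget (rough 4-characters-per-token estimate).
    seen_text, seen_sources, parts, used = set(), set(), [], 0
    for c in chunks:
        key = c["text"].strip().lower()
        if key in seen_text:
            continue                              # drop near-duplicates
        if c.get("source") in seen_sources and len(seen_sources) < 5:
            continue                              # prefer unseen sources until a few appear
        cost = len(c["text"]) // 4
        if used + cost > token_cap:
            break                                 # hard cap on tokens
        seen_text.add(key)
        seen_sources.add(c.get("source"))
        parts.append(c["text"])
        used += cost
    return "\n\n".join(parts)

The outer loop (gold sets in CI, error analysis, memory compaction) wraps around calls like these rather than living inside them.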


Our podcast studio is at the Chroma office, so we were excited to finally have our landlord as a guest! Enjoy!

Show Notes

Timestamps

  • [00:00:00] Introductions

  • [00:00:48] Why Build Chroma

  • [00:02:55] Information Retrieval vs. Search

  • [00:04:29] Staying Focused in a Competitive AI Market

  • [00:08:08] Building Chroma Cloud

  • [00:12:15] Context Engineering and the Problems with RAG

  • [00:16:11] Context Rot

  • [00:21:49] Prioritizing Context Quality

  • [00:27:02] Code Indexing and Retrieval Strategies

  • [00:32:04] Chunk Rewriting and Query Optimization for Code

  • [00:34:07] Transformer Architecture Evolution and Retrieval Systems

  • [00:38:06] Memory as a Benefit of Context Engineering

  • [00:40:13] Structuring AI Memory and Offline Compaction

  • [00:45:46] Lessons from Previous Startups and Building with Purpose

  • [00:47:32] Religion and values in Silicon Valley

  • [00:50:18] Company culture, design, and brand consistency

  • [00:52:36] Hiring at Chroma: Designers, Researchers, and Engineers

Transcript

Alessio [00:00:04]: Hey, everyone. Welcome to the Latent Space podcast in the new studio. This is Alessio, partner and CTO at Decibel, and I'm joined by Swyx, founder of SmolAI.

Swyx [00:00:11]: Hey, hey, hey. It's weird to say welcome because obviously, actually, today's guest, Jeff, has welcomed us to Chroma for many months now. Welcome. Thanks for having me. Good to be here. Jeff, you're a founder, CEO of Chroma. I've sort of observed Chroma for a long, long time, especially back in the old office. And you were, you originally sort of got your start in the open source vector database, right? Like you sort of, you're the open source vector database of choice of a lot of different projects, particularly with even, even projects like the Voyager paper, you guys were used in that. I don't even know like the full list, but how do you introduce Chroma today?

Why Build Chroma


Jeff [00:00:48]: It's a good question. I mean, naturally, you always want to kind of take your messaging and make it fit your audience. Yeah. But I think the reason that Chroma got started. Is because we had worked for many years in applied machine learning and we'd seen how demos, demos were easy to build, but building a production reliable system was incredibly challenging and that the gap between demo and production didn't really feel like engineering. It felt a lot more like alchemy. There's some good, like XKCD memes about this guy standing on top of a giant steaming pile of garbage and the other character asks, this is your data system. And he's like, yes. And he's like, how do you know if it, how do you know if it's good or how do you make it better? Oh, you just like stir the pot and then like, see if it gets any better. That just seemed intrinsically wrong. And this is back in like 2021, 2022, that like we were having these conversations. And so that coupled with like a thesis that like Latent Space was a very important tool. That is a plug. Yes, that is a plug. We need to ring the bell. Yeah, exactly. The gong. That Latent Space, both the podcast, but also the technology was a very underrated tool and a very like important tool for interpretability. It's fundamentally how models see their own. Data, we as humans can kind of, you know, have that shared space to understand what's going on. That's where we got started. And so I think that's also where we continue to want to go. Like, what do we want to do? We want to help developers build production applications with AI and what would make the process of going from demo to production feel more like engineering and less like alchemy. Doing a database is like not a side quest. It is a part of the main quest. What we realized along the way was search was really a key workload to how like AI applications were going to get built. It's not the only workload, but it's like definitely a really important. Workload and that you don't earn the right to do more things until you've done one thing at a world class level that requires maniacal and, you know, kind of, uh, maniacal focus. Um, and so that's really what we've been doing for the last few years. That was a long kind of rambling introduction, but like maybe to sort of land the plane, you know, if you ask people, you know, what does Chrome do today? We build a retrieval engine for AI applications. We're working on modern search infrastructure for AI, um, some version of that.

Information Retrieval vs. Search


Swyx [00:02:55]: I'll do a double click on this. Is information retrieval and. And search the same thing, or are they slowly different in your mind? I just wanted to clarify our terminology. Yeah.

Jeff [00:03:04]: I think that, you know, that modern search infrastructure for AI, we're going to maybe unpack that for a couple of seconds. So modern is in contrast to traditional. And mostly what that means is like modern distributed systems. So there's a bunch of primitives for building great distributed systems that have come on to the scene in the last five, 10 years that obviously are not in technology that is older than that. By definition, separation of read and write, separation of storage and compute, Chroma is written in Rust. It's fully multi-tenant, um, we have, we use object storage as a key persistence tier and like data layer for Chroma, uh, distributed in Chroma cloud as well. So that's the modern piece. And then the for AI piece actually, I think is it matters in four kind of different ways. Like for AI means four different things. Like it means number one, the tools and technology that you use for search are different than in classic search systems. Number two, the workload is different than classic search systems. Number three, the developer is different than classic search systems. And number four, the person who's consuming those search results is also different than in classic search systems. Think about like classic search systems. Like you as the human, we're doing the last mile of search, you know, you were doing click, click, exactly. You're like, oh, like which of these are relevant, open a new tab, summarize, blah, blah, blah, blah. You, the human, were doing that. And now it's a language model. Humans can only digest 10 blue links. Language models can digest orders of magnitude more. All of these things matter. And I think influence like how a system is designed. Yeah. Yeah. It's sort of like made for.

Staying Focused in a Competitive AI Market


Alessio [00:04:29]: Back in 2023, I think the VectorDB category was kind of one of the hottest ones. And you had Pinecone raise a hundred million. You had all these different Weaviates, all these companies. Yeah. How did you stay focused on like what mattered to you rather than just try to raise a lot of money and make a big splash? And it took you a while to release Chroma Cloud too, which rather than just getting something out that maybe broke once you got to production, you kind of took your time. Yeah. Can you maybe give people advice on, in the AI space, how to be patient? How do you have your own vision as a founder and how to have your own vision that you follow versus kind of like following the noise around you?

Jeff [00:05:03]: There are different ways to build a startup. And so, you know, there's schools of thought here. So one school of thought certainly is like the find signal and kind of follow the gradient descent of what people want sort of lean startup style. My critique of that would be that if you follow that methodology, you will probably end up building a gating app for middle schoolers because that just seems to be like the lowest base take of what humans want to some degree. The slot machine would be the AI equivalent of that versus, you know, the other way to build a startup is to have a very strong view, presumably a contrarian view, or at least a view that seems like a secret. And then to just be maniacally focused on that thing, you know, there are different strokes for different folks, but we've always taken the second approach. And yeah, there was the option of like, okay, Chroma's single node is like doing really well, getting a bunch of traffic. Clearly having a hosted service is the thing people want. Like we could just spend. Uh, we could very quickly get a product in the market, but we felt like no, really what we want Chroma to be known for is our developer experience. Like we want our brand to be, we want Chroma's brand and the craft expressed in our brand to be extremely well known. And we felt like by offering a single node product as a service, like it was not going to meet our bar of like what great developer experience could and should look like. Yeah, we made the decision of like, no, we're going to like build the thing that we think is right, which was really challenging, um, it took a long time and obviously I'm incredibly proud that it exists today and that it's like serving hundreds of thousands of developers and they love it, but it was hard to get there.

Alessio [00:06:38]: When you're building the team, how do you message that? If I go back maybe like a year and a half ago, you know, I could join Chroma, I could join all these different companies. How do you keep the vision clear to people when on the outside you have, oh, I'll just use PG vector or like, you know, whatever else the thing of the day is. Um, do you feel like that helps you bring people that are more aligned with the vision versus more of the missionary type on just joining this company before it's hot and maybe any learning that you have from recruiting early on?

Jeff [00:07:07]: The upstream version of Conway's law, like you ship your org chart is you ship your culture because I think your org chart is downstream of your company's culture. We've always placed an extremely high premium on that, on people that we actually have here on the team. Um, I think that the slope of our future growth is entirely. Dependent on the people that are here in this office and, you know, that could mean going back to zero. That could mean, you know, linear growth. That could mean all kinds of versions of like hyperlinear growth, exponential growth, hockey stick growth. And so, yeah, we've just really decided to hire very slowly and be really picky. And I don't know, I mean, you know, the future will determine whether or not that was the right decision, but I think having worked on a few startups before, like that was something that I really cared about was like, I just want to work with people that I love working with. And. Like want to be shoulder to shoulder with in the trenches. And I think in independently execute on the level of like craft and quality that like we owe developers. And so that was how we chose to do it.

Building Chroma Cloud


Swyx [00:08:08]: We'll talk about Standard Cognition and the other fun stuff towards the end, but we'll, we'll focus on Chroma. I always want to put like some headline numbers up front. So I'm just trying to do a better job of like giving people the brain dump on what they should know about Chroma. 5 million monthly downloads is what I have on PyPI and 21,000 GitHub stars. Anything else people should know, like, that's like the typical sales call, like headline stuff like that, you know?

Jeff [00:08:33]: Yeah. Um, yeah, 20,000 GitHub stars, 5 million plus monthly downloads. Um, I've looked at the number recently, I think it's like over 60 or 70 million all-time downloads now. For many years running, Chroma has been the number one used project broadly, but also within communities like LangChain and LlamaIndex. Okay, cool. Fair enough. Yeah.

Swyx [00:08:51]: I think like when you say single node Chroma, like I think you're describing the quality. Yeah. Like the core difference between like what Chroma cloud has been, and I think we're releasing this in, in line with like your GA and Chroma cloud. Uh, yes. So like, what should people know about Chroma cloud and like how you've developed this experience from, from the start? Like you, you, you mentioned separation of storage and compute, like what does that. Yeah.

Jeff [00:09:13]: A hundred percent. Chroma is known for its developer experience. I don't know that we were the first to do this. I think we were with Chroma, you just pip install ChromaDB and then you can use it. It's just like in memory. Like, I think you can persist.

Swyx [00:09:25]: It could be the first.

Jeff [00:09:26]: Database to ever be pip installable. Um,

Swyx [00:09:28]: Any SQLite wrapper is pip installable technically, you know? No, SQLite was not like pip installable even to this day. I don't think. Well, you would probably have a, a deeper dive and knowledge of this site. I'm just speculating myself. Yeah.

Jeff [00:09:40]: So that, that led to like a very seamless onboarding experience for new users. Cause you could just run a command and then you could use it. We did all the work to make sure that like, regardless of the deployment target or architecture that you're running it on, like it would just work. In the early days, we had people do really great stuff, like run it on Arduinos and PowerPC architectures and like really esoteric stuff, but like we would like go the extra mile to like, make sure that it worked everywhere and just, it just always worked. So that was Chroma single node. So going back to like the developer experience that we wanted to have in a cloud product, like we felt that in the same way that you could run pip install ChromaDB and be up and running in five seconds and like not have to think about it, you can learn a bunch of abstractions. You don't have to like spend a bunch of time learning, which this is a really complicated API. That same story had to be true. For the cloud. And so what that meant is like having a version of the product where you'd have to be forced to think about like how many nodes you want or how to size those nodes or how your sharding strategy should be, or your backup strategy or your data tiering strategy, or I could go on, like, it just wasn't, wasn't good enough. It needed to be like zero config, zero knobs to tune. It should just be always fast, always very cost-effective and always fresh without you having to do or think about anything. Right. Regardless of how your traffic goes up and down and how your data scale goes up and down. That was sort of the motivating criteria. It also like usage-based billing was really important because that just is like so fair. We only charge you for the minimal slice of compute that you use and like nothing more, which not all serverless databases can claim, but it is true inside of Chroma that we like truly only charge you for the narrow slice of what you use. And so like that was the criteria that we entered kind of the design criteria process.
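
(For anyone who hasn't tried the pip-install flow Jeff is describing, a minimal sketch using the chromadb Python client; exact parameters may vary a bit between versions.)

# pip install chromadb -- runs in-process, nothing to deploy or configure.
import chromadb

client = chromadb.Client()          # in-memory; PersistentClient(path=...) keeps data on disk
docs = client.get_or_create_collection(name="docs")

docs.add(
    ids=["1", "2"],
    documents=["Chroma is a retrieval engine for AI applications.",
               "Context rot: model quality degrades as contexts grow."],
    metadatas=[{"source": "about"}, {"source": "research"}],
)

hits = docs.query(query_texts=["what is chroma?"], n_results=1)
print(hits["documents"][0])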

Swyx [00:11:19]: Which is, you know, de facto, you're also building a serverless compute platform.

Jeff [00:11:23]: Yeah, you have to. No, exactly. That motivated the design of Chroma Distributed. Chroma Distributed is also a part of the same monorepo that's open source Apache 2 and then the control and data plane are both fully open source Apache 2 and then Chroma Cloud uses Chroma Distributed to run a service and that service you can sign up, create a database and load in data in under 30 seconds and as of the time of filming people get like five bucks of free credits, which is actually enough to load in like 100,000 documents and query it 100,000 times, which obviously takes a lot of time. I think for a lot of use cases actually might mean they use it for free for years, which is fine. And to get there, we had to do kind of all the hard work. Yeah.

Swyx [00:12:03]: I think every blog should basically have semantic indexing. So like, you know, you host your personal blog on Chroma, you know, like we're not.

Jeff [00:12:10]: Yeah, I mean, you know, the mission of organizing the world's information remains unsolved.

Context Engineering and the Problems with RAG


Swyx [00:12:15]: Yeah.

Alessio [00:12:15]: You have one of your usual cryptic tweets. I need text. You tweeted context engineering a couple months ago. What was it? April. I think everybody.

Jeff [00:12:24]: I think something that's incredibly important when a new market is emerging is abstractions and the primitives that you use to reason about that thing. And AI, I think, in part of its hype, has also had a lot of primitives and abstractions that have gotten thrown around and have led to a lot of developers not actually being able to think critically about what is this thing, how do I put it together, what problems can I solve, what matters, where should I spend my time? For example, the term RAG. We never use the term RAG. I hate the term RAG.

Swyx [00:13:08]: Yeah, I killed the RAG track partially because of your influence.

Jeff [00:13:10]: Thank you. Thank you. RAG is just retrieval, first of all. Like, retrieval, augmentation, and generation are three concepts put together into one thing? Like, that's just really confusing. And, of course, RAG now got branded as, like, you know, oh, you're just using single dense vector search, and that's what RAG is. It's also dumb. I think one of the reasons I was really excited about the term, I mean, obviously, AI engineering, which you did a ton of work for, like, context engineering is in some ways a subset of AI engineering. Like, what is it? It's a high-status job. Context engineering is the job of figuring out what should be in the context window at any given LLM generation step. And there's both an inner loop, which is setting up the, you know, what should be in the context window this time. And there's the outer loop, which is how do you get better over time at filling the context window with only the relevant information. And we recently released a technical report about context rot, which goes sort of in detail, in depth about how the performance of LLMs is not invariant to how many tokens you use. As you use more and more tokens, the model can pay attention to less and then also can reason sort of less effectively. I think this really motivates the problem. You know, context rot implies the need for context engineering. And I guess, like, why I'm really excited about the meme and, you know, I got maybe both lucky to some degree that, you know, called it back in April, this is going to be a big meme, is that it elevates the job to, it clearly describes the job and it elevates the status of the job. This is what, frankly, most AI startups, any AI startup that you know of that you think of today that's doing very well, like, what are they fundamentally good at? What is the one thing that they're good at? It is context engineering.
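
(A toy probe of that claim, much cruder than the report's actual methodology: hold one fact fixed, grow the surrounding context, and see whether answers survive. ask_model is a stand-in for whatever LLM call you use.)

import random

NEEDLE = "The launch code is 4417."
QUESTION = "What is the launch code?"
FILLER = "The committee met again and deferred the decision to next quarter. "

def probe(ask_model, approx_tokens, trials=20):
    # Returns the fraction of trials where the model still recovers the needle.
    correct = 0
    for _ in range(trials):
        haystack = [FILLER] * max(1, approx_tokens // 14)   # ~14 tokens of filler per sentence
        haystack.insert(random.randrange(len(haystack) + 1), NEEDLE)
        prompt = "".join(haystack) + "\n\n" + QUESTION
        if "4417" in ask_model(prompt):
            correct += 1
    return correct / trials

# Sweep approx_tokens over 1_000, 10_000, 100_000 and plot the curve:
# if it sags, the extra tokens are costing you attention.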

Swyx [00:14:45]: Particularly, I would feel like a lot of pieces I've read, a lot of it focuses on agents versus non-agent stuff. Like, the context engineering is more relevant for agents. Do you make that distinction at all? Or you're just looking at context engineering generally?

Jeff [00:15:00]: No. I mean, there's interesting agent implications of, like, you know, agent learning, you know, can agents kind of learn from their interactions, which maybe are less relevant and like static sort of knowledge-based corpuses, chat your documents, obviously. Then again, like, you know, I think you could make the argument that even, like, chat your document use cases, like, should get better with more interactions. I don't draw a distinction between agent and non-agent. I don't actually know what agent means still, but, again, primitives and abstractions, words, they matter. I don't know. Like, what does agent mean? I don't know. Well, there's many definitions out there. Exactly. I've taken a stab. Most terms that can mean anything are just a vehicle for people's hopes and fears. Yeah. I think, you know, agent is the same thing. For sure.

Swyx [00:15:42]: Well, maybe we'll try to be more. More concise or precise about context engineering so that it doesn't, it actually means something and, you know, people can actually use it to do stuff. One thing I definitely will call out for context engineering or context rot in general is I think that there's been a lot of marketing around needle in a haystack, where every frontier model now comes out with, like, completely green, perfect charts of full utilization across, you know, 1 million tokens. I'm wondering what you guys' takes are on that kind of marketing. Yeah. Yeah.

Context Rot


Jeff [00:16:11]: So maybe back up a little bit. But the way that we came to work on this research was we were looking actually at agent learning. So we were very curious, like, could you give agents access to, like, prior successes or prior failures? And if you did, would that help boost agent performance? So we were specifically looking at a couple different data sets, SweetBench inclusive, and we started seeing interesting patterns where, like, on sort of multi-turn agent interactions where you're giving it the whole conversation window, like, the number of tokens explodes extremely quickly and instructions that were clearly in there, like, were being ignored and were not being announced. And we're like, oh, that clearly is a problem. We've now felt the pain. It was sort of a meme amongst people in the know that, like, this was true. And, like, I think also, you know, some of the research community's reaction to the context rot technical report is like, yeah, we know. And, you know, that's fine. Nobody else knew. And, like, it's kind of nice if, like, you can actually teach builders what is possible today versus what is not possible today. I don't blame the labs. I mean, building models is so insanely competitive. Everybody invariably is, like, picking the benchmarks that they want to do the best on. They're training around those. Those are also the ones that, you know, find their way into their marketing. You know, most people are not motivated to come out and say, here are all the ways that our thing is great, and here are all the ways that our thing is not great. You know, I don't know. I have some sympathy for, you know, why this was not reported on. But, yeah, I mean, there was this bit of, like, this sort of implication where, like, oh, look, our model is perfect on this task, needle in a haystack. Therefore, the context window you can use for whatever you want. There was an implication there. And, well, I hope that that is true someday. That is not the case. Yeah.

Swyx [00:17:43]: We'll send people, at least on the YouTube video, we'll put this chart, which is kind of your figure one of the context rot report. It seems like Sonnet 4 is the best in terms of area under curve, is how I think about it. Then Qwen, wow. And then GPT-4.1 and Gemini Flash degraded a lot quicker in terms of the context length. Yeah.

Jeff [00:18:03]: I don't have much commentary. That is what we found for this particular task. Again, how that translates to people's actual experience in real world, you know, tasks is entirely different. I mean, there is a certain amount of love that developers have for Claude and, like, maybe those two things are correlated. Yeah. I think it shows here if this is true, that's a big explanation for why. You follow my instructions, you know, like, here's a clear baseline, you know, thing people want.

Swyx [00:18:27]: I don't think it's super answered here, but I have a theory also that reasoning models are better at context utilization because they can loop back. Normal autoregressive models, they just kind of go left to right. But reasoning models, in theory, they can loop back and look for things that they need. They need connections for that they may not have paid attention to in the initial pass. There's a paper today that showed, I think, maybe the opposite. Really?

Jeff [00:18:49]: I'll send it to you later. Yeah.

Swyx [00:18:50]: That'd be fascinating to figure out.

Alessio [00:18:52]: There's papers every day. I thought the best thing was that you did not try to sell something. You're just like, hey, this thing is broken. Kind of sucks. How do you think about problems that you want to solve versus research that you do to highlight some of the problems and then hoping that other people will participate? Like, does everything that you talk about? Is it on the Chroma roadmap, basically? Or are you just advising people, hey, this is bad, work around it, but don't ask us to fix it?

Jeff [00:19:20]: Going back to what I said a moment ago, like, Chroma's broad mandate is to make the process of building applications more like engineering and less like alchemy. And so, you know, it's a pretty broad tent, but we're a small team and we can only focus on so many things. We've chosen to focus very much on one thing for now. And so, I don't have the hubris to think that we can ourselves solve this stuff conclusively for a very dynamic and large emerging industry. I think it does take a community, it does take like a rising tide of people all working together. We intentionally wanted to, like, make very clear that, like, we do not have any, like, commercial motivations in this research. You know, we do not posit any solutions. We don't tell people to use Chroma. It's just: here's the problem.

Swyx [00:20:02]: It's implied.

Jeff [00:20:05]: Listen, we weren't sad that that was maybe and maybe it may be a positive vacation, you know, but it's still there's no reasons around that. But, you know, speed and cost regardless, I think. But there's just a lot of work to do. And I think that, like, it's interesting where, like, the labs don't really care and they're not motivated to care increasingly as the market to be to be a good LLM provider. The main market seems to be consumer. You're just not that motivated to, like, help developers as a secondary concern, as a secondary concern. And so you're not that motivated really to do the legwork to, like, help developers learn how to build stuff. Yeah. And then, like, if you're a SaaS company or you're a consumer company, you're building with AI, you're an AI native company. Like, this is your, like, this is your secret sauce. You're not going to market how to do stuff. And so, like, I mean, it's just like there's a natural empty space, which is people that are actually have the motivations to, like, help show the way for how developers can build with AI. Like, they're just there's not a lot of obvious people who are, like, obviously invest in their time and energy in that. But I think that is obviously a good thing for us to do. And so that's kind of how I thought about it.

Swyx [00:21:02]: Just a pushback on the consumer thing. Like, you say labs and, you know, don't you think, like, OpenAI, building memory into ChatGPT and making it available to literally everyone? I mean, it's probably too much in your face, I would argue, but, like, they would really care to make the memory utilization good. I think context utilization, context engineering is important for them, too, even if they're only building for consumer and don't care about developers.

Jeff [00:21:25]: Yeah. How good is it today is obviously one important question, but we'll skip that one. Like, even if that's the case, are they actually going to publish those findings? No, Exactly. It's alpha, right? Why would you give away your secrets? Yeah. And so I think there's just, like, very few companies that actually are, like, in the business. I think there's a position where, like, they have the incentive and they really care about, like, trying to teach developers how to build useful stuff with AI. And so I think that we have that incentive.

Prioritizing Context Quality


Alessio [00:21:49]: But do you think you could get this to grow to the point of being the next needle in a haystack and then forcing the model's providers to actually be good at it?

Jeff [00:21:57]: There's no path to forcing anybody to do anything. And so we thought about that when we were kind of putting this together. Like, oh, maybe we should, like, sort of formulate this as a formal benchmark that you can make it very easy to, like, we did open source, all the code. So, like, you could. You know, if you're watching this and you're from a large model company, you can do this. You can take your new model that you haven't released yet and you can run, you know, these numbers on it. And, you know, I would rather have a model that has a 60,000 context, token context window that is able to perfectly pay attention to and perfectly reason over those 60,000 tokens than a model that's, like, 5 million tokens. Like, just as a developer, the former is, like, so much more valuable to me than the latter. I certainly hope that model providers do, like, pick this up as something that they care about and that they train around and that they, you know, evaluate their progress on and they communicate to developers. As well. That would be great.

Alessio [00:22:42]: Do you think this will get a bitter lesson as well? How do you decide which of the... Because, you know, you're basically saying, yeah, the models will not learn this. It's going to be a trick on top of it that you won't get access to. I'm not saying that. Well, but when you're saying that they will not publish how to do it, well, it means that the model API will not be able to do it, but they will have something in ChatGPT that will be able to do it. I see. Yeah.

Jeff [00:23:04]: It's very risky to bet on what's going to get bitter-lessoned versus what is not. I don't think I'll hazard a guess. Hopefully not AI engineers. Yeah. Hopefully not all of humanity. I don't know.

Swyx [00:23:14]: To me, there's also an interesting discipline developing just around context engineering. Lance Martin from LangChain did a really nice blog post on all the different separations. And then in New York you hosted your first meetup; we're going to do one here in San Francisco as well. But I'm just curious, what are you seeing in the field? Who's doing interesting work? What are the top debates? That kind of stuff.

Jeff [00:23:37]: I think this is still... I mean, a lot of people are doing nothing. A lot of people are still stuffing everything into the context window. That is very popular. They're using context caching, and that certainly helps with cost and speed, but it isn't helping the context problem at all. So I don't know that there are lots of best practices in place yet, but I'll highlight a few. The problem fundamentally is quite simple: you have N candidate chunks and you have Y spots available, and you have to curate and cull down from 10,000 or a hundred thousand or a million candidate chunks to the 20 that matter right now for this exact step. That optimization problem is not new to many applications and industries; it's a classic problem. What tools people use to solve it, again, I think it's still very early, it's hard to say, but there are a few patterns that I've seen. One pattern is to use what a lot of people call first-stage retrieval to do a big cull: using signals like vector search, full-text search, metadata filtering and metadata search, and others to go from, let's say, 10,000 down to 300. Like we were saying a moment ago, you don't have to give an LLM 10 blue links; you can brute force a lot more. So using an LLM as a re-ranker and brute forcing from 300 down to 30 is something I've seen emerging. A lot of people are doing this, and it's actually way more cost effective than a lot of people realize. I've heard of people running models themselves who are getting like a penny per million input tokens, and the output token cost is basically zero because the output is so simple. These are dedicated re-ranker models, right, not LLMs? No, these are LLMs; they're just using LLMs as re-rankers. And of course there are also dedicated re-ranker models that by definition are going to be cheaper and faster because they're much smaller. But what I've seen emerge is application developers who already know how to prompt now applying that tool to re-ranking, and I think this is going to be the dominant paradigm. I actually think purpose-built re-rankers will probably go away. They'll still exist, right? If you're at extreme scale or extreme cost, yes, you'll care to optimize that, the same way that with hardware you're just going to use a CPU or GPU unless you absolutely have to have an ASIC or an FPGA. I think the same thing is true about re-rankers: as LLMs become a hundred or a thousand times faster and a hundred or a thousand times cheaper, people are just going to use LLMs as re-rankers, and brute-forcing information curation is going to become extremely popular. Now, today, if you run 300 parallel LLM calls, even if it's not very expensive, there's the tail latency on any one of those 300 calls, so there are good reasons not to do that in a production application yet, but those will also go away over time.
So those are patterns I've seen emerge. That's a new thing that I've only seen start to really become popular in the last few months, and by popular I mean popular at the leading tip of the spear, but I think it will become a very, very dominant paradigm. Yeah.
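
To make the two-stage pattern described here concrete, the following is a minimal Python sketch of first-stage retrieval followed by an LLM used as a brute-force re-ranker. This is an illustration only, not Chroma's or the hosts' code; `score_relevance` stands in for whatever cheap LLM call you would actually make.

```python
# Hypothetical sketch: two-stage retrieval with an LLM as a brute-force re-ranker.
# First-stage retrieval (vector / full-text / metadata) narrows ~10,000 candidates
# to ~300; the LLM then scores each survivor against the query in parallel and we
# keep the top 30. `score_relevance` is a placeholder for a real LLM call.
from concurrent.futures import ThreadPoolExecutor

def score_relevance(query: str, chunk: str) -> float:
    # Placeholder: in practice this would prompt a small, cheap LLM, e.g.
    # "On a scale of 0-10, how relevant is this passage to the query?"
    # and parse the number out of the response.
    return float(len(set(query.lower().split()) & set(chunk.lower().split())))

def rerank(query: str, candidates: list[str], keep: int = 30) -> list[str]:
    # Fan out one scoring call per candidate; tail latency is the practical limit today.
    with ThreadPoolExecutor(max_workers=32) as pool:
        scores = list(pool.map(lambda c: score_relevance(query, c), candidates))
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:keep]]
```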

Code Indexing and Retrieval Strategies


Swyx [00:27:02]: We've also covered a little bit of this, especially on the code indexing side of the house. Everything we've been talking about applies to all kinds of context, but code is obviously a special kind of context and corpus that you want to index. We've had a couple of episodes where the Claude Code guys and the Cline guys talk about how they don't embed or index your code base; they just give the agent tools and use the tools to do code search. And I've often thought about whether this should be the primary context retrieval paradigm: when you build an agent, do you effectively call out to another agent with all these sort of recursive re-rankers and summarizers, or another agent with tools? Yep. Or do you glom them onto a single agent? I don't know if you have an opinion, obviously, because "agent" is very ill-defined, but I'll just put it out there.

Jeff [00:27:47]: Okay. Got to pull that apart. So indexing by definition is a trade-off. When you index data, you're trading write-time performance for query-time performance: you're making it slower to ingest data but much faster to query it, which obviously matters more as data sets get larger. So if you're only grepping very small, 15-file code bases, you probably don't have to index, and that's okay. But if you want to search all of the open source dependencies of that project, you've all done this before in VS Code or Cursor, right? You've run a search over the node_modules folder. It takes a really long time to run that search; that's a lot of data. So making that indexed is, again, making that trade-off of write-time performance for query-time performance. That's what indexing is, just to demystify it. Embeddings are known for semantic similarity today, but embeddings are really a generic concept of information compression, and there are actually many tools you can use embeddings for. I think embeddings for code are still extremely early and underrated, but regex is obviously an incredibly valuable tool, and inside of Chroma, both single-node and distributed, we now support regex search natively. You can do regex search inside of Chroma because we've seen it as a very powerful tool for code search, and we build indexes to make regex search go fast at large data volumes. On the coding use case that you mentioned, another feature we added to Chroma is the ability to do forking. You can take an existing index and create a copy of that index in under a hundred milliseconds for pennies, and then you can just apply the diff for whichever file has changed to the new index. So for any corpus of data that's logically changing, very fast re-indexing is the result. But now you can also have an index for each commit. If you want to search different commits, different branches, or different release tags, any corpus of data that's logically versioned, you can now search all those versions very easily and cost-effectively. So that's kind of how I think about regex and indexing and embeddings. The needle continues to move here, and anybody who claims to have the answer, you just shouldn't listen to them.

Jeff [00:30:02]: When you say that code embeddings are underrated, why do you think that is? Most people just take generic embedding models that are trained on the internet and try to use them for code. It works okay for some use cases, but does it work great for all use cases? I don't know. Another way to think about these different primitives and what they're useful for: fundamentally, we're trying to find signal. Lexical search, text search, works really well when the person writing the query knows the data. If I want to search my Google Drive for the spreadsheet that has all my investors, I'm just going to type in "cap table," because I know there's a spreadsheet in my Google Drive called cap table. Full-text search, great, it's perfect; I'm a subject matter expert in my data. Now, if you wanted to find that file and you didn't know that I had a spreadsheet called cap table, you're going to type in "the spreadsheet that has the list of all the investors," and of course, in embedding space, in semantic space, that's going to match. So again, these are just different tools, and it depends on who's writing the queries and what expertise they have in the data. What blend of those tools is going to be the right fit? My guess is that for code today, something like 85% or 90% of queries can be satisfactorily run with regex. Regex is obviously the dominant pattern used by Google Code Search and GitHub Code Search. But you can maybe get another 5%, 10%, or 15% improvement by also using embeddings. Very sophisticated teams also use embeddings for code as part of their code retrieval and code search stack, and you shouldn't assume they just enjoy spending money unnecessarily; they're eking out some benefit there. And of course, for companies that want to be at the top of their game, corner their market, and serve their users the best, this is kind of what it means to build great software with AI: 80% is quite easy, but getting from 80% to 100% is where all the work is. Each point of improvement is a point on the board, and it's a point that users care about and that you can use to fundamentally serve your users better.
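
As a rough illustration of the blend described here for code search (regex for queries where you already know the identifier, embeddings for queries that only describe behavior), here is a hedged Python sketch. The `embed` function is a stand-in for a real, ideally code-trained, embedding model; nothing here is any product's actual implementation.

```python
# A rough sketch of "regex first, embeddings as a complement" for code search:
# exact/lexical matches from a regex pass are unioned with semantically similar
# chunks from an embedding pass.
import math
import re

def embed(text: str) -> list[float]:
    # Placeholder: replace with a real (ideally code-trained) embedding model.
    vec = [0.0] * 64
    for tok in re.findall(r"\w+", text.lower()):
        vec[hash(tok) % 64] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search_code(query: str, pattern: str, chunks: list[str], k: int = 5) -> list[str]:
    # Lexical pass: the majority of queries, where the user knows the identifier.
    regex_hits = [c for c in chunks if re.search(pattern, c)]
    # Semantic pass: catch the remainder, where the query describes behavior instead.
    q_vec = embed(query)
    semantic_hits = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)[:k]
    # Union, preserving order and de-duplicating.
    seen, results = set(), []
    for c in regex_hits + semantic_hits:
        if c not in seen:
            seen.add(c)
            results.append(c)
    return results[:k]
```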

Chunk Rewriting and Query Optimization for Code


Alessio [00:32:04]: Do you have any thoughts on developer experience versus agent experience? This is another case where, well, maybe we should reformat and rewrite the code in a way that makes it easier to embed and then train models on. Where are you on that spectrum? Yeah.

Jeff [00:32:19]: One tool that I've seen work well for some use cases is, instead of just embedding the code, you first have an LLM generate a natural-language description of what the code is doing. And either you embed just the natural-language description, or you embed that and the code together, or you embed them separately and put them in separate vector search indexes. Chunk rewriting is the broad category of what that is. The idea here is related to indexing: as much structured information as you can put into your write or ingestion pipeline, you should. All of the metadata you can extract, do it at ingestion. All of the chunk rewriting you can do, do it at ingestion. If you really invest in extracting as much signal as possible and pre-baking a bunch of the signals on the ingestion side, I think it makes the downstream query task much easier. But also, just because we're here, it's worth saying that people should be creating small golden data sets of what queries they want to work and what chunks should come back, and then they can quantitatively evaluate what matters. Maybe you don't need to do a lot of fancy stuff for your application. It's entirely possible that, again, just using regex or just using vector search, depending on the use case, is all you need. And again, with anybody who's claiming to know the answer, the first thing you should ask is, let me see your data. If they don't have any data, then you have your answer already.
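
A hypothetical ingestion-time sketch of the chunk-rewriting idea follows: have an LLM write a natural-language description of each chunk, then embed and store description and code side by side along with extracted metadata. `describe_chunk` and `embed` are placeholders, not any particular product's API.

```python
# Illustrative only: pre-bake signal at write time (chunk rewriting + metadata),
# so the downstream query task gets easier.
from dataclasses import dataclass

def describe_chunk(code: str) -> str:
    # Placeholder: in practice, prompt an LLM with something like
    # "Describe in two sentences what this code does and when you would call it."
    first_line = code.strip().splitlines()[0] if code.strip() else ""
    return f"Code starting with: {first_line}"

def embed(text: str) -> list[float]:
    # Placeholder embedding: swap in a real embedding model at ingestion time.
    return [float(ord(c)) for c in text[:8]]

@dataclass
class IndexedChunk:
    code: str
    description: str
    code_embedding: list[float]
    description_embedding: list[float]
    metadata: dict

def ingest(code_chunks: list[str], file_path: str) -> list[IndexedChunk]:
    records = []
    for chunk in code_chunks:
        desc = describe_chunk(chunk)  # rewrite the chunk at write time, not query time
        records.append(IndexedChunk(
            code=chunk,
            description=desc,
            code_embedding=embed(chunk),
            description_embedding=embed(desc),
            metadata={"path": file_path},  # extract whatever metadata you can here
        ))
    return records
```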

Swyx [00:33:47]: I'll give a plug to a talk that you gave at the conference, How to Look at Your Data. Yes. Looking at your data is important. Having golden data sets. These are all good practices that I feel like somebody should put into a little pamphlet, call it the 10 commandments of AI engineering or something. Okay. You might do that. Yeah. Thou shalt look at your data.

Transformer Architecture Evolution and Retrieval Systems


Swyx [00:34:07]: We're about to move on to memory, but I don't want to just leave it there. I want to leave space for any other threads that you feel like you always want to get on a soapbox about.

Jeff [00:34:09]: Yeah, that's a dangerous thing to ask.

Swyx [00:34:18]: I have one to key off of, because I didn't know where to insert this in the conversation, but we're kind of skirting near it and it's something I'm trying to explore. You had this rant about RAG, where the original transformer was an encoder-decoder architecture. Then GPT turned most transformers into decoder-only, but we're also encoding with all the embedding models, which are encoder-only models. So in some sense we've decoupled the transformer: first we encode everything with the encoder-only model and put it into a vector database like Chroma, and Chroma also does other stuff, and then we decode with the LLMs. I just think it's a very interesting meta-learning about the overall architecture, stepping out from just the model to models and systems. I'm curious if you have any reflections on that, or any modifications to what I just said.

Jeff [00:35:20]: I think there's some intuition there, which is that the way we do things today is very crude and will feel very caveman in five or ten years. Why are we going back to natural language? Why aren't we just passing the embeddings directly to the models, which are just going to functionally re-put them into latent space? Right. Yeah. They have a very thin embedding layer. Yeah. So there are a few things that I think might be true about retrieval systems in the future. Number one, they just stay in latent space; they don't go back to natural language. Number two, this is actually starting to change, which is really exciting, but for the longest time we've done one retrieval per generation: you retrieve once, and then you stream out N tokens. Why are we not continually retrieving as we need to? There was a paper, or maybe a GitHub repo, that came out a few weeks ago, I think it was called, unfortunately, RAG-R1, where they teach DeepSeek R1, kind of give it the tool of how to retrieve. So in its internal chain of thought, in its inference-time compute, it's actually searching.

Swyx [00:36:22]: There's also retrieval-augmented language models.

Jeff [00:36:24]: I think this is an older paper. Yeah. There's a bunch of those, REALM and RETRO, it's kind of a long history here. So I think that, you know,

Swyx [00:36:31]: Somehow not that popular.

Jeff [00:36:32]: I don't know why. Somehow not that popular. Well, a lot of those have the problem where either the retriever or the language model has to be frozen, and then the corpus can't change, which most developers don't want to deal with from a developer-experience standpoint.

Swyx [00:36:45]: I would say we would do it if the gains were that high, or maybe the labs don't want you to do it. I don't know. Yeah.

Jeff [00:36:54]: Because the labs have a huge amount of influence. I think it's also just that you don't get points on the board by doing that; no one cares. The status games don't reward you for solving that problem. So broadly: continual retrieval, I think, will be interesting to see come onto the scene, number one. Number two, staying in embedding space will be very interesting. And then there's some interesting stuff about GPUs and how you're paging information into memory on GPUs that I think can be done much more efficiently. That's more like five or ten years in the future that we're thinking about, but I think when we look back, the way we do things today will seem hilariously crude.

Swyx [00:37:34]: Maybe, maybe not, you know? We're solving IMO challenges with just language. Yeah, it's great. I'm still working out the implications of that. It's still a huge achievement, but also very different from how I thought we would do things.

Alessio [00:37:47]: You said that memory is the benefit of context engineering. And you had a random tweet about "stop making memory so complicated." How do you think about memory? And what are maybe the other benefits of context engineering that we're not connecting together?

Memory as a Benefit of Context Engineering


Jeff [00:38:06]: I think memory is a good term. It is very legible to a wide population. Again, this is sort of just continuing the anthropomorphization of LLMs. We understand how we, as humans, use memory. Some of us are very good at using memory to learn how to do tasks, and those learnings are flexible to new environments. The idea of being able to sit down next to an AI and instruct it for 10 minutes or a few hours, just tell it what you want it to do, and it does something and you say, hey, actually do this next time, the same way you would with a human, and at the end of those 10 minutes or those few hours the AI is able to do it at the same level of reliability that a human could, is an incredibly attractive and exciting vision. I think that will happen. And memory is the term that everybody can understand, our moms can understand, and the benefits of memory are very appealing and very attractive. But what is memory under the hood? It's still just context engineering, I think, which is the domain of how you put the right information into the context window. So I think of memory as the benefit; context engineering is the tool that gives you that benefit. And there may be other stuff as well. Maybe there's some version of memory where you're actually using RL to improve the model through the data it's seen. So I'm not suggesting that changing the context is the only tool that gives you great performance on tasks, but I think it's a very important part.

Alessio [00:39:43]: Do you see a big difference between synthesizing the memory, which is, based on this conversation, what is the implicit preference? That's one side. And then there's the other side, which is, based on this prompt, what are the memories that I should put in?

Jeff [00:39:58]: I think they will be all fed by the same data. So the same feedback signals that tell you how to retrieve better will also tell you what to remember better. So I don't think they're actually different problems. I think they're the same problem.

Structuring AI Memory and Offline Compaction


Swyx [00:40:13]: To me, the thing I'm wrestling with a little more is just, what are the structures of memory that make sense? There are obviously all these analogies with long-term memory, short-term memory; let us try to coin something around sleep. I do think there should be some sort of batch collection cycle, maybe a garbage collection cycle, where the LLM is sleeping. But I don't know what makes sense. We're making all these analogies based on how we think humans work, but maybe AI doesn't work the same way. Yeah. I'm curious about anything that you've seen that's working.

Jeff [00:40:48]: Yeah, as a through line of this conversation, I always get a little nervous when we start creating new concepts and new acronyms for things, and then all of a sudden there are infographics like, here are the 10 types of memory. And you're like, why? If you squint, they're all the same thing. Do they have to be different? You have to blow people's minds. No, I don't think you do. I don't know. You've got to resist the slot machine.

Jeff [00:41:16]: Compaction has always been a useful concept, even in databases, in databases on your computer. We all remember running Defrag on our Windows machines in 1998. Some of us are not old enough to have done that. I am. Not at this table. So obviously offline processing is helpful, and I think it's also helpful in this case. As we were talking about before, what is the goal of indexing? The goal of indexing is to trade write-time performance for query-time performance. Compaction is another tool in the toolbox of write-time performance. You're re-indexing data.

Swyx [00:41:52]: It's not indexing, but actually it is indexing.

Jeff [00:41:55]: It's sort of re-indexing. Yeah. You're taking data. You're like, oh, maybe those two data points should be merged. Maybe they should be split. Maybe they should be, like, rewritten. Maybe there's new metadata we can extract from those. Like, let's look at the signal of how our application is performing. Let's try to figure out, like, are we remembering the right things or not? Like, the idea that there is going to be, like, a lot of offline compute and inference under the hood that helps make AI systems continuously self-improve is a sure bet.
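
As a toy illustration of that kind of offline pass, here is a hedged Python sketch that "compacts" stored memories by merging near-duplicate entries; `merge_texts` is a placeholder for the LLM rewrite step, and the similarity threshold is arbitrary, not anything Chroma ships.

```python
# Illustrative offline compaction: merge memory entries whose embeddings are
# nearly identical, rewriting the index at write time instead of cleaning up
# at query time.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def merge_texts(a: str, b: str) -> str:
    # Placeholder: in practice, ask an LLM to rewrite the two memories as one.
    return f"{a} / {b}"

def compact(memories: list[dict], threshold: float = 0.95) -> list[dict]:
    """memories: [{'text': str, 'embedding': list[float]}, ...]"""
    compacted: list[dict] = []
    for mem in memories:
        for kept in compacted:
            if cosine(mem["embedding"], kept["embedding"]) >= threshold:
                # Merge into the existing entry; a real pass would also re-embed
                # the merged text rather than keeping the first embedding.
                kept["text"] = merge_texts(kept["text"], mem["text"])
                break
        else:
            compacted.append(dict(mem))
    return compacted
```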

Alessio [00:42:19]: Part of the sleep-time compute thing that we talked about was pre-computing answers. Based on the data that you have, what are the likely questions that the person is going to ask, and can you pre-compute those things? How do you think about that in terms of Chroma?

Jeff [00:42:35]: We released a technical report maybe three months ago. The title is Generative Benchmarking. The idea there is that having a golden data set is really powerful. A golden data set is a list of queries and a list of chunks those queries should return. And now you can say, okay, this retrieval strategy gives me, for these queries, 80% of those chunks, whereas if I change the embedding model, now I get 90% of those chunks. That is better. And then you also need to consider cost and speed and API reliability and other factors, obviously, when making good engineering decisions. But now you can measure changes to your system. And what we noticed was that developers had the data. They had the chunks. They had the answers. But they didn't have the queries. So we did a whole technical report around how you teach an LLM to write good queries from chunks, because, again, you want chunk-query pairs, and if you have the chunks, you need the queries. We could have a human do some manual annotation, obviously, but humans are inconsistent and lazy, and QA is hard. So can we teach an LLM how to do that? We did a whole technical report and proved out a strategy for doing that well. So I think generating QA pairs is really important for benchmarking your retrieval system against a golden data set. Frankly, it's also the same data set that you would use to fine-tune in many cases. So yeah, there's definitely something very underrated there.
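
Here is a small sketch of what evaluating against such a golden data set might look like, assuming you already have query-to-chunk-id pairs; `retrieve` is a stand-in for your actual retrieval stack, and the numbers are illustrative rather than taken from the report.

```python
# Illustrative only: score a retrieval strategy with recall@k against a golden set.
def retrieve(query: str, k: int = 10) -> list[str]:
    # Placeholder: call your vector / full-text / hybrid retrieval here and
    # return the ids of the top-k chunks.
    return []

def recall_at_k(golden: dict[str, set[str]], k: int = 10) -> float:
    """golden maps each query to the set of chunk ids it should surface."""
    hits, total = 0, 0
    for query, expected_ids in golden.items():
        retrieved = set(retrieve(query, k=k))
        hits += len(retrieved & expected_ids)
        total += len(expected_ids)
    return hits / total if total else 0.0

# Example: compare two strategies by swapping the embedding model inside
# `retrieve` and seeing whether recall@10 moves on the same golden set.
golden_set = {"who are our investors?": {"chunk-cap-table"}}
print(f"recall@10 = {recall_at_k(golden_set):.2f}")
```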

Swyx [00:43:58]: I'll throw a plus one on that. As much attention as the context rot paper is getting, I feel like generative benchmarking was a bigger aha moment for me, just because I had never come across the concept before, and I think more people will actually apply it to their own situations. Whereas context rot is just, generally, yeah, don't trust the models that much, but there's not much you can do about it except do better context engineering. Whereas with generative benchmarking, you're like, yeah, generate your evals. And part of that is you're going to need the data sets, and it'll sort of lead you into all the best practices that everyone advocates for.

Jeff [00:44:34]: So it's really a nice piece of work. Having worked in applied machine learning developer tools now for 10 years, the returns to a very high-quality, small, labeled data set are so high. Everybody thinks you have to have a million examples or whatever. No. Actually, just a couple hundred high-quality examples is extremely beneficial. I tell customers all the time: what you should do is say to your team, Thursday night we're all going to be in the conference room, we're ordering pizza, and we're just going to have a data labeling party for a few hours. That's all it takes to bootstrap this.

Swyx [00:45:08]: Google does this. OpenAI does this. Anthropic does this. You are not above doing this. Great. Yeah. Exactly. Look at your data. Again, it's what matters. Maybe you should classify that as label your data, not look at, because look at seems a bit too… I agree with that. View only. Right. Read and write. Read and write. While we're on it, I should correct myself. It wasn't Standard Cognition, it was Standard Cyborg. My favorite fact about you is you're also a cyborg with your leg. True. If you see Jeff in person, you should ask him about it. Or maybe not. Maybe don't. I don't know. I don't care. Standard Cyborg, MightyHive, and know it. What were the lessons there that you're applying to Chroma? Yeah.

Lessons from Previous Startups and Building with Purpose


Jeff [00:45:46]: More than I can count. It's a bit of a cliche, and it's very hard to be self-reflective and honest with yourself about a lot of this stuff. But I think viewing your life as being very short, kind of a vapor in the wind, and therefore only doing the work that you absolutely love doing, only doing that work with people that you love spending time with, and serving customers that you love serving, is a very useful North Star. And it may not be the North Star that prints a ton of money in some sense; there may be faster ways to scam people into making $5 million or whatever. But if I reflect on my prior experiences, and I'm happy to go into more detail obviously, I was always making trade-offs. I was making trade-offs with the people that I was working with, or trade-offs with the customer that I was serving, or trade-offs with the technology and how proud I was of it. And maybe it's an age thing, I don't know, but the older I get, the more I just want to do the best work that I can. And I want that work to not just be great work; I also want it to be seen by the most people, because ultimately that is what impact looks like. Impact is not inventing something great and then nobody using it. Impact is inventing something great and as many people as possible using it.

Swyx [00:47:01]: Is any of that, and we can skip this question if it's sensitive, but is any of that guided by religion, by Christianity? I only ask because I think you're one of a growing number of openly, outwardly, positively religious people in the Valley, and that's kind of what I want to explore. I'm not that religious myself, but how does that inform how you view your impact and your choices? There was a little bit of that in what you just said, but I wanted to tease it out more.

Religion and values in Silicon Valley


Jeff [00:47:32]: I think increasingly modern society is nihilist. Nothing matters. It's absurdist, right? Everything is a farce. Everything is power. Everything's a comedy. A meme. Yeah, exactly. And so it's very rare, and I'm not saying that I am the living exemplar of this, but it's very rare to meet people who have genuine conviction about what flourishing for humanity looks like. It's very rare to meet people who are actually willing to sacrifice a lot to make that happen and to start things that they may not actually see completed in their lifetimes. It used to be commonplace that people would start projects that would take centuries to complete. Yeah. And now that's less and less the case.

Swyx [00:48:23]: The image that comes to mind is the Sagrada Familia in Barcelona, which I think was started like 300 years ago and it's completing next year. Yeah.

Jeff [00:48:33]: I've seen it in construction, but I can't wait to see it complete as well. I'm sure the places are booked out already. And so, you know, there are actually a lot of religions in Silicon Valley. I think AGI is also a religion. It has a problem of evil: we don't have enough intelligence. It has a solution, a deus ex machina. It has its second coming: AGI, the singularity, is going to come. It's going to save humanity, because we will now have infinite and free intelligence, therefore all of our problems will be solved, and we will live in the palm of grace for all eternity. It's going to solve death. Right. So I think religion still exists in Silicon Valley; there's a conservation of religion, and you kind of can't get rid of it. Yeah. The God gene. Yeah. You have different terms for this. But I'm always skeptical of religions that haven't been around for more than five years. Put it that way. Yeah.

Swyx [00:49:27]: There's a survivorship bias. Anyway, I do think you're one of the more prominent ones that I know of, and I think you guys are a force for good, and I like to encourage more of that. People should believe in something bigger than themselves and plant trees under which they will not sit. Am I mangling the quote? Is that actually a biblical quote? I don't think it's a biblical quote. But I like that quote. That's a good one.

Jeff [00:49:52]: So, yeah.

Swyx [00:49:52]: Plus one. I think society really collapses when you just live for yourself. That really is true. Agreed.

Alessio [00:49:59]: Who does your design? Because all of your swag is great. Your office looks great. The website looks great. The docs look great. How much of that is your input, and how much of it is having somebody who just gets it? And how important is that to making the brand part of the culture?

Company culture, design, and brand consistency


Jeff [00:50:18]: Going back to the question of culture, there's the Conway's Law thing: you ship your org chart, and you ship what you care about as a founder in some sense. And I do care deeply about this aspect of what we do, so I think it does come from me in some sense. I can't take all the credit for everything we've done; we've had the opportunity to work with some really talented designers, and we're hiring for that as well, so if people are listening to this and want to apply, please do. It's cliche to crib Patrick Collison quotes, but he does seem to be one of the most public embodiers of this idea, and I'm not sure this is a direct quote from him, to be clear, it's more of a broad aphorism, that how you do one thing is how you do everything. Just ensuring that there's a consistent experience of what we're doing: like you said, if you come to our office, it feels intentional and thoughtful. If you go to our website, it feels intentional and thoughtful. If you use the API, it feels intentional and thoughtful. If you go through an interview process, it feels intentional and purposeful. That's so easy to lose, and in some ways the only way you keep it is by insisting that that standard remain. I think that is one of the main things that I can really do for the company as a leader. It's sort of cringe to say, but you do kind of have to be the curator of taste. It's not that I have to stamp everything that goes out the door before it does, but at a minimum, I can do that. Sometimes with companies, maybe it's not even that quality goes downhill; it's not legible that any one thing is bad or worse, it's more that people just have their own expressions of what good looks like, they turn that up to 11, and then the brand becomes incoherent. What does this thing mean? What do they stand for? There's no longer a single voice. Again, I'm not claiming that I'm perfect at this, or even good at this, but we certainly wake up every day and try.

Swyx [00:52:19]: It's very powerful, the skill you have to convey straightforward principles and values and thoughtfulness in everything that you do. I've been impressed with your work for a while.

Hiring at Chroma: Designers, Researchers, and Engineers


Alessio [00:52:36]: Anything we're missing? You're hiring designers. Any other roles that you have open that you want people to, to apply for?

Jeff [00:52:42]: If you're a great product designer who wants to work on developer tools, I think we have one of the most unique opportunities at Chroma. If you are interested in extending the kind of research that we do, that's also an interesting opportunity. And we're always hiring very talented engineers who want to work with other people who are very passionate about low-level distributed systems and, in some ways, solving all the hard problems so that application developers don't have to.

Swyx [00:53:07]: When you say that, can you double-click on low-level distributed systems? People always say this, and then it's like, okay, Rust, Linux kernel. What are we talking about here?

Jeff [00:53:18]: Yeah. Maybe a useful encapsulation of this is: if you care deeply about things like Rust or deterministic simulation testing or

Swyx [00:53:32]: Raft, Paxos,

Jeff [00:53:33]: TLA+, consensus. TLA+, really? Wow.

Swyx [00:53:37]: You know, if you just keep saying these, they're proxies for: you would like the work that we do here. I just really want to tease out the hiring message, but part of my goal is also to try to identify what type of AI engineer startups are really trying to hire and cannot get, because the better we can identify this thing, the more I can maybe create some kind of branding around it, create an event, because there's a supply side and a demand side and they can't find each other. That's part of why I put AI Engineer together. But then there's this distributed systems person, which I have heard about from you and a hundred other startups. What is the skill set? What are they called? What do they do? Part of it is cloud engineering, because a lot of times you're just dealing with AWS, or you're dealing with, I don't know, debugging network calls and consistency things if you're doing replication or whatever. Where do they go? What do they do? But they don't use TLA+ at work, you know?

Jeff [00:54:36]: Probably not. Yeah. Last year I started the SF Systems group. Yes. The reading group. Yeah, there are presentations, and the point of it was: let's create a meeting place for people who care about this topic, because there wasn't really a place in the Bay Area to do that. So that continues to run now, which is great. To be clear, we have a lot of people on the team who are extremely good at this. It's not that we have zero; it's that we have six or seven and we want 20. In some ways, I feel like our product roadmap is very obvious and we know exactly what we need to build for even the next 18 months. But quality and focus are always limiting functions, and, yes, I will always make my land acknowledgement to The Mythical Man-Month, but eventually you do kind of need more people, because you need more people to care deeply about the work that they do. And AI is certainly an accelerant, and it's helpful.

Swyx [00:55:39]: And the reason that our team is still very small today relative to many of our competitors is because I think we've really embraced those tools. You're a Cursor shop? Yeah. Yeah.

Jeff [00:55:48]: Claude Code, Windsurf, people use whatever they want. Okay. Yeah. So all of those tools get some usage internally so far. We've still not found that any AI coding tools are particularly good at Rust, though. I'm not sure why that is, other than the obvious: there just aren't that many examples of great Rust on the internet. So, yeah.

Swyx [00:56:08]: Yeah. You would think that Rust errors would help you debug it itself. Right. You would think. Apparently not. Okay. All right. I have zero experience on that front. I've contributed three things to the Rust SDK of Temporal, and that was my total experience with Rust. But I think it's definitely on the rise. It's Zig, it's Rust, and I don't know if there's a third cool language. I think Go still counts. Go, Golang. Yeah. Go counts. If you're in that bucket, reach out to Jeff. But otherwise, I think we're good. Thanks for coming on. Thanks for having me, guys. Good to see you.

Jeff [00:56:46]: Thank you.


borislvin | ни с кем не в ногу

1 Share
  • Who/What/When/Where/Why: A comment thread online raises the question of when initiating a war became a priori unacceptable, citing historical examples (1858-1859, 1912) and noting the shift in views after the First World War.
  • Historical examples of wars judged positively: The Plombières meeting of Napoleon III and Cavour (1858) and the war of 1859 led to the unification of Italy and are traditionally assessed positively.
  • The Balkan example: The First Balkan War (1912), waged by the alliance of Bulgaria, Greece, Serbia, and Montenegro against Turkey, cost Turkey Macedonia and is mostly assessed positively in the historiography.
  • The turning point after the First World War: After the First World War the view took hold that the principle of self-determination cannot justify initiating a war, and this view has only grown stronger over time.
  • The flip side of prioritizing peace: Recognizing the priority of preserving peace often leads to a total denial of the principle of self-determination itself, declaring it wrong or forbidden.
  • Distorted understanding and consequences: Separatism is equated with war and the defense of peace with the denial of separatism, which fundamentally distorts the nature of what is happening and leads to the opposite of the intended results.
  • Personal examples and positions: Over the last 40 years the author supported the breakups of the USSR, Czechoslovakia, and Yugoslavia, welcomed Chechnya's aspirations and the referendums in Quebec and Scotland, and condemned Spain's actions in Catalonia and the blockade of Kurdistan.
  • Principled stance and isolation: Emotional reactions vary, but rationally the author always views separatism positively, as standing opposed to dictatorship and war; this has been a key principle of his worldview for the last 40-45 years and sets him apart from almost everyone.

[from the comments on someone else's friends-locked post]

You have actually touched on a very important question, and I don't know whether anyone has thought about it at all, whether there is any literature on it, and so on. I think about it a lot, but I don't yet have full clarity either.

The question is this: at what point did initiating a war come to be felt as an a priori unacceptable (harmful, criminal, etc.) act, regardless of any potentially justifying motives?

Up to a certain point this was not the case. The simplest examples, lying right on the surface:

- 1858: Napoleon III and Cavour meet secretly at Plombières and agree on plans for a war against Austria. In 1859 Piedmont deliberately provokes (in effect starts) the war, France joins in, and the rest is well known: Solferino, Villafranca, Garibaldi, Sicily, Naples, the unification of Italy. The traditional assessment by historians, especially Italian ones, is positive.

- 1912: Bulgaria, Greece, Serbia, and Montenegro form an alliance and open a war against Turkey (the First Balkan War). Turkey is defeated and loses Macedonia. The traditional assessment by historians, above all Balkan ones, is mostly positive; about the only thing condemned is the poor coordination of plans, which led to a war among the victors (the Second Balkan War).

But already after the First World War a distinct view begins to take shape that the principle of self-determination cannot justify initiating a war. And the further we go, the stronger this view becomes.

The flip side of this problem is that recognizing the priority of preserving peace very often (if not always) leads to a totally mistaken (that is, far worse than criminal) denial of the very principle of self-determination.

Instead of being recognized as important, even if subordinate in importance to the principle of preserving peace, it is routinely declared wrong, harmful, forbidden.

As a result, separatism and self-determination are equated with war, and the defense of peace is equated with the denial of separatism, which fundamentally distorts the understanding of the whole nature of what is happening and ultimately leads to results exactly opposite to those hoped for by the naive deniers of the principle of self-determination.

* * *

Yes, that is exactly what can be said about me. For the last forty years I have held to an absurd hierarchy of values: I lost sleep dreaming of the breakup of the USSR, Czechoslovakia, and Yugoslavia. Then I dreamed of Chechnya's independence, was glad that Canada and England allowed the referendums in Quebec and Scotland, and was outraged that Spain staged a military coup in Catalonia and that Iraq and Turkey imposed a blockade on Kurdistan. That is why there is nothing to talk about with me, and that is why no one talks to me.

* * *

My emotional attitude toward eventual separatist victories in peaceful referendums can vary from joy (in 99 percent of cases) to regret, but my rational, analytical attitude toward separatism as a phenomenon is always and unconditionally positive. Separatism, secessionism, self-determination are phenomena that conceptually and substantively stand opposed to dictatorship, war, destruction, and killing.

In exactly the same way, I may be emotionally pleased that author X publishes fewer of his vile writings, but conceptually, rationally, analytically I will always be against any censorship.

* * *

Yes, this has been one of the main subjects of my interest for the last forty-five years and one of the main principles of my worldview for at least the last forty. The experience of those forty years has indisputably convinced me that I am out of step with more or less everyone.


Google's Mixture Of Recursions : End of Transformers | by Mehul Gupta | in Data Science in Your Pocket - Freedium

1 Share
  • Who/What/When/Where/Why: Google DeepMind introduced "Mixture-of-Recursions" (MoR), reported by Mehul Gupta on Medium on July 22, 2025 (updated July 23, 2025), to enable per-token adaptive recursion depth and address Transformer inefficiencies.
  • Recursive Transformer: A Transformer variant that reuses the same set of layers repeatedly (weight tying) instead of stacking many unique layers.
  • Mixture-of-Recursions (MoR): A recursive Transformer that assigns each token a learned recursion depth via a tiny router, allowing tokens to continue or exit per-step and caching KV only for active tokens.
  • Primary benefits: Parameter reuse, adaptive compute per token, selective KV caching, reduced memory and FLOPs, and potential for equal or better performance with smaller models.
  • Benchmarks: A 118M MoR model outperformed a 315M vanilla Transformer on few-shot accuracy, trained under the same FLOP budget (16.5e18) with ~25% less memory and up to ~2.06× inference speedup depending on recursion depth.
  • Routing modes: "Expert-choice" selects tokens per recursion step (dynamic, needs auxiliary loss to prevent causality leaks); "Token-choice" assigns a depth once at the start (simpler but less adaptive).
  • KV cache strategies: Recursion-wise caching stores KV only for active tokens; recursive KV sharing caches once on the first recursion and reuses it later (memory-efficient but can reduce accuracy).
  • Limitations: Token-choice is less flexible, KV sharing can harm accuracy, routing capacity is fixed after training, MoR underperforms on very small models (≈135M or below), and substantial engineering effort is required for practical integration.


Google's Mixture Of Recursions: End of Transformers

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation explained

Mehul Gupta · Data Science in Your Pocket · ~7 min read · July 22, 2025 (Updated: July 23, 2025)

It's high time that we had another architecture apart from Transformers to help us build AI models, specifically LLMs. And now Google DeepMind has released a new paper that looks to be revolutionary: Mixture-of-Recursions. You might assume it is related to Mixture of Experts, but it is more closely related to the Transformer architecture itself.


Problems with Transformer architecture


The Transformer was introduced quite a while back. By now, we have been able to identify some problems with this architecture, specifically in terms of efficiency.

  • Uniform compute for all tokens: Every token, no matter how trivial or complex, goes through the full stack of layers — leading to wasted computation on easy tokens and underutilization for harder ones.
  • Excessive parameter count: Transformers allocate separate weights to each layer, causing massive model size and making training and deployment resource-intensive.
  • Inefficient inference: Tokens can't exit early, leading to unnecessary computation and increased latency during decoding — even when some tokens are already "done."
  • KV cache memory bottleneck: Key-value pairs are stored for all tokens at all layers, rapidly consuming GPU memory and limiting long-context usage.
  • Lack of latent reasoning: There's no mechanism for iterative refinement of token representations; Transformers rely on a single forward pass without internal reasoning loops.
  • Fixed computational budget: There's no built-in flexibility to adapt compute dynamically during training or inference, regardless of input complexity or token importance.
  • No test-time adaptability: Standard Transformers can't adjust depth or compute on-the-fly at inference time without retraining or architectural hacks.

Mixture of recursions is here to solve all of these one by one.

But before we jump into Mixture-of-Recursions, we need to understand one thing first.

What is a Recursive Transformer?


A Recursive Transformer is a version of the Transformer architecture where the same set of layers is applied repeatedly, like a loop, instead of having a tall stack of different layers.

In a standard Transformer, you might have 24 unique layers stacked on top of each other. In a recursive transformer, you might just have 6 layers, and apply them 4 times in a row. Same layers. Same weights. Just repeated.

Think of it like this:

Vanilla Transformer: Layer 1 → Layer 2 → Layer 3 → … → Layer 24 (each one different)

Recursive Transformer: Block A → Block A → Block A → … (same block, reused N times)

It's like using a loop in code instead of copy-pasting the same function 24 times. Saves memory (fewer parameters), and if trained right, can still match or even beat performance of larger models.
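
Here is a minimal sketch of that loop, assuming PyTorch; the layer sizes and recursion count are illustrative, not the paper's configuration.

```python
# A minimal sketch (not from the paper) of weight tying in a recursive Transformer.
# A vanilla model stacks 24 distinct layers; the recursive model builds one shared
# block and loops it, so the parameter count shrinks roughly by the loop count.
import torch
import torch.nn as nn

d_model, n_heads, seq_len, batch = 512, 8, 128, 2

# Vanilla: 24 independent layers, 24x the parameters.
vanilla = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True) for _ in range(24)]
)

# Recursive: one shared block applied repeatedly (a single layer keeps the sketch short).
shared_block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

def recursive_forward(x: torch.Tensor, n_recursions: int = 4) -> torch.Tensor:
    # Same weights, applied repeatedly: the "loop instead of copy-paste" idea.
    for _ in range(n_recursions):
        x = shared_block(x)
    return x

def param_count(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

x = torch.randn(batch, seq_len, d_model)
print(recursive_forward(x).shape)                  # torch.Size([2, 128, 512])
print(param_count(vanilla), param_count(shared_block))  # roughly a 24x difference
```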

Why would anyone do this?

  • Memory and compute costs grow fast with model depth.
  • Many layers in Transformers do similar things.
  • Reusing layers (i.e., recursion) makes the model smaller, cheaper, and more efficient — if you can make it work.

Recursive Transformers exploit that by tying the weights and looping through them.

Now, what MoR (Mixture-of-Recursions) adds on top is deciding how many times to recurse per token, instead of doing the same number of loops for every word.

What is Mixture of Recursions ?


Mixture-of-Recursions (MoR) is a recursive Transformer. But instead of looping the same number of times for all tokens, MoR assigns each token a different recursion depth depending on how much "thinking" it needs. The router, a tiny neural net, decides this, token by token, during training and inference.


Each recursion step runs a shared stack of layers. The token either continues through another round or exits, based on its own signal. This means complex tokens go deeper, simple ones stop early. No post-hoc logic. No second-phase tuning. It's all end-to-end.

Everything adjusts to this structure: layers, attention, even the key-value (KV) cache. KV storage only happens for tokens still active, not all tokens.
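
Here is a hypothetical sketch, not the authors' code, of what a per-token router might look like in PyTorch: at each recursion step a tiny linear router scores the still-active tokens, and tokens that fall below a threshold exit early. The threshold and sizes are illustrative.

```python
# Illustrative per-token recursion depth with a tiny router, assuming PyTorch.
import torch
import torch.nn as nn

d_model, n_heads = 512, 8
shared_block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
router = nn.Linear(d_model, 1)  # tiny router: one "keep going" logit per token

def mor_forward(x: torch.Tensor, max_recursions: int = 4, threshold: float = 0.5):
    batch, seq_len, _ = x.shape
    active = torch.ones(batch, seq_len, dtype=torch.bool)  # all tokens start active
    for step in range(max_recursions):
        if not active.any():
            break
        # Run the shared block; a real implementation would only compute (and cache
        # KV for) the active tokens instead of masking afterwards.
        updated = shared_block(x)
        x = torch.where(active.unsqueeze(-1), updated, x)  # exited tokens stay frozen
        # Router decides, per active token, whether to continue to the next recursion.
        keep_prob = torch.sigmoid(router(x)).squeeze(-1)
        active = active & (keep_prob > threshold)
    return x

out = mor_forward(torch.randn(2, 16, d_model))
print(out.shape)  # torch.Size([2, 16, 512])
```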


Small Analogy

Imagine you're building a big Lego castle. Some parts are easy, you snap them together quickly. But some parts are tricky, you need to try again and again to get them right.

Mixture-of-Recursions is like a smart helper that looks at each Lego piece and says: "Hmm, this one's easy, let's do it once." Or, "This one's tricky, let's work on it a few more times."

So instead of doing everything the same way, it spends more time on the hard parts and less on the easy ones. That way, it finishes faster and smarter.

Hope this helps you understand the core idea

Why this matters?

Modern Transformers waste compute. Every token, no matter how trivial, gets the same treatment: same number of layers, same depth, same memory. That's inefficient, especially when you're trying to scale LLMs to consumer-level devices.

MoR fixes that:

  • It shares layers (parameter reuse), fewer unique weights.
  • It routes tokens differently (adaptive compute), fewer wasted FLOPs.
  • It caches only what's needed (selective KV), less memory.

Bottom line: smaller models, cheaper compute, less memory. Same or better performance.

Benchmarks


  • A 118M MoR model beat a 315M vanilla Transformer on few-shot accuracy.
  • MoR trained under the same FLOP budget — 16.5e18 — but with 25% less memory.
  • Inference can speed up 2.06× depending on recursion depth.

Why it works

Multiple reasons:

1. Weight Sharing: MoR reuses the same stack of layers, like Universal Transformers. But here, it decides per token how many times to reuse it.

2. Dynamic Routing: A learned router figures out per-token recursion depth. Unlike fixed-depth loops or hard-coded early exits, this is learned during training.

3. KV Cache Optimization: Only cache tokens still in the game. The rest have exited. Reduces memory traffic. There's also an option to reuse the first cache across all steps — saves even more memory at some cost to accuracy.

What is Routing ?


MoR uses a tiny neural router to figure out: should this token go for another round or exit?

Two ways to do this:

  • Expert-choice: Each recursion step acts like a gatekeeper. It looks at the tokens and picks a few that deserve another pass. Others get kicked out. This is dynamic, learned during training. But it risks seeing future info (causality leak), so they slap an auxiliary loss on it to keep things clean.
  • Token-choice: Every token picks its recursion depth once at the start. Done. No decision-making at each step. Simpler, more predictable. But performance takes a hit; it doesn't adapt as well.

The router in Mixture-of-Recursions learns to assign recursion depths to tokens during pretraining, and it's trained together with the rest of the model. But how it learns depends on which routing mode you use.

So yeah, MoR can loop smart. But how smart it loops depends on who's doing the routing.

The KV Cache Problem

Every time a Transformer layer runs, it stores key-value (KV) pairs for attention. Multiply that by all tokens and all layers, and boom, your memory's gone.

MoR solves this in two ways:

  1. Recursion-wise caching: Only cache KV pairs for tokens that are still "in the game." If a token exits after 2 recursions, we stop caching it after that. Saves memory, big time.
  2. Recursive KV sharing: Cache everything once during the first recursion. Then reuse across future steps. It's cheaper, but less accurate. Still, if you're low on GPU juice, this is gold.

Either way, MoR is ruthless with cache. No freeloaders.
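Here's a toy illustration of the two policies. It is not the paper's implementation; the function names and shapes are made up for the example.

```python
import torch

def recursion_wise_cache(keys, values, active_mask):
    """Policy 1: store KV pairs only for tokens still recursing at this step."""
    idx = active_mask.nonzero(as_tuple=True)[0]   # positions of active tokens
    return {"k": keys[idx], "v": values[idx], "positions": idx}

def recursive_kv_sharing(first_step_cache, num_steps):
    """Policy 2: compute KV once at step 1, reuse that cache at every later step."""
    return [first_step_cache] * num_steps         # cheaper, slightly less accurate

seq_len, d = 8, 64
k, v = torch.randn(seq_len, d), torch.randn(seq_len, d)
active = torch.tensor([True, False, True, True, False, False, True, False])

step_cache = recursion_wise_cache(k, v, active)               # caches 4 of 8 tokens
shared = recursive_kv_sharing({"k": k, "v": v}, num_steps=3)  # one cache, reused
```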

Mixture-of-Recursion vs Mixture-of-Experts (MoE)

Don't mix them up. They are very different.

MoE picks different experts: it scales width.

MoR chooses recursion depth: it scales depth.

MoE is like picking from a menu. MoR is like cooking one dish longer for harder ingredients. MoE adds more parts. MoR reuses the same part, more smartly.
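If it helps to see it in code, here's the contrast in a few lines of toy PyTorch. Everything here is an assumption for illustration: the gate, the four experts, and the fixed depth of 3, which in MoR would come from the learned router.

```python
import torch
import torch.nn as nn

d = 32
token = torch.randn(1, d)

# Mixture-of-Experts: route the token to one of several distinct experts (width).
experts = nn.ModuleList([nn.Linear(d, d) for _ in range(4)])
gate = nn.Linear(d, 4)
expert_id = gate(token).argmax(dim=-1).item()
moe_out = experts[expert_id](token)

# Mixture-of-Recursions: push the token through one shared block k times (depth).
shared_block = nn.Linear(d, d)
depth = 3                                  # MoR would learn this per token
mor_out = token
for _ in range(depth):
    mor_out = shared_block(mor_out)
```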

Some problems with Mixture-of-Recursions

Let's not pretend this thing is flawless.

  • Token-choice routing is too blunt. It works, but it's like setting a timer and walking away. No flexibility mid-run.
  • KV sharing hurts performance in expert-choice. You get the memory savings, sure, but not without a small hit to accuracy.
  • Routing capacity is frozen post-training. Once the router learns how many tokens to pick at each step, good luck tweaking it later.
  • Doesn't shine on tiny models (135M or below).
  • Needs engineering. You're not going to plug this into HuggingFace and be done.

Still, for what it promises (adaptive depth, lighter memory, same or better performance), it's quite compelling.

End Notes

Mixture-of-Recursions isn't just some academic detour; it's a real attempt to rethink how models should spend their time and energy. Instead of brute-forcing every token through 24 layers like it's gospel, it pauses and asks: Does this word even need that much thought? That's a shift. Not a tweak.

If this catches on, we might finally break away from the transformer monoculture. Smaller models that think deeper, not wider. Efficient where it matters. And weirdly, it's not trying to replace the transformer, it's just making it less wasteful.

Not every model needs to be bigger. Some just need to be smarter with how they loop.

#ai #machine-learning #data-science #technology #deep-learning

Read the whole story
bogorad
1 day ago
Barcelona, Catalonia, Spain

The Pixel 10 Is Proof Google Beats Apple on Smartphone AI - WSJ

1 Share
  • Who/What/When/Where/Why: Nicole Nguyen of the WSJ provides a hands‑on review of Google’s Pixel 10, announced by Google and available Aug. 28, evaluating its AI features and comparing them to Apple and Samsung.
  • Magic Cue: Magic Cue proactively surfaces contextual information from inboxes, texts and calendars—examples include showing flight reservations during calls and offering calendar shortcuts when scheduling.
  • Voice Translate: Live Voice Translate offers real‑time translation with a cloned version of the user’s voice and on‑screen transcription across multiple languages (English, Spanish, German, Japanese, etc.).
  • Camera Coach: Camera Coach uses AI to suggest composition and photographer instructions, and Ask Photos lets users apply generative edits like backgrounds or virtual clothing.
  • Fitness Coach: A Gemini‑powered health and fitness coach for Fitbit and Pixel Watch personalizes workouts and recommendations using real‑time data such as sleep and user‑reported issues.
  • Google Ecosystem: Many Pixel AI features depend on Google apps (Gmail, Maps, Phone) and are designed to surface information at the moment it’s needed.
  • Market Context: Google and Samsung are rapidly integrating advanced AI into phones, while Apple has teased AI improvements but currently offers fewer available AI features.
  • Limitations: Noted limitations include imperfect translations, dependence on Google’s ecosystem, Pixel’s small market share, and inability to convert iMessage green bubbles to blue.

The race to develop the killer AI-powered phone is on. But Apple is getting lapped by its Android competitors.

Apple teased a smarter Siri but it’s MIA, and other Apple Intelligence offerings are meh. Meanwhile, Samsung is fusing Gemini into its Galaxy phones, and the new Google Pixels are chock-full of AI this and AI that. Tools we’d actually use.

The coming Pixel 10, announced on Wednesday by Alphabet subsidiary Google and available Aug. 28, dressed me in an AI-generated blazer right in the camera app. A convincing clone of my voice fluently discussed lunch in German, which I don't speak. When I called United customer service, flight reservation information automatically appeared on screen.

The Pixel holds just a fraction of the smartphone market—and that’s unlikely to change, given how attached we are to our mobile devices—but it’s leagues ahead of the iPhone in AI. In a recent ad, Google mocked Apple’s smart-Siri delay, suggesting iPhone owners change to the new Pixel 10.

Regardless of which side you’re on, don’t we all just want to know what AI can really do for us on a phone? After I checked out the Pixel 10, I have an answer: information that appears right when you need it, real-time translation in your own voice, a virtual photographer directing your shots, a personalized fitness coach and more.

What can’t it do? Turn those iPhone green text bubbles into blue ones.

Google is introducing new devices, including the Pixel 10 and Pixel 10 Pro, that come with useful AI-powered software.

No prompt necessary

Google Pixel phones have always been more about wow-inducing software than hardware—and that includes the new Pixel 10 ($799 and up) and Pixel 10 Pro ($999 and up). But you have to rely heavily on Google’s own apps, like Gmail and Maps.

For iPhone users including me, the most jealousy-inducing feature is Magic Cue. It rifles through your inbox, calendar and texts, then surfaces information when it thinks you need it. 

Say Mary texts: “What’s that coffee shop Ben recommended?” Magic Cue can surface the recommendation from your conversation with Ben. If Mary then asks whether you want to try it on Sunday, a shortcut to view your calendar will appear.

Magic Cue rifles through your inbox, texts and more and surfaces information when it thinks it's relevant. One example: flight-reservation details when you call the airline.

When you call a restaurant, the phone app can pull up reservation details from your email. When you open Google Maps just before the reservation, navigating to the restaurant takes only a quick tap.

Voice Translate also ups the wow. This live language translator (with real-time voice clone) is similar to the Meet function I tested earlier this year. On the Pixel, it works right in the phone app, translating English, Spanish, German, Japanese, Italian, Portuguese, French, Swedish, Russian, Hindi and Indonesian.

I tried it with a German speaker, choosing his preferred language. I spoke English and, after a slight delay, heard my own voice speaking German. A transcription of our conversation, in my native English, appeared on screen.

The German-to-English translation wasn’t perfect, but I always understood the gist. I could have used this tool when I lived in France, struggling with administrative tasks as a non-native speaker—like convincing my landlord the water heater was broken.

Art director

The Pixel 10’s photo experience is infused with AI. The Camera Coach is actually unsettling at first. A Google representative pointed the camera at me and hit the AI camera button. After about 10 seconds, it asked what we wanted in the photo: a full-body portrait, a close-up or some more novel plan.

We tapped “get inspired” and it generated a rough guide image of me, sitting more relaxed on the sofa. Then it gave the photographer some instructions: Have me sit down, place me on the left side of the frame, move to capture the scene lower and at an angle, use Portrait Mode, then take the shot from my waist up.

The final photo looked pretty good. Maybe something I could use on LinkedIn. But did it convey the right seriousness?

In editing mode, you can tap Ask Photos then type or say instructions. “Make it look better” might touch up the photo, but I went with “Make it look professional”: It brightened the lighting and turned up the blur. It gave me four options in around 20 seconds.

“Send Nicole to outer space” changed the background to the Milky Way. “Add a business suit” put me in a virtual blazer. Though some variations made me look a little ragged, one result was convincing.

I was actually more into Google’s other Gemini-powered coach, launching in October: personalized health and fitness insights for Fitbit trackers and Pixel Watches. The health coach can adjust workout plans based on real-time data, such as last night’s sleep. Mention back pain during a check-in, and the coach will change its suggestions.

The Apple Watch’s coming Workout Buddy is less AI coach, and more AI hype person. It can tell you when you hit a personal best, but it can’t craft a workout for you.

Google says Pixel’s advanced AI features can “make magic happen.” Samsung prominently labels its phones with “Galaxy AI.” Apple’s website highlights “AI-opening possibilities.” 

People aren’t demanding AI features in their phones just yet, says Sheng Win Chow, an analyst at Canalys, which tracks smartphone sales. But Google is betting they soon will. The race continues and for now, Apple has a lot of catching up to do.

Write to Nicole Nguyen at nicole.nguyen@wsj.com

Read the whole story
bogorad
3 days ago
Barcelona, Catalonia, Spain