Vanishing Gradients
Episode Archive
61 episodes of Vanishing Gradients since the first episode, which aired on February 16th, 2022.
- Episode 61: The AI Agent Reliability Cliff: What Happens When Tools Fail in Production
  October 16th, 2025 | 28 mins 4 secs | Tags: agents, ai, machine learning, mlops
  Most AI teams find their multi-agent systems devolving into chaos, but ML Engineer Alex Strick van Linschoten argues they are ignoring the production reality. In this episode, he draws on insights from the LLM Ops Database (750+ real-world deployments then; now nearly 1,000!) to show how to systematically measure and engineer constraints, turning unreliable prototypes into robust, enterprise-ready AI.
- Episode 60: 10 Things I Hate About AI Evals with Hamel Husain
  September 30th, 2025 | 1 hr 13 mins | Tags: ai, data science, evals, genai, llms, machine learning
  Most AI teams find "evals" frustrating, but ML Engineer Hamel Husain argues they’re just using the wrong playbook. In this episode, he lays out a data-centric approach to systematically measure and improve AI, turning unreliable prototypes into robust, production-ready systems.
- Episode 59: Patterns and Anti-Patterns For Building with AI
  September 24th, 2025 | 47 mins 37 secs | Tags: ai, data science, machine learning
  John Berryman has seen what works, and what breaks, when building AI applications. In this episode, he shares the “seven deadly sins” of LLM development and the fixes that keep projects from falling apart. From context management to retrieval debugging, John explains the patterns he’s seen succeed, the mistakes to avoid, and why it helps to think of an LLM as an “AI intern” rather than an all-knowing oracle.
- Episode 58: Building GenAI Systems That Make Business Decisions with Thomas Wiecki (PyMC Labs)
  September 10th, 2025 | 1 hr 45 secs
  While most conversations about generative AI focus on chatbots, Thomas Wiecki (PyMC Labs, PyMC) has been building systems that help companies make actual business decisions. In this episode, he shares how Bayesian modeling and synthetic consumers can be combined with LLMs to simulate customer reactions, guide marketing spend, and support strategy. Drawing from his work with Colgate and others, Thomas explains how to scale survey methods with AI, where agents fit into analytics workflows, and what it takes to make these systems reliable.
- Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank)
  August 29th, 2025 | 41 mins 27 secs | Tags: agents, llms, machine learning, rag
  While many people talk about “agents,” Shreya Shankar (UC Berkeley) has been building the systems that make them reliable. In this episode, she shares how AI agents and LLM judges can be used to process millions of documents accurately and cheaply. Drawing on projects ranging from databases of police misconduct reports to large-scale customer transcripts, Shreya explains the frameworks, error analysis, and guardrails needed to turn flaky LLM outputs into trustworthy pipelines.
- Episode 56: DeepMind Just Dropped Gemma 270M... And Here’s Why It Matters
  August 15th, 2025 | 45 mins 40 secs | Tags: genai, llms, machine learning
  While much of the AI world chases ever-larger models, Ravin Kumar (Google DeepMind) and his team build across the size spectrum, from billions of parameters down to this week’s release: Gemma 270M, the smallest member yet of the Gemma 3 open-weight family. At just 270 million parameters, a quarter the size of Gemma 1B, it’s designed for speed, efficiency, and fine-tuning. We explore what makes 270M special, where it fits alongside its billion-parameter siblings, and why you might reach for it in production even if you think “small” means “just for experiments.”
- Episode 55: From Frittatas to Production LLMs: Breakfast at SciPy
  August 13th, 2025 | 38 mins 8 secs | Tags: data science, generative ai, llm, machine learning, scipy
  Traditional software expects 100% passing tests. In LLM-powered systems, that’s not just unrealistic; it’s a feature, not a bug. Eric Ma leads research data science in Moderna’s data science and AI group, and over breakfast at SciPy we explored why AI products break the old rules, what skills different personas bring (and miss), and how to keep systems alive after the launch hype fades.
- Episode 54: Scaling AI: From Colab to Clusters — A Practitioner’s Guide to Distributed Training and Inference
  July 19th, 2025 | Season 1 | 41 mins 17 secs | Tags: ai, compute, genai, llm
  Colab is cozy. But production won’t fit on a single GPU. Zach Mueller leads Accelerate at Hugging Face and spends his days helping people go from solo scripts to scalable systems. In this episode, he joins me to demystify distributed training and inference, not just for research labs but for any ML engineer trying to ship real software.
- Episode 53: Human-Seeded Evals & Self-Tuning Agents: Samuel Colvin on Shipping Reliable LLMs
  July 8th, 2025 | Season 1 | 44 mins 49 secs
  Demos are easy; durability is hard. Samuel Colvin has spent a decade building guardrails in Python (first with Pydantic, now with Logfire), and he’s convinced most LLM failures have nothing to do with the model itself. They appear where the data is fuzzy, the prompts drift, or no one bothered to measure real-world behavior. Samuel joins me to show how a sprinkle of engineering discipline keeps those failures from ever reaching users.
- Episode 52: Why Most LLM Products Break at Retrieval (And How to Fix Them)
  July 3rd, 2025 | Season 1 | 28 mins 38 secs | Tags: ai, data science, genai, llms, machine learning
  Most LLM-powered features do not break at the model. They break at the context. So how do you retrieve the right information to get useful results, even with vague or messy user queries? In this episode, we hear from Eric Ma, who leads data science research in the Data Science and AI group at Moderna. He shares what it takes to move beyond toy demos and ship LLM features that actually help people do their jobs.
- Episode 51: Why We Built an MCP Server and What Broke First
  June 27th, 2025 | Season 1 | 47 mins 41 secs | Tags: ai, data science, llms, machine learning
  What does it take to actually ship LLM-powered features, and what breaks when you connect them to real production data? In this episode, we hear from Philip Carter, then a Principal PM at Honeycomb and now a Product Management Director at Salesforce. In early 2023, he helped build one of the first LLM-powered SaaS features to ship to real users. More recently, he and his team built a production-ready MCP server.
- Episode 50: A Field Guide to Rapidly Improving AI Products -- With Hamel Husain
  June 17th, 2025 | Season 1 | 27 mins 42 secs | Tags: ai, data science, evals, llms, machine learning
  Hugo talks with Hamel Husain (ex-Airbnb, GitHub, DataRobot) about how to improve AI products through evaluation, error analysis, and iteration. They discuss why most teams overlook debugging LLM systems, how to prioritize what to fix, and why evals are not just metrics but a full development process.
- Episode 49: Why Data and AI Still Break at Scale (and What to Do About It)
  June 5th, 2025 | Season 1 | 1 hr 21 mins | Tags: ai, data science, llms, machine learning
  Hugo talks with Akshay Agrawal (Marimo, ex-Google Brain, Netflix, Stanford) about why data and AI systems still break at scale, and what it takes to fix them. They dive into the limits of existing workflows, the importance of reproducibility and reactive execution, and how Marimo reimagines notebooks for modern software development.
- Episode 48: How to Benchmark AGI with Greg Kamradt
  May 23rd, 2025 | Season 1 | 1 hr 4 mins | Tags: agi, ai, data science, machine learning
  Hugo talks with Greg Kamradt, President of the ARC Prize Foundation, about ARC-AGI: a benchmark built on François Chollet’s definition of intelligence as “the efficiency at which you learn new things.” Unlike most evals, which focus on memorization or task completion, ARC is designed to measure generalization and to expose where today’s top models fall short.
- Episode 47: The Great Pacific Garbage Patch of Code Slop with Joe Reis
  April 7th, 2025 | Season 1 | 1 hr 19 mins | Tags: ai, data science, genai, llms, machine learning, vibe coding
  What if the cost of writing code dropped to zero, but the cost of understanding it skyrocketed? In this episode, Hugo sits down with Joe Reis to unpack how AI tooling is reshaping the software development lifecycle, from experimentation and prototyping to deployment, maintainability, and everything in between.
- Episode 46: Software Composition Is the New Vibe Coding
  April 3rd, 2025 | Season 1 | 1 hr 8 mins | Tags: ai, data science, genai, llms, machine learning, vibe coding
  What if building software felt more like composing than coding? In this episode, Hugo speaks with Greg Ceccarelli (co-founder of SpecStory, former CPO at Pluralsight, and Director of Data Science at GitHub) about how LLMs are reshaping the way we think about software development: from deterministic programming to a more flexible, prompt-driven, and collaborative style of building. It’s not just hype or grift; it’s a real shift in how we express intent, reason about systems, and collaborate across roles. Together they explore the rise of software composition and how it changes the way individuals and teams create with LLMs.
