Vanishing Gradients

Episode Archive

Vanishing Gradients has published 61 episodes since the first episode aired on February 16th, 2022.

  • Episode 61: The AI Agent Reliability Cliff: What Happens When Tools Fail in Production

    October 16th, 2025  |  28 mins 4 secs
    agents, ai, machine learning, mlops

    Most AI teams find their multi-agent systems devolving into chaos, but ML Engineer Alex Strick van Linschoten argues they are ignoring the production reality. In this episode, he draws on insights from the LLM Ops Database (750+ real-world deployments then; now nearly 1,000!) to systematically measure and engineer constraint, turning unreliable prototypes into robust, enterprise-ready AI.

  • Episode 60: 10 Things I Hate About AI Evals with Hamel Husain

    September 30th, 2025  |  1 hr 13 mins
    ai, data science, evals, genai, llms, machine learning

    Most AI teams find "evals" frustrating, but ML Engineer Hamel Husain argues they’re just using the wrong playbook. In this episode, he lays out a data-centric approach to systematically measure and improve AI, turning unreliable prototypes into robust, production-ready systems.

  • Episode 59: Patterns and Anti-Patterns For Building with AI

    September 24th, 2025  |  47 mins 37 secs
    ai, data science, machine learning

    John Berryman has seen what works — and what breaks — when building AI applications. In this episode, he shares the “seven deadly sins” of LLM development and the fixes that keep projects from falling apart. From context management to retrieval debugging, John explains the patterns he’s seen succeed, the mistakes to avoid, and why it helps to think of an LLM as an “AI intern” rather than an all-knowing oracle.

  • Episode 58: Building GenAI Systems That Make Business Decisions with Thomas Wiecki (PyMC Labs)

    September 10th, 2025  |  1 hr 45 secs

    While most conversations about generative AI focus on chatbots, Thomas Wiecki (PyMC Labs, PyMC) has been building systems that help companies make actual business decisions. In this episode, he shares how Bayesian modeling and synthetic consumers can be combined with LLMs to simulate customer reactions, guide marketing spend, and support strategy.

    Drawing from his work with Colgate and others, Thomas explains how to scale survey methods with AI, where agents fit into analytics workflows, and what it takes to make these systems reliable.

  • Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank)

    August 29th, 2025  |  41 mins 27 secs
    agents, llms, machine learning, rag

    While many people talk about “agents,” Shreya Shankar (UC Berkeley) has been building the systems that make them reliable. In this episode, she shares how AI agents and LLM judges can be used to process millions of documents accurately and cheaply.

    Drawing from work on projects ranging from databases of police misconduct reports to large-scale customer transcripts, Shreya explains the frameworks, error analysis, and guardrails needed to turn flaky LLM outputs into trustworthy pipelines.

  • Episode 56: DeepMind Just Dropped Gemma 270M... And Here’s Why It Matters

    August 15th, 2025  |  45 mins 40 secs
    genai, llms, machine learning

    While much of the AI world chases ever-larger models, Ravin Kumar (Google DeepMind) and his team build across the size spectrum, from billions of parameters down to this week’s release: Gemma 270M, the smallest member yet of the Gemma 3 open-weight family. At just 270 million parameters, a quarter the size of Gemma 1B, it’s designed for speed, efficiency, and fine-tuning.

    We explore what makes 270M special, where it fits alongside its billion-parameter siblings, and why you might reach for it in production even if you think “small” means “just for experiments.”

  • Episode 55: From Frittatas to Production LLMs: Breakfast at SciPy

    August 13th, 2025  |  38 mins 8 secs
    data science, generative ai, llm, machine learning, scipy

    Traditional software expects 100% passing tests. In LLM-powered systems, that expectation is unrealistic, and an imperfect pass rate is a feature, not a bug. Eric Ma leads research data science in Moderna’s data science and AI group, and over breakfast at SciPy we explored why AI products break the old rules, what skills different personas bring (and miss), and how to keep systems alive after the launch hype fades.

  • Episode 54: Scaling AI: From Colab to Clusters — A Practitioner’s Guide to Distributed Training and Inference

    July 19th, 2025  |  Season 1  |  41 mins 17 secs
    ai, compute, genai, llm

    Colab is cozy. But production won’t fit on a single GPU. Zach Mueller leads Accelerate at Hugging Face and spends his days helping people go from solo scripts to scalable systems. In this episode, he joins me to demystify distributed training and inference — not just for research labs, but for any ML engineer trying to ship real software.

  • Episode 53: Human-Seeded Evals & Self-Tuning Agents: Samuel Colvin on Shipping Reliable LLMs

    July 8th, 2025  |  Season 1  |  44 mins 49 secs

    Demos are easy; durability is hard. Samuel Colvin has spent a decade building guardrails in Python (first with Pydantic, now with Logfire), and he’s convinced most LLM failures have nothing to do with the model itself. They appear where the data is fuzzy, the prompts drift, or no one bothered to measure real-world behavior. Samuel joins me to show how a sprinkle of engineering discipline keeps those failures from ever reaching users.

  • Episode 52: Why Most LLM Products Break at Retrieval (And How to Fix Them)

    July 3rd, 2025  |  Season 1  |  28 mins 38 secs
    ai, data science, genai, llms, machine learning

    Most LLM-powered features do not break at the model. They break at the context. So how do you retrieve the right information to get useful results, even under vague or messy user queries?

    In this episode, we hear from Eric Ma, who leads data science research in the Data Science and AI group at Moderna. He shares what it takes to move beyond toy demos and ship LLM features that actually help people do their jobs.

  • Episode 51: Why We Built an MCP Server and What Broke First

    June 27th, 2025  |  Season 1  |  47 mins 41 secs
    ai, data science, llms, machine learning

    What does it take to actually ship LLM-powered features, and what breaks when you connect them to real production data?

    In this episode, we hear from Philip Carter — then a Principal PM at Honeycomb and now a Product Management Director at Salesforce. In early 2023, he helped build one of the first LLM-powered SaaS features to ship to real users. More recently, he and his team built a production-ready MCP server.

  • Episode 50: A Field Guide to Rapidly Improving AI Products -- With Hamel Husain

    June 17th, 2025  |  Season 1  |  27 mins 42 secs
    ai, data science, evals, llms, machine learning

    Hugo talks with Hamel Husain (ex-Airbnb, GitHub, DataRobot) about how to improve AI products through evaluation, error analysis, and iteration. They discuss why most teams overlook debugging LLM systems, how to prioritize what to fix, and why evals are not just metrics but a full development process.

  • Episode 49: Why Data and AI Still Break at Scale (and What to Do About It)

    June 5th, 2025  |  Season 1  |  1 hr 21 mins
    ai, data science, llms, machine learning

    Hugo talks with Akshay Agrawal (Marimo, ex-Google Brain, Netflix, Stanford) about why data and AI systems still break at scale—and what it takes to fix them. They dive into the limits of existing workflows, the importance of reproducibility and reactive execution, and how Marimo reimagines notebooks for modern software development.

  • Episode 48: HOW TO BENCHMARK AGI WITH GREG KAMRADT

    May 23rd, 2025  |  Season 1  |  1 hr 4 mins
    agi, ai, data science, machine learning

    Hugo talks with Greg Kamradt, President of the ARC Prize Foundation, about ARC-AGI: a benchmark built on Francois Chollet’s definition of intelligence as “the efficiency at which you learn new things.” Unlike most evals that focus on memorization or task completion, ARC is designed to measure generalization—and expose where today’s top models fall short.

  • Episode 47: The Great Pacific Garbage Patch of Code Slop with Joe Reis

    April 7th, 2025  |  Season 1  |  1 hr 19 mins
    ai, data science, genai, llms, machine learning, vibe coding

    What if the cost of writing code dropped to zero — but the cost of understanding it skyrocketed?

    In this episode, Hugo sits down with Joe Reis to unpack how AI tooling is reshaping the software development lifecycle — from experimentation and prototyping to deployment, maintainability, and everything in between.

  • Episode 46: Software Composition Is the New Vibe Coding

    April 3rd, 2025  |  Season 1  |  1 hr 8 mins
    ai, data science, genai, llms, machine learning, vibe coding

    What if building software felt more like composing than coding?

    In this episode, Hugo and Greg explore how LLMs are reshaping the way we think about software development—from deterministic programming to a more flexible, prompt-driven, and collaborative style of building. It’s not just hype or grift—it’s a real shift in how we express intent, reason about systems, and collaborate across roles.

    Hugo speaks with Greg Ceccarelli—co-founder of SpecStory, former CPO at Pluralsight, and Director of Data Science at GitHub—about the rise of software composition and how it changes the way individuals and teams create with LLMs.