<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:fireside="http://fireside.fm/modules/rss/fireside">
  <channel>
    <fireside:hostname>web01.fireside.fm</fireside:hostname>
    <fireside:genDate>Sat, 11 Apr 2026 02:21:38 -0500</fireside:genDate>
    <generator>Fireside (https://fireside.fm)</generator>
    <title>Vanishing Gradients - Episodes Tagged with “Agents”</title>
    <link>https://vanishinggradients.fireside.fm/tags/agents</link>
    <pubDate>Fri, 31 Oct 2025 18:00:00 +1100</pubDate>
    <description>A podcast about all things data, brought to you by data scientist Hugo Bowne-Anderson.
It's time for more critical conversations about the challenges in our industry in order to build better compasses for the solution space! To this end, this podcast will consist of long-format conversations between Hugo and other people who work broadly in the data science, machine learning, and AI spaces. We'll dive deep into all the moving parts of the data world, so if you're new to the space, you'll have an opportunity to learn from the experts. And if you've been around for a while, you'll find out what's happening in many other parts of the data world.
</description>
    <language>en-us</language>
    <itunes:type>episodic</itunes:type>
    <itunes:subtitle>a data podcast with hugo bowne-anderson</itunes:subtitle>
    <itunes:author>Hugo Bowne-Anderson</itunes:author>
    <itunes:summary>A podcast about all things data, brought to you by data scientist Hugo Bowne-Anderson.
It's time for more critical conversations about the challenges in our industry in order to build better compasses for the solution space! To this end, this podcast will consist of long-format conversations between Hugo and other people who work broadly in the data science, machine learning, and AI spaces. We'll dive deep into all the moving parts of the data world, so if you're new to the space, you'll have an opportunity to learn from the experts. And if you've been around for a while, you'll find out what's happening in many other parts of the data world.
</itunes:summary>
    <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
    <itunes:explicit>no</itunes:explicit>
    <itunes:keywords>data science, machine learning, AI</itunes:keywords>
    <itunes:owner>
      <itunes:name>Hugo Bowne-Anderson</itunes:name>
      <itunes:email>hugobowne@hey.com</itunes:email>
    </itunes:owner>
<itunes:category text="Technology"/>
<item>
  <title>Episode 62: Practical AI at Work: How Execs and Developers Can Actually Use LLMs</title>
  <link>https://vanishinggradients.fireside.fm/62</link>
  <guid isPermaLink="false">e1d21cdd-f714-4910-9696-60086f5feb62</guid>
  <pubDate>Fri, 31 Oct 2025 18:00:00 +1100</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/e1d21cdd-f714-4910-9696-60086f5feb62.mp3" length="85069031" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Many leaders are trapped between chasing ambitious, ill-defined AI projects and the paralysis of not knowing where to start. Dr. Randall Olson argues that the real opportunity isn't in moonshots, but in the "trillions of dollars of business value" available right now. As co-founder of Wyrd Studios, he bridges the gap between data science, AI engineering, and executive strategy to deliver a practical framework for execution.
</itunes:subtitle>
  <itunes:duration>59:04</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Many leaders are trapped between chasing ambitious, ill-defined AI projects and the paralysis of not knowing where to start. Dr. Randall Olson argues that the real opportunity isn't in moonshots, but in the "trillions of dollars of business value" available right now. As co-founder of Wyrd Studios, he bridges the gap between data science, AI engineering, and executive strategy to deliver a practical framework for execution.
In this episode, Randy and Hugo lay out how to find and solve what might be considered "boring but valuable" problems, like an EdTech company automating 20% of its support tickets with a simple retrieval bot instead of a complex AI tutor. They discuss how to move incrementally along the "agentic spectrum" and why treating AI evaluation with the same rigor as software engineering is non-negotiable for building a disciplined, high-impact AI strategy.
They talk through:
How a non-technical leader can prototype a complex insurance claim classifier using just photos and a ChatGPT subscription.
The agentic spectrum: Why you should start by automating meeting summaries before attempting to build fully autonomous agents.
The practical first step for any executive: Building a personal knowledge base with meeting transcripts and strategy docs to get tailored AI advice.
Why treating AI evaluation with the same rigor as unit testing is essential for shipping reliable products.
The organizational shift required to unlock long-term AI gains, even if it means a short-term productivity dip.
LINKS
Randy on LinkedIn (https://www.zenml.io/llmops-database)
Wyrd Studios (https://thewyrdstudios.com/)
Stop Building AI Agents (https://www.decodingai.com/p/stop-building-ai-agents)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc)
🎓 Learn more:
Join the final cohort of our Building AI Applications course starting March 10, 2026 (25% off for listeners) (https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs): https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs
Next cohort starts November 3: come build with us! 
</description>
  <itunes:keywords>ai, agents, machine learning, data science</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Many leaders are trapped between chasing ambitious, ill-defined AI projects and the paralysis of not knowing where to start. Dr. Randall Olson argues that the real opportunity isn&#39;t in moonshots, but in the &quot;trillions of dollars of business value&quot; available right now. As co-founder of Wyrd Studios, he bridges the gap between data science, AI engineering, and executive strategy to deliver a practical framework for execution.</p>

<p>In this episode, Randy and Hugo lay out how to find and solve what might be considered &quot;boring but valuable&quot; problems, like an EdTech company automating 20% of its support tickets with a simple retrieval bot instead of a complex AI tutor. They discuss how to move incrementally along the &quot;agentic spectrum&quot; and why treating AI evaluation with the same rigor as software engineering is non-negotiable for building a disciplined, high-impact AI strategy.</p>

<p>They talk through:</p>

<ul>
<li>How a non-technical leader can prototype a complex insurance claim classifier using just photos and a ChatGPT subscription.</li>
<li>The agentic spectrum: Why you should start by automating meeting summaries before attempting to build fully autonomous agents.</li>
<li>The practical first step for any executive: Building a personal knowledge base with meeting transcripts and strategy docs to get tailored AI advice.</li>
<li>Why treating AI evaluation with the same rigor as unit testing is essential for shipping reliable products.</li>
<li>The organizational shift required to unlock long-term AI gains, even if it means a short-term productivity dip.</li>
</ul>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://www.zenml.io/llmops-database" rel="nofollow">Randy on LinkedIn</a></li>
<li><a href="https://thewyrdstudios.com/" rel="nofollow">Wyrd Studios</a></li>
<li><a href="https://www.decodingai.com/p/stop-building-ai-agents" rel="nofollow">Stop Building AI Agents</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
<li><a href="https://youtu.be/-YQjKH3wRvc" rel="nofollow">Watch the podcast video on YouTube</a></li>
</ul>

<p>🎓 Learn more:</p>

<p><a href="https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs" rel="nofollow">Join the final cohort of our Building AI Applications course starting March 10, 2026 (25% off for listeners)</a>: <a href="https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs" rel="nofollow">https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs</a></p>

<p>Next cohort starts November 3: come build with us!</p>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Many leaders are trapped between chasing ambitious, ill-defined AI projects and the paralysis of not knowing where to start. Dr. Randall Olson argues that the real opportunity isn&#39;t in moonshots, but in the &quot;trillions of dollars of business value&quot; available right now. As co-founder of Wyrd Studios, he bridges the gap between data science, AI engineering, and executive strategy to deliver a practical framework for execution.</p>

<p>In this episode, Randy and Hugo lay out how to find and solve what might be considered &quot;boring but valuable&quot; problems, like an EdTech company automating 20% of its support tickets with a simple retrieval bot instead of a complex AI tutor. They discuss how to move incrementally along the &quot;agentic spectrum&quot; and why treating AI evaluation with the same rigor as software engineering is non-negotiable for building a disciplined, high-impact AI strategy.</p>

<p>They talk through:</p>

<ul>
<li>How a non-technical leader can prototype a complex insurance claim classifier using just photos and a ChatGPT subscription.</li>
<li>The agentic spectrum: Why you should start by automating meeting summaries before attempting to build fully autonomous agents.</li>
<li>The practical first step for any executive: Building a personal knowledge base with meeting transcripts and strategy docs to get tailored AI advice.</li>
<li>Why treating AI evaluation with the same rigor as unit testing is essential for shipping reliable products.</li>
<li>The organizational shift required to unlock long-term AI gains, even if it means a short-term productivity dip.</li>
</ul>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://www.zenml.io/llmops-database" rel="nofollow">Randy on LinkedIn</a></li>
<li><a href="https://thewyrdstudios.com/" rel="nofollow">Wyrd Studios</a></li>
<li><a href="https://www.decodingai.com/p/stop-building-ai-agents" rel="nofollow">Stop Building AI Agents</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
<li><a href="https://youtu.be/-YQjKH3wRvc" rel="nofollow">Watch the podcast video on YouTube</a></li>
</ul>

<p>🎓 Learn more:</p>

<p><a href="https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs" rel="nofollow">Join the final cohort of our Building AI Applications course starting March 10, 2026 (25% off for listeners)</a>: <a href="https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs" rel="nofollow">https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs</a></p>

<p>Next cohort starts November 3: come build with us!</p>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 61: The AI Agent Reliability Cliff: What Happens When Tools Fail in Production</title>
  <link>https://vanishinggradients.fireside.fm/61</link>
  <guid isPermaLink="false">66d8da7e-5291-4273-8a87-c956fdf2f784</guid>
  <pubDate>Thu, 16 Oct 2025 14:00:00 +1100</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/66d8da7e-5291-4273-8a87-c956fdf2f784.mp3" length="55333020" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Most AI teams find their multi-agent systems devolving into chaos, but ML Engineer Alex Strick van Linschoten argues they are ignoring the production reality. In this episode, he draws on insights from the LLM Ops Database (750+ real-world deployments then; now nearly 1,000!) to systematically measure and engineer constraint, turning unreliable prototypes into robust, enterprise-ready AI.</itunes:subtitle>
  <itunes:duration>28:04</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Most AI teams find their multi-agent systems devolving into chaos, but ML Engineer Alex Strick van Linschoten argues they are ignoring the production reality. In this episode, he draws on insights from the LLM Ops Database (750+ real-world deployments then; now nearly 1,000!) to systematically measure and engineer constraint, turning unreliable prototypes into robust, enterprise-ready AI.
Drawing from his work at ZenML, Alex details why success requires scaling down and enforcing MLOps discipline to navigate the unpredictable "Agent Reliability Cliff". He provides the essential architectural shifts, evaluation hygiene techniques, and practical steps needed to move beyond guesswork and build scalable, trustworthy AI products.
We talk through:
- Why "shoving a thousand agents" into an app is the fastest route to unmanageable chaos
- The essential MLOps hygiene (tracing and continuous evals) that most teams skip
- The optimal (and very low) limit for the number of tools an agent can reliably use
- How to use human-in-the-loop strategies to manage the risk of autonomous failure in high-sensitivity domains
- The principle of using simple Python/RegEx before resorting to costly LLM judges
LINKS
The LLMOps Database: 925 entries as of today. Submit a use case to help it get to 1K! (https://www.zenml.io/llmops-database)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc)
🎓 Learn more:
Join the final cohort of our Building AI Applications course starting March 10, 2026 (25% off for listeners) (https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs): https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs 
</description>
  <itunes:keywords>ai, agents, mlops, machine learning</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Most AI teams find their multi-agent systems devolving into chaos, but ML Engineer Alex Strick van Linschoten argues they are ignoring the production reality. In this episode, he draws on insights from the LLM Ops Database (750+ real-world deployments then; now nearly 1,000!) to systematically measure and engineer constraint, turning unreliable prototypes into robust, enterprise-ready AI.</p>

<p>Drawing from his work at ZenML, Alex details why success requires scaling down and enforcing MLOps discipline to navigate the unpredictable &quot;Agent Reliability Cliff&quot;. He provides the essential architectural shifts, evaluation hygiene techniques, and practical steps needed to move beyond guesswork and build scalable, trustworthy AI products.</p>

<p>We talk through:</p>

<ul>
<li>Why &quot;shoving a thousand agents&quot; into an app is the fastest route to unmanageable chaos</li>
<li>The essential MLOps hygiene (tracing and continuous evals) that most teams skip</li>
<li>The optimal (and very low) limit for the number of tools an agent can reliably use</li>
<li>How to use human-in-the-loop strategies to manage the risk of autonomous failure in high-sensitivity domains</li>
<li>The principle of using simple Python/RegEx before resorting to costly LLM judges</li>
</ul>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://www.zenml.io/llmops-database" rel="nofollow">The LLMOps Database: 925 entries as of today. Submit a use case to help it get to 1K!</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
<li><a href="https://youtu.be/-YQjKH3wRvc" rel="nofollow">Watch the podcast video on YouTube</a></li>
</ul>

<p>🎓 Learn more:</p>

<p><a href="https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs" rel="nofollow">Join the final cohort of our Building AI Applications course starting March 10, 2026 (25% off for listeners)</a>: <a href="https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs" rel="nofollow">https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs</a></p>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Most AI teams find their multi-agent systems devolving into chaos, but ML Engineer Alex Strick van Linschoten argues they are ignoring the production reality. In this episode, he draws on insights from the LLM Ops Database (750+ real-world deployments then; now nearly 1,000!) to systematically measure and engineer constraint, turning unreliable prototypes into robust, enterprise-ready AI.</p>

<p>Drawing from his work at ZenML, Alex details why success requires scaling down and enforcing MLOps discipline to navigate the unpredictable &quot;Agent Reliability Cliff&quot;. He provides the essential architectural shifts, evaluation hygiene techniques, and practical steps needed to move beyond guesswork and build scalable, trustworthy AI products.</p>

<p>We talk through:</p>

<ul>
<li>Why &quot;shoving a thousand agents&quot; into an app is the fastest route to unmanageable chaos</li>
<li>The essential MLOps hygiene (tracing and continuous evals) that most teams skip</li>
<li>The optimal (and very low) limit for the number of tools an agent can reliably use</li>
<li>How to use human-in-the-loop strategies to manage the risk of autonomous failure in high-sensitivity domains</li>
<li>The principle of using simple Python/RegEx before resorting to costly LLM judges</li>
</ul>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://www.zenml.io/llmops-database" rel="nofollow">The LLMOps Database: 925 entries as of today. Submit a use case to help it get to 1K!</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
<li><a href="https://youtu.be/-YQjKH3wRvc" rel="nofollow">Watch the podcast video on YouTube</a></li>
</ul>

<p>🎓 Learn more:</p>

<p><a href="https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs" rel="nofollow">Join the final cohort of our Building AI Applications course starting March 10, 2026 (25% off for listeners)</a>: <a href="https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs" rel="nofollow">https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs</a></p>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank)</title>
  <link>https://vanishinggradients.fireside.fm/57</link>
  <guid isPermaLink="false">60db26a1-cad5-4c3d-9661-bbc51a3a0b27</guid>
  <pubDate>Fri, 29 Aug 2025 21:00:00 +1000</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/60db26a1-cad5-4c3d-9661-bbc51a3a0b27.mp3" length="81037068" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>While many people talk about “agents,” Shreya Shankar (UC Berkeley) has been building the systems that make them reliable. In this episode, she shares how AI agents and LLM judges can be used to process millions of documents accurately and cheaply.

Drawing from work on projects ranging from databases of police misconduct reports to large-scale customer transcripts, Shreya explains the frameworks, error analysis, and guardrails needed to turn flaky LLM outputs into trustworthy pipelines.</itunes:subtitle>
  <itunes:duration>41:27</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>While many people talk about “agents,” Shreya Shankar (UC Berkeley) has been building the systems that make them reliable. In this episode, she shares how AI agents and LLM judges can be used to process millions of documents accurately and cheaply.  
Drawing from work on projects ranging from databases of police misconduct reports to large-scale customer transcripts, Shreya explains the frameworks, error analysis, and guardrails needed to turn flaky LLM outputs into trustworthy pipelines.  
We talk through:  
- Treating LLM workflows as ETL pipelines for unstructured text  
- Error analysis: why you need humans reviewing the first 50–100 traces  
- Guardrails like retries, validators, and “gleaning”  
- How LLM judges work — rubrics, pairwise comparisons, and cost trade-offs  
- Cheap vs. expensive models: when to swap for savings  
- Where agents fit in (and where they don’t)  
If you’ve ever wondered how to move beyond unreliable demos, this episode shows how to scale LLMs to millions of documents — without breaking the bank.
LINKS
Shreya's website (https://www.sh-reya.com/)
DocETL, A system for LLM-powered data processing (https://www.docetl.org/)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Watch the podcast video on YouTube (https://youtu.be/3r_Hsjy85nk)
Shreya's AI evals course, which she teaches with Hamel "Evals" Husain (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME)
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338 
</description>
  <itunes:keywords>LLMs, Agents, RAG, Machine Learning</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>While many people talk about “agents,” <strong>Shreya Shankar</strong> (UC Berkeley) has been building the systems that make them reliable. In this episode, she shares how AI agents and LLM judges can be used to process millions of documents accurately and cheaply.  </p>

<p>Drawing from work on projects ranging from databases of police misconduct reports to large-scale customer transcripts, Shreya explains the frameworks, error analysis, and guardrails needed to turn flaky LLM outputs into trustworthy pipelines.  </p>

<p><strong>We talk through:</strong>  </p>

<ul>
<li>Treating LLM workflows as ETL pipelines for unstructured text<br></li>
<li>Error analysis: why you need humans reviewing the first 50–100 traces<br></li>
<li>Guardrails like retries, validators, and “gleaning”<br></li>
<li>How LLM judges work — rubrics, pairwise comparisons, and cost trade-offs<br></li>
<li>Cheap vs. expensive models: when to swap for savings<br></li>
<li>Where agents fit in (and where they don’t)<br></li>
</ul>

<p>If you’ve ever wondered how to move beyond unreliable demos, this episode shows how to scale LLMs to millions of documents — without breaking the bank.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://www.sh-reya.com/" rel="nofollow">Shreya&#39;s website</a></li>
<li><a href="https://www.docetl.org/" rel="nofollow">DocETL, A system for LLM-powered data processing</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
<li><a href="https://youtu.be/3r_Hsjy85nk" rel="nofollow">Watch the podcast video on YouTube</a></li>
<li><a href="https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME" rel="nofollow">Shreya&#39;s AI evals course, which she teaches with Hamel &quot;Evals&quot; Husain</a></li>
</ul>

<p>🎓 Learn more:</p>

<ul>
<li><strong>Hugo&#39;s course:</strong> <a href="https://maven.com/s/course/d56067f338" rel="nofollow">Building LLM Applications for Data Scientists and Software Engineers</a> — <a href="https://maven.com/s/course/d56067f338" rel="nofollow">https://maven.com/s/course/d56067f338</a> </li>
</ul>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>While many people talk about “agents,” <strong>Shreya Shankar</strong> (UC Berkeley) has been building the systems that make them reliable. In this episode, she shares how AI agents and LLM judges can be used to process millions of documents accurately and cheaply.  </p>

<p>Drawing from work on projects ranging from databases of police misconduct reports to large-scale customer transcripts, Shreya explains the frameworks, error analysis, and guardrails needed to turn flaky LLM outputs into trustworthy pipelines.  </p>

<p><strong>We talk through:</strong>  </p>

<ul>
<li>Treating LLM workflows as ETL pipelines for unstructured text<br></li>
<li>Error analysis: why you need humans reviewing the first 50–100 traces<br></li>
<li>Guardrails like retries, validators, and “gleaning”<br></li>
<li>How LLM judges work — rubrics, pairwise comparisons, and cost trade-offs<br></li>
<li>Cheap vs. expensive models: when to swap for savings<br></li>
<li>Where agents fit in (and where they don’t)<br></li>
</ul>

<p>If you’ve ever wondered how to move beyond unreliable demos, this episode shows how to scale LLMs to millions of documents — without breaking the bank.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://www.sh-reya.com/" rel="nofollow">Shreya&#39;s website</a></li>
<li><a href="https://www.docetl.org/" rel="nofollow">DocETL, A system for LLM-powered data processing</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
<li><a href="https://youtu.be/3r_Hsjy85nk" rel="nofollow">Watch the podcast video on YouTube</a></li>
<li><a href="https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME" rel="nofollow">Shreya&#39;s AI evals course, which she teaches with Hamel &quot;Evals&quot; Husain</a></li>
</ul>

<p>🎓 Learn more:</p>

<ul>
<li><strong>Hugo&#39;s course:</strong> <a href="https://maven.com/s/course/d56067f338" rel="nofollow">Building LLM Applications for Data Scientists and Software Engineers</a> — <a href="https://maven.com/s/course/d56067f338" rel="nofollow">https://maven.com/s/course/d56067f338</a> </li>
</ul>]]>
  </itunes:summary>
</item>
  </channel>
</rss>
