Too many teams are building AI applications without understanding why their models fail. So rather than jumping straight to LLM evaluations, dashboards, or vibe checks, how do you actually fix a broken AI app?
In this episode, Hugo speaks with Hamel Husain, longtime ML engineer, open-source contributor, and consultant, about why debugging generative AI systems starts with looking at your data.
They dive into:
- Why “look at your data” is the best debugging advice no one follows.
- How spreadsheet-based error analysis can uncover failure modes faster than complex dashboards.
- The role of synthetic data in bootstrapping evaluation.
- When to trust LLM judges—and when they’re misleading.
- Why AI dashboards that score truthfulness, helpfulness, and conciseness are often a waste of time.
If you're building AI-powered applications, this episode will change how you debug, iterate, and improve model performance in production.
LINKS