<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:fireside="http://fireside.fm/modules/rss/fireside">
  <channel>
    <fireside:hostname>web02.fireside.fm</fireside:hostname>
    <fireside:genDate>Sat, 04 Apr 2026 16:39:36 -0500</fireside:genDate>
    <generator>Fireside (https://fireside.fm)</generator>
    <title>Vanishing Gradients - Episodes Tagged with “Genai”</title>
    <link>https://vanishinggradients.fireside.fm/tags/genai</link>
    <pubDate>Sat, 22 Nov 2025 18:30:00 +1100</pubDate>
    <description>A podcast about all things data, brought to you by data scientist Hugo Bowne-Anderson.
It's time for more critical conversations about the challenges in our industry in order to build better compasses for the solution space! To this end, this podcast will consist of long-format conversations between Hugo and other people who work broadly in the data science, machine learning, and AI spaces. We'll dive deep into all the moving parts of the data world, so if you're new to the space, you'll have an opportunity to learn from the experts. And if you've been around for a while, you'll find out what's happening in many other parts of the data world.
</description>
    <language>en-us</language>
    <itunes:type>episodic</itunes:type>
    <itunes:subtitle>a data podcast with hugo bowne-anderson</itunes:subtitle>
    <itunes:author>Hugo Bowne-Anderson</itunes:author>
    <itunes:summary>A podcast about all things data, brought to you by data scientist Hugo Bowne-Anderson.
It's time for more critical conversations about the challenges in our industry in order to build better compasses for the solution space! To this end, this podcast will consist of long-format conversations between Hugo and other people who work broadly in the data science, machine learning, and AI spaces. We'll dive deep into all the moving parts of the data world, so if you're new to the space, you'll have an opportunity to learn from the experts. And if you've been around for a while, you'll find out what's happening in many other parts of the data world.
</itunes:summary>
    <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
    <itunes:explicit>no</itunes:explicit>
    <itunes:keywords>data science, machine learning, AI</itunes:keywords>
    <itunes:owner>
      <itunes:name>Hugo Bowne-Anderson</itunes:name>
      <itunes:email>hugobowne@hey.com</itunes:email>
    </itunes:owner>
<itunes:category text="Technology"/>
<item>
  <title>Episode 63: Why Gemini 3 Will Change How You Build AI Agents with Ravin Kumar (Google DeepMind)</title>
  <link>https://vanishinggradients.fireside.fm/63</link>
  <guid isPermaLink="false">cc45813e-f7ec-434d-a816-e7a5dfba8946</guid>
  <pubDate>Sat, 22 Nov 2025 18:30:00 +1100</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/cc45813e-f7ec-434d-a816-e7a5dfba8946.mp3" length="117720514" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Gemini 3 is only a few days old, and its massive leap in performance and reasoning has big implications for builders: as models begin to self-heal, builders are tearing out functionality they built just months ago, ripping out defensive coding and reshipping their agent harnesses entirely.

Ravin Kumar (Google DeepMind) joins Hugo to break down exactly why the rapid evolution of models like Gemini 3 is changing how we build software. They detail the shift from simple tool calling to building reliable "Agent Harnesses", explore the architectural tradeoffs between deterministic workflows and high-agency systems, unpack the nuances of preventing context rot in massive context windows, and explain why proper evaluation infrastructure is the only way to manage the chaos of autonomous loops.</itunes:subtitle>
  <itunes:duration>1:00:12</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Gemini 3 is only a few days old, and its massive leap in performance and reasoning has big implications for builders: as models begin to self-heal, builders are tearing out functionality they built just months ago, ripping out defensive coding and reshipping their agent harnesses entirely.
Ravin Kumar (Google DeepMind) joins Hugo to break down exactly why the rapid evolution of models like Gemini 3 is changing how we build software. They detail the shift from simple tool calling to building reliable "Agent Harnesses", explore the architectural tradeoffs between deterministic workflows and high-agency systems, unpack the nuances of preventing context rot in massive context windows, and explain why proper evaluation infrastructure is the only way to manage the chaos of autonomous loops.
They talk through:
- The implications of models that can "self-heal" and fix their own code
- The two cultures of agents: LLM workflows with a few tools versus high-agency, autonomous systems (and when to unleash each)
- Inside NotebookLM: moving from prototypes to viral production features like Audio Overviews
- Why Needle in a Haystack benchmarks often fail to predict real-world performance
- How to build agent harnesses that turn model capabilities into product velocity
- The shift from measuring latency to managing time-to-compute for reasoning tasks
LINKS
From Context Engineering to AI Agent Harnesses: The New Software Discipline, a podcast Hugo did with Lance Martin, LangChain (https://high-signal.delphina.ai/episode/context-engineering-to-ai-agent-harnesses-the-new-software-discipline)
Context Rot: How Increasing Input Tokens Impacts LLM Performance (https://research.trychroma.com/context-rot)
Effective context engineering for AI agents by Anthropic (https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Watch the podcast video on YouTube (https://youtu.be/CloimQsQuJM)
Join the final cohort of our Building AI Applications course starting March 10, 2026 (25% off for listeners) (https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs): https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs 
</description>
  <itunes:keywords>ai, genai, machine learning</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Gemini 3 is only a few days old, and its massive leap in performance and reasoning has big implications for builders: as models begin to self-heal, builders are tearing out functionality they built just months ago, ripping out defensive coding and reshipping their agent harnesses entirely.</p>

<p>Ravin Kumar (Google DeepMind) joins Hugo to break down exactly why the rapid evolution of models like Gemini 3 is changing how we build software. They detail the shift from simple tool calling to building reliable &quot;Agent Harnesses&quot;, explore the architectural tradeoffs between deterministic workflows and high-agency systems, unpack the nuances of preventing context rot in massive context windows, and explain why proper evaluation infrastructure is the only way to manage the chaos of autonomous loops.</p>

<p>They talk through:</p>

<ul>
<li>The implications of models that can &quot;self-heal&quot; and fix their own code</li>
<li>The two cultures of agents: LLM workflows with a few tools versus high-agency, autonomous systems (and when to unleash each)</li>
<li>Inside NotebookLM: moving from prototypes to viral production features like Audio Overviews</li>
<li>Why Needle in a Haystack benchmarks often fail to predict real-world performance</li>
<li>How to build agent harnesses that turn model capabilities into product velocity</li>
<li>The shift from measuring latency to managing time-to-compute for reasoning tasks</li>
</ul>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://high-signal.delphina.ai/episode/context-engineering-to-ai-agent-harnesses-the-new-software-discipline" rel="nofollow">From Context Engineering to AI Agent Harnesses: The New Software Discipline, a podcast Hugo did with Lance Martin, LangChain</a></li>
<li><a href="https://research.trychroma.com/context-rot" rel="nofollow">Context Rot: How Increasing Input Tokens Impacts LLM Performance</a></li>
<li><a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents" rel="nofollow">Effective context engineering for AI agents by Anthropic</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
<li><a href="https://youtu.be/CloimQsQuJM" rel="nofollow">Watch the podcast video on YouTube</a></li>
</ul>

<p><a href="https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs" rel="nofollow">Join the final cohort of our Building AI Applications course starting March 10, 2026 (25% off for listeners)</a>: <a href="https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs" rel="nofollow">https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs</a></p>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Gemini 3 is only a few days old, and its massive leap in performance and reasoning has big implications for builders: as models begin to self-heal, builders are tearing out functionality they built just months ago, ripping out defensive coding and reshipping their agent harnesses entirely.</p>

<p>Ravin Kumar (Google DeepMind) joins Hugo to break down exactly why the rapid evolution of models like Gemini 3 is changing how we build software. They detail the shift from simple tool calling to building reliable &quot;Agent Harnesses&quot;, explore the architectural tradeoffs between deterministic workflows and high-agency systems, unpack the nuances of preventing context rot in massive context windows, and explain why proper evaluation infrastructure is the only way to manage the chaos of autonomous loops.</p>

<p>They talk through:</p>

<ul>
<li>The implications of models that can &quot;self-heal&quot; and fix their own code</li>
<li>The two cultures of agents: LLM workflows with a few tools versus high-agency, autonomous systems (and when to unleash each)</li>
<li>Inside NotebookLM: moving from prototypes to viral production features like Audio Overviews</li>
<li>Why Needle in a Haystack benchmarks often fail to predict real-world performance</li>
<li>How to build agent harnesses that turn model capabilities into product velocity</li>
<li>The shift from measuring latency to managing time-to-compute for reasoning tasks</li>
</ul>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://high-signal.delphina.ai/episode/context-engineering-to-ai-agent-harnesses-the-new-software-discipline" rel="nofollow">From Context Engineering to AI Agent Harnesses: The New Software Discipline, a podcast Hugo did with Lance Martin, LangChain</a></li>
<li><a href="https://research.trychroma.com/context-rot" rel="nofollow">Context Rot: How Increasing Input Tokens Impacts LLM Performance</a></li>
<li><a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents" rel="nofollow">Effective context engineering for AI agents by Anthropic</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
<li><a href="https://youtu.be/CloimQsQuJM" rel="nofollow">Watch the podcast video on YouTube</a></li>
</ul>

<p><a href="https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs" rel="nofollow">Join the final cohort of our Building AI Applications course starting March 10, 2026 (25% off for listeners)</a>: <a href="https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs" rel="nofollow">https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgfs</a></p>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 60: 10 Things I Hate About AI Evals with Hamel Husain</title>
  <link>https://vanishinggradients.fireside.fm/60</link>
  <guid isPermaLink="false">0fbc2a65-3bfc-4f8a-83ac-d370f1a30e13</guid>
  <pubDate>Tue, 30 Sep 2025 17:30:00 +1000</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/0fbc2a65-3bfc-4f8a-83ac-d370f1a30e13.mp3" length="105505355" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Most AI teams find "evals" frustrating, but ML Engineer Hamel Husain argues they’re just using the wrong playbook. In this episode, he lays out a data-centric approach to systematically measure and improve AI, turning unreliable prototypes into robust, production-ready systems.
</itunes:subtitle>
  <itunes:duration>1:13:15</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Most AI teams find "evals" frustrating, but ML Engineer Hamel Husain argues they’re just using the wrong playbook. In this episode, he lays out a data-centric approach to systematically measure and improve AI, turning unreliable prototypes into robust, production-ready systems.
Drawing from his experience getting countless teams unstuck, Hamel explains why the solution requires a "revenge of the data scientists." He details the essential mindset shifts, error analysis techniques, and practical steps needed to move beyond guesswork and build AI products you can actually trust.
We talk through:
- The 10(+1) critical mistakes that cause teams to waste time on evals
- Why "hallucination scores" are a waste of time (and what to measure instead)
- The manual review process that finds major issues in hours, not weeks
- A step-by-step method for building LLM judges you can actually trust
- How to use domain experts without getting stuck in endless review committees
- Guest Bryan Bischof's "Failure as a Funnel" for debugging complex AI agents
If you're tired of ambiguous "vibe checks" and want a clear process that delivers real improvement, this episode provides the definitive roadmap.
LINKS
Hamel's website and blog (https://hamel.dev/)
Hugo speaks with Philip Carter (Honeycomb) about aligning your LLM-as-a-judge with your domain expertise (https://vanishinggradients.fireside.fm/51)
Hamel Husain on Lenny's podcast, which includes a live demo of error analysis (https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill)
The episode of VG in which Hamel and Hugo talk about Hamel's "data consulting in Vegas" era (https://vanishinggradients.fireside.fm/9)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Watch the podcast video on YouTube (https://youtube.com/live/QEk-XwrkqhI?feature=share)
Hamel's AI evals course, which he teaches with Shreya Shankar (UC Berkeley): starts Oct 6 and this link gives 35% off! (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME) https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338  
</description>
  <itunes:keywords>AI, GenAI, LLMs, data science, machine learning, evals</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Most AI teams find &quot;evals&quot; frustrating, but ML Engineer Hamel Husain argues they’re just using the wrong playbook. In this episode, he lays out a data-centric approach to systematically measure and improve AI, turning unreliable prototypes into robust, production-ready systems.</p>

<p>Drawing from his experience getting countless teams unstuck, Hamel explains why the solution requires a &quot;revenge of the data scientists.&quot; He details the essential mindset shifts, error analysis techniques, and practical steps needed to move beyond guesswork and build AI products you can actually trust.</p>

<p>We talk through:</p>

<ul>
<li>  The 10(+1) critical mistakes that cause teams to waste time on evals</li>
<li>  Why &quot;hallucination scores&quot; are a waste of time (and what to measure instead)</li>
<li>  The manual review process that finds major issues in hours, not weeks</li>
<li>  A step-by-step method for building LLM judges you can actually trust</li>
<li>  How to use domain experts without getting stuck in endless review committees</li>
<li>  Guest Bryan Bischof&#39;s &quot;Failure as a Funnel&quot; for debugging complex AI agents</li>
</ul>

<p>If you&#39;re tired of ambiguous &quot;vibe checks&quot; and want a clear process that delivers real improvement, this episode provides the definitive roadmap.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://hamel.dev/" rel="nofollow">Hamel&#39;s website and blog</a></li>
<li><a href="https://vanishinggradients.fireside.fm/51" rel="nofollow">Hugo speaks with Philip Carter (Honeycomb) about aligning your LLM-as-a-judge with your domain expertise</a></li>
<li><a href="https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill" rel="nofollow">Hamel Husain on Lenny&#39;s podcast, which includes a live demo of error analysis</a></li>
<li><a href="https://vanishinggradients.fireside.fm/9" rel="nofollow">The episode of VG in which Hamel and Hugo talk about Hamel&#39;s &quot;data consulting in Vegas&quot; era</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
<li><a href="https://youtube.com/live/QEk-XwrkqhI?feature=share" rel="nofollow">Watch the podcast video on YouTube</a></li>
<li><a href="https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME" rel="nofollow">Hamel&#39;s AI evals course, which he teaches with Shreya Shankar (UC Berkeley): starts Oct 6 and this link gives 35% off!</a> <a href="https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME" rel="nofollow">https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME</a></li>
</ul>

<p>🎓 Learn more:</p>

<ul>
<li><strong>Hugo&#39;s course:</strong> <a href="https://maven.com/s/course/d56067f338" rel="nofollow">Building LLM Applications for Data Scientists and Software Engineers</a> — <a href="https://maven.com/s/course/d56067f338" rel="nofollow">https://maven.com/s/course/d56067f338</a> </li>
</ul>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Most AI teams find &quot;evals&quot; frustrating, but ML Engineer Hamel Husain argues they’re just using the wrong playbook. In this episode, he lays out a data-centric approach to systematically measure and improve AI, turning unreliable prototypes into robust, production-ready systems.</p>

<p>Drawing from his experience getting countless teams unstuck, Hamel explains why the solution requires a &quot;revenge of the data scientists.&quot; He details the essential mindset shifts, error analysis techniques, and practical steps needed to move beyond guesswork and build AI products you can actually trust.</p>

<p>We talk through:</p>

<ul>
<li>  The 10(+1) critical mistakes that cause teams to waste time on evals</li>
<li>  Why &quot;hallucination scores&quot; are a waste of time (and what to measure instead)</li>
<li>  The manual review process that finds major issues in hours, not weeks</li>
<li>  A step-by-step method for building LLM judges you can actually trust</li>
<li>  How to use domain experts without getting stuck in endless review committees</li>
<li>  Guest Bryan Bischof&#39;s &quot;Failure as a Funnel&quot; for debugging complex AI agents</li>
</ul>

<p>If you&#39;re tired of ambiguous &quot;vibe checks&quot; and want a clear process that delivers real improvement, this episode provides the definitive roadmap.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://hamel.dev/" rel="nofollow">Hamel&#39;s website and blog</a></li>
<li><a href="https://vanishinggradients.fireside.fm/51" rel="nofollow">Hugo speaks with Philip Carter (Honeycomb) about aligning your LLM-as-a-judge with your domain expertise</a></li>
<li><a href="https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill" rel="nofollow">Hamel Husain on Lenny&#39;s podcast, which includes a live demo of error analysis</a></li>
<li><a href="https://vanishinggradients.fireside.fm/9" rel="nofollow">The episode of VG in which Hamel and Hugo talk about Hamel&#39;s &quot;data consulting in Vegas&quot; era</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
<li><a href="https://youtube.com/live/QEk-XwrkqhI?feature=share" rel="nofollow">Watch the podcast video on YouTube</a></li>
<li><a href="https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME" rel="nofollow">Hamel&#39;s AI evals course, which he teaches with Shreya Shankar (UC Berkeley): starts Oct 6 and this link gives 35% off!</a> <a href="https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME" rel="nofollow">https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME</a></li>
</ul>

<p>🎓 Learn more:</p>

<ul>
<li><strong>Hugo&#39;s course:</strong> <a href="https://maven.com/s/course/d56067f338" rel="nofollow">Building LLM Applications for Data Scientists and Software Engineers</a> — <a href="https://maven.com/s/course/d56067f338" rel="nofollow">https://maven.com/s/course/d56067f338</a> </li>
</ul>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 56: DeepMind Just Dropped Gemma 270M... And Here’s Why It Matters</title>
  <link>https://vanishinggradients.fireside.fm/56</link>
  <guid isPermaLink="false">4f0a10fa-b411-458f-91b4-b68784e2d557</guid>
  <pubDate>Fri, 15 Aug 2025 02:00:00 +1000</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/4f0a10fa-b411-458f-91b4-b68784e2d557.mp3" length="88573733" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>While much of the AI world chases ever-larger models, Ravin Kumar (Google DeepMind) and his team build across the size spectrum, from billions of parameters down to this week’s release: Gemma 270M, the smallest member yet of the Gemma 3 open-weight family. At just 270 million parameters, a quarter the size of Gemma 1B, it’s designed for speed, efficiency, and fine-tuning.

We explore what makes 270M special, where it fits alongside its billion-parameter siblings, and why you might reach for it in production even if you think “small” means “just for experiments.”</itunes:subtitle>
  <itunes:duration>45:40</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>While much of the AI world chases ever-larger models, Ravin Kumar (Google DeepMind) and his team build across the size spectrum, from billions of parameters down to this week’s release: Gemma 270M, the smallest member yet of the Gemma 3 open-weight family. At just 270 million parameters, a quarter the size of Gemma 1B, it’s designed for speed, efficiency, and fine-tuning.
We explore what makes 270M special, where it fits alongside its billion-parameter siblings, and why you might reach for it in production even if you think “small” means “just for experiments.”  
We talk through:  
- Where 270M fits into the Gemma 3 lineup — and why it exists  
- On-device use cases where latency, privacy, and efficiency matter  
- How smaller models open up rapid, targeted fine-tuning  
- Running multiple models in parallel without heavyweight hardware  
- Why “small” models might drive the next big wave of AI adoption  
If you’ve ever wondered what you’d do with a model this size (or how to squeeze the most out of it), this episode will show you how small can punch far above its weight.
LINKS
Introducing Gemma 3 270M: The compact model for hyper-efficient AI (Google Developer Blog) (https://developers.googleblog.com/en/introducing-gemma-3-270m/)
Full Model Fine-Tune Guide using Hugging Face Transformers (https://ai.google.dev/gemma/docs/core/huggingface_text_full_finetune)
The Gemma 270M model on HuggingFace (https://huggingface.co/google/gemma-3-270m)
The Gemma 270M model on Ollama (https://ollama.com/library/gemma3:270m)
Building AI Agents with Gemma 3, a workshop with Ravin and Hugo (https://www.youtube.com/live/-IWstEStqok) (Code here (https://github.com/canyon289/ai_agent_basics))
From Images to Agents: Building and Evaluating Multimodal AI Workflows, a workshop with Ravin and Hugo (https://www.youtube.com/live/FNlM7lSt8Uk) (Code here (https://github.com/canyon289/ai_image_agent))
Evaluating AI Agents: From Demos to Dependability, an upcoming workshop with Ravin and Hugo (https://lu.ma/ezgny3dl)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Watch the podcast video on YouTube (https://youtu.be/VZDw6C2A_8E)
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338 ($600 off early bird discount for November cohort available until August 16)
</description>
  <itunes:keywords>LLMs, GenAI, machine learning</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>While much of the AI world chases ever-larger models, Ravin Kumar (Google DeepMind) and his team build across the size spectrum, from billions of parameters down to this week’s release: <strong>Gemma 270M</strong>, the smallest member yet of the Gemma 3 open-weight family. At just 270 million parameters, a quarter the size of Gemma 1B, it’s designed for speed, efficiency, and fine-tuning.</p>

<p>We explore what makes 270M special, where it fits alongside its billion-parameter siblings, and why you might reach for it in production even if you think “small” means “just for experiments.”  </p>

<p><strong>We talk through:</strong>  </p>

<ul>
<li>Where 270M fits into the Gemma 3 lineup — and why it exists<br></li>
<li>On-device use cases where latency, privacy, and efficiency matter<br></li>
<li>How smaller models open up rapid, targeted fine-tuning<br></li>
<li>Running multiple models in parallel without heavyweight hardware<br></li>
<li>Why “small” models might drive the next big wave of AI adoption<br></li>
</ul>

<p>If you’ve ever wondered what you’d do with a model this size (or how to squeeze the most out of it), this episode will show you how small can punch far above its weight.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://developers.googleblog.com/en/introducing-gemma-3-270m/" rel="nofollow">Introducing Gemma 3 270M: The compact model for hyper-efficient AI (Google Developer Blog)</a></li>
<li><a href="https://ai.google.dev/gemma/docs/core/huggingface_text_full_finetune" rel="nofollow">Full Model Fine-Tune Guide using Hugging Face Transformers</a></li>
<li><a href="https://huggingface.co/google/gemma-3-270m" rel="nofollow">The Gemma 270M model on HuggingFace</a></li>
<li><a href="https://ollama.com/library/gemma3:270m" rel="nofollow">The Gemma 270M model on Ollama</a></li>
<li><a href="https://www.youtube.com/live/-IWstEStqok" rel="nofollow">Building AI Agents with Gemma 3, a workshop with Ravin and Hugo</a> (<a href="https://github.com/canyon289/ai_agent_basics" rel="nofollow">Code here</a>)</li>
<li><a href="https://www.youtube.com/live/FNlM7lSt8Uk" rel="nofollow">From Images to Agents: Building and Evaluating Multimodal AI Workflows, a workshop with Ravin and Hugo</a> (<a href="https://github.com/canyon289/ai_image_agent" rel="nofollow">Code here</a>)</li>
<li><a href="https://lu.ma/ezgny3dl" rel="nofollow">Evaluating AI Agents: From Demos to Dependability, an upcoming workshop with Ravin and Hugo</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
<li><a href="https://youtu.be/VZDw6C2A_8E" rel="nofollow">Watch the podcast video on YouTube</a></li>
</ul>

<p>🎓 Learn more:</p>

<ul>
<li><strong>Hugo&#39;s course:</strong> <a href="https://maven.com/s/course/d56067f338" rel="nofollow">Building LLM Applications for Data Scientists and Software Engineers</a> — <a href="https://maven.com/s/course/d56067f338" rel="nofollow">https://maven.com/s/course/d56067f338</a> ($600 off early bird discount for November cohort available until August 16)</li>
</ul>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>While much of the AI world chases ever-larger models, Ravin Kumar (Google DeepMind) and his team build across the size spectrum, from billions of parameters down to this week’s release: <strong>Gemma 270M</strong>, the smallest member yet of the Gemma 3 open-weight family. At just 270 million parameters, a quarter the size of Gemma 1B, it’s designed for speed, efficiency, and fine-tuning.</p>

<p>We explore what makes 270M special, where it fits alongside its billion-parameter siblings, and why you might reach for it in production even if you think “small” means “just for experiments.”  </p>

<p><strong>We talk through:</strong>  </p>

<ul>
<li>Where 270M fits into the Gemma 3 lineup — and why it exists<br></li>
<li>On-device use cases where latency, privacy, and efficiency matter<br></li>
<li>How smaller models open up rapid, targeted fine-tuning<br></li>
<li>Running multiple models in parallel without heavyweight hardware<br></li>
<li>Why “small” models might drive the next big wave of AI adoption<br></li>
</ul>

<p>If you’ve ever wondered what you’d do with a model this size (or how to squeeze the most out of it), this episode will show you how small can punch far above its weight.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://developers.googleblog.com/en/introducing-gemma-3-270m/" rel="nofollow">Introducing Gemma 3 270M: The compact model for hyper-efficient AI (Google Developer Blog)</a></li>
<li><a href="https://ai.google.dev/gemma/docs/core/huggingface_text_full_finetune" rel="nofollow">Full Model Fine-Tune Guide using Hugging Face Transformers</a></li>
<li><a href="https://huggingface.co/google/gemma-3-270m" rel="nofollow">The Gemma 270M model on HuggingFace</a></li>
<li><a href="https://ollama.com/library/gemma3:270m" rel="nofollow">The Gemma 270M model on Ollama</a></li>
<li><a href="https://www.youtube.com/live/-IWstEStqok" rel="nofollow">Building AI Agents with Gemma 3, a workshop with Ravin and Hugo</a> (<a href="https://github.com/canyon289/ai_agent_basics" rel="nofollow">Code here</a>)</li>
<li><a href="https://www.youtube.com/live/FNlM7lSt8Uk" rel="nofollow">From Images to Agents: Building and Evaluating Multimodal AI Workflows, a workshop with Ravin and Hugo</a> (<a href="https://github.com/canyon289/ai_image_agent" rel="nofollow">Code here</a>)</li>
<li><a href="https://lu.ma/ezgny3dl" rel="nofollow">Evaluating AI Agents: From Demos to Dependability, an upcoming workshop with Ravin and Hugo</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
<li><a href="https://youtu.be/VZDw6C2A_8E" rel="nofollow">Watch the podcast video on YouTube</a></li>
</ul>

<p>🎓 Learn more:</p>

<ul>
<li><strong>Hugo&#39;s course:</strong> <a href="https://maven.com/s/course/d56067f338" rel="nofollow">Building LLM Applications for Data Scientists and Software Engineers</a> — <a href="https://maven.com/s/course/d56067f338" rel="nofollow">https://maven.com/s/course/d56067f338</a> ($600 off early bird discount for November cohort available until August 16)</li>
</ul>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 54: Scaling AI: From Colab to Clusters — A Practitioner’s Guide to Distributed Training and Inference</title>
  <link>https://vanishinggradients.fireside.fm/54</link>
  <guid isPermaLink="false">151b5251-bd41-4528-87bf-763165b8ccc7</guid>
  <pubDate>Sat, 19 Jul 2025 02:00:00 +1000</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/151b5251-bd41-4528-87bf-763165b8ccc7.mp3" length="59469240" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Colab is cozy. But production won’t fit on a single GPU. Zach Mueller leads Accelerate at Hugging Face and spends his days helping people go from solo scripts to scalable systems. In this episode, he joins me to demystify distributed training and inference — not just for research labs, but for any ML engineer trying to ship real software.</itunes:subtitle>
  <itunes:duration>41:17</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Colab is cozy. But production won’t fit on a single GPU.
Zach Mueller leads Accelerate at Hugging Face and spends his days helping people go from solo scripts to scalable systems. In this episode, he joins me to demystify distributed training and inference — not just for research labs, but for any ML engineer trying to ship real software.
We talk through:
    • From Colab to clusters: why scaling isn’t just about training massive models, but serving agents, handling load, and speeding up iteration
    • Zero-to-two GPUs: how to get started without Kubernetes, Slurm, or a PhD in networking
    • Scaling tradeoffs: when to care about interconnects, which infra bottlenecks actually matter, and how to avoid chasing performance ghosts
    • The GPU middle class: strategies for training and serving on a shoestring, with just a few cards or modest credits
    • Local experiments, global impact: why learning distributed systems—even just a little—can set you apart as an engineer
If you’ve ever stared at a Hugging Face training script and wondered how to run it on something more than your laptop: this one’s for you.
LINKS
Zach on LinkedIn (https://www.linkedin.com/in/zachary-mueller-135257118/)
Hugo's blog post on Stop Building AI Agents (https://www.linkedin.com/posts/hugo-bowne-anderson-045939a5_yesterday-i-posted-about-stop-building-ai-activity-7346942036752613376-b8-t/)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/stop-building-agents)
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338
Zach's course (45% off for VG listeners!): Scratch to Scale: Large-Scale Training in the Modern World (https://maven.com/walk-with-code/scratch-to-scale?promoCode=hugo39) — https://maven.com/walk-with-code/scratch-to-scale?promoCode=hugo39
📺 Watch the video version on YouTube: YouTube link (https://youtube.com/live/76NAtzWZ25s?feature=share) 
</description>
  <itunes:keywords>AI, LLM, compute, GenAI</itunes:keywords>
  <content:encoded>
    <![CDATA[<p><strong>Colab is cozy. But production won’t fit on a single GPU.</strong><br>
Zach Mueller leads Accelerate at Hugging Face and spends his days helping people go from solo scripts to scalable systems. In this episode, he joins me to demystify distributed training and inference — not just for research labs, but for any ML engineer trying to ship real software.</p>

<p>We talk through:<br>
    • From Colab to clusters: why scaling isn’t just about training massive models, but serving agents, handling load, and speeding up iteration<br>
    • Zero-to-two GPUs: how to get started without Kubernetes, Slurm, or a PhD in networking<br>
    • Scaling tradeoffs: when to care about interconnects, which infra bottlenecks actually matter, and how to avoid chasing performance ghosts<br>
    • The GPU middle class: strategies for training and serving on a shoestring, with just a few cards or modest credits<br>
    • Local experiments, global impact: why learning distributed systems—even just a little—can set you apart as an engineer</p>

<p>If you’ve ever stared at a Hugging Face training script and wondered how to run it on something more than your laptop: this one’s for you.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://www.linkedin.com/in/zachary-mueller-135257118/" rel="nofollow">Zach on LinkedIn</a></li>
<li><a href="https://www.linkedin.com/posts/hugo-bowne-anderson-045939a5_yesterday-i-posted-about-stop-building-ai-activity-7346942036752613376-b8-t/" rel="nofollow">Hugo&#39;s blog post on Stop Building AI Agents</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
<li><a href="https://hugobowne.substack.com/p/stop-building-agents" rel="nofollow">Hugo&#39;s recent newsletter about upcoming events and more!</a></li>
</ul>

<p>🎓 Learn more:</p>

<ul>
<li><strong>Hugo&#39;s course:</strong> <a href="https://maven.com/s/course/d56067f338" rel="nofollow">Building LLM Applications for Data Scientists and Software Engineers</a> — <a href="https://maven.com/s/course/d56067f338" rel="nofollow">https://maven.com/s/course/d56067f338</a></li>
<li><strong>Zach&#39;s course (45% off for VG listeners!):</strong> <a href="https://maven.com/walk-with-code/scratch-to-scale?promoCode=hugo39" rel="nofollow">Scratch to Scale: Large-Scale Training in the Modern World</a> — <a href="https://maven.com/walk-with-code/scratch-to-scale?promoCode=hugo39" rel="nofollow">https://maven.com/walk-with-code/scratch-to-scale?promoCode=hugo39</a></li>
</ul>

<p>📺 <strong>Watch the video version on YouTube:</strong> <a href="https://youtube.com/live/76NAtzWZ25s?feature=share" rel="nofollow">YouTube link</a></p>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p><strong>Colab is cozy. But production won’t fit on a single GPU.</strong><br>
Zach Mueller leads Accelerate at Hugging Face and spends his days helping people go from solo scripts to scalable systems. In this episode, he joins me to demystify distributed training and inference — not just for research labs, but for any ML engineer trying to ship real software.</p>

<p>We talk through:<br>
    • From Colab to clusters: why scaling isn’t just about training massive models, but serving agents, handling load, and speeding up iteration<br>
    • Zero-to-two GPUs: how to get started without Kubernetes, Slurm, or a PhD in networking<br>
    • Scaling tradeoffs: when to care about interconnects, which infra bottlenecks actually matter, and how to avoid chasing performance ghosts<br>
    • The GPU middle class: strategies for training and serving on a shoestring, with just a few cards or modest credits<br>
    • Local experiments, global impact: why learning distributed systems—even just a little—can set you apart as an engineer</p>

<p>If you’ve ever stared at a Hugging Face training script and wondered how to run it on something more than your laptop: this one’s for you.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://www.linkedin.com/in/zachary-mueller-135257118/" rel="nofollow">Zach on LinkedIn</a></li>
<li><a href="https://www.linkedin.com/posts/hugo-bowne-anderson-045939a5_yesterday-i-posted-about-stop-building-ai-activity-7346942036752613376-b8-t/" rel="nofollow">Hugo&#39;s blog post on Stop Building AI Agents</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
<li><a href="https://hugobowne.substack.com/p/stop-building-agents" rel="nofollow">Hugo&#39;s recent newsletter about upcoming events and more!</a></li>
</ul>

<p>🎓 Learn more:</p>

<ul>
<li><strong>Hugo&#39;s course:</strong> <a href="https://maven.com/s/course/d56067f338" rel="nofollow">Building LLM Applications for Data Scientists and Software Engineers</a> — <a href="https://maven.com/s/course/d56067f338" rel="nofollow">https://maven.com/s/course/d56067f338</a></li>
<li><strong>Zach&#39;s course (45% off for VG listeners!):</strong> <a href="https://maven.com/walk-with-code/scratch-to-scale?promoCode=hugo39" rel="nofollow">Scratch to Scale: Large-Scale Training in the Modern World</a> — <a href="https://maven.com/walk-with-code/scratch-to-scale?promoCode=hugo39" rel="nofollow">https://maven.com/walk-with-code/scratch-to-scale?promoCode=hugo39</a></li>
</ul>

<p>📺 <strong>Watch the video version on YouTube:</strong> <a href="https://youtube.com/live/76NAtzWZ25s?feature=share" rel="nofollow">YouTube link</a></p>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 52: Why Most LLM Products Break at Retrieval (And How to Fix Them) </title>
  <link>https://vanishinggradients.fireside.fm/52</link>
  <guid isPermaLink="false">258dd611-e817-4971-a655-f07343b967e4</guid>
  <pubDate>Thu, 03 Jul 2025 02:00:00 +1000</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/258dd611-e817-4971-a655-f07343b967e4.mp3" length="27489267" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Most LLM-powered features do not break at the model. They break at the context. So how do you retrieve the right information to get useful results, even under vague or messy user queries?

In this episode, we hear from Eric Ma, who leads data science research in the Data Science and AI group at Moderna. He shares what it takes to move beyond toy demos and ship LLM features that actually help people do their jobs.</itunes:subtitle>
  <itunes:duration>28:38</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Most LLM-powered features do not break at the model. They break at the context. So how do you retrieve the right information to get useful results, even under vague or messy user queries?
In this episode, we hear from Eric Ma, who leads data science research in the Data Science and AI group at Moderna. He shares what it takes to move beyond toy demos and ship LLM features that actually help people do their jobs.
We cover:
• How to align retrieval with user intent and why cosine similarity is not the answer
• How a dumb YAML-based system outperformed so-called smart retrieval pipelines
• Why vague queries like “what is this all about” expose real weaknesses in most systems
• When vibe checks are enough and when formal evaluation is worth the effort
• How retrieval workflows can evolve alongside your product and user needs
If you are building LLM-powered systems and care about how they work, not just whether they work, this one is for you.
LINKS
Eric's website (https://ericmjl.github.io/)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/stop-building-agents)
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — next cohort starts July 8: https://maven.com/s/course/d56067f338
📺 Watch the video version on YouTube: YouTube link (https://youtu.be/d-FaR5Ywd5k)
</description>
  <itunes:keywords>data science, machine learning, AI, LLMs, GenAI</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Most LLM-powered features do not break at the model. They break at the context. So how do you retrieve the right information to get useful results, even under vague or messy user queries?</p>

<p>In this episode, we hear from Eric Ma, who leads data science research in the Data Science and AI group at Moderna. He shares what it takes to move beyond toy demos and ship LLM features that actually help people do their jobs.</p>

<p>We cover:<br>
• How to align retrieval with user intent and why cosine similarity is not the answer<br>
• How a dumb YAML-based system outperformed so-called smart retrieval pipelines<br>
• Why vague queries like “what is this all about” expose real weaknesses in most systems<br>
• When vibe checks are enough and when formal evaluation is worth the effort<br>
• How retrieval workflows can evolve alongside your product and user needs</p>

<p>If you are building LLM-powered systems and care about how they work, not just whether they work, this one is for you.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://ericmjl.github.io/" rel="nofollow">Eric&#39;s website</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
<li><a href="https://hugobowne.substack.com/p/stop-building-agents" rel="nofollow">Hugo&#39;s recent newsletter about upcoming events and more!</a></li>
</ul>

<p>🎓 Learn more:</p>

<ul>
<li><strong>Hugo&#39;s course:</strong> <a href="https://maven.com/s/course/d56067f338" rel="nofollow">Building LLM Applications for Data Scientists and Software Engineers</a> — next cohort starts July 8: <a href="https://maven.com/s/course/d56067f338" rel="nofollow">https://maven.com/s/course/d56067f338</a></li>
</ul>

<p>📺 <strong>Watch the video version on YouTube:</strong> <a href="https://youtu.be/d-FaR5Ywd5k" rel="nofollow">YouTube link</a></p>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Most LLM-powered features do not break at the model. They break at the context. So how do you retrieve the right information to get useful results, even under vague or messy user queries?</p>

<p>In this episode, we hear from Eric Ma, who leads data science research in the Data Science and AI group at Moderna. He shares what it takes to move beyond toy demos and ship LLM features that actually help people do their jobs.</p>

<p>We cover:<br>
• How to align retrieval with user intent and why cosine similarity is not the answer<br>
• How a dumb YAML-based system outperformed so-called smart retrieval pipelines<br>
• Why vague queries like “what is this all about” expose real weaknesses in most systems<br>
• When vibe checks are enough and when formal evaluation is worth the effort<br>
• How retrieval workflows can evolve alongside your product and user needs</p>

<p>If you are building LLM-powered systems and care about how they work, not just whether they work, this one is for you.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://ericmjl.github.io/" rel="nofollow">Eric&#39;s website</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
<li><a href="https://hugobowne.substack.com/p/stop-building-agents" rel="nofollow">Hugo&#39;s recent newsletter about upcoming events and more!</a></li>
</ul>

<p>🎓 Learn more:</p>

<ul>
<li><strong>Hugo&#39;s course:</strong> <a href="https://maven.com/s/course/d56067f338" rel="nofollow">Building LLM Applications for Data Scientists and Software Engineers</a> — next cohort starts July 8: <a href="https://maven.com/s/course/d56067f338" rel="nofollow">https://maven.com/s/course/d56067f338</a></li>
</ul>

<p>📺 <strong>Watch the video version on YouTube:</strong> <a href="https://youtu.be/d-FaR5Ywd5k" rel="nofollow">YouTube link</a></p>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 47: The Great Pacific Garbage Patch of Code Slop with Joe Reis</title>
  <link>https://vanishinggradients.fireside.fm/47</link>
  <guid isPermaLink="false">decc9c1a-f18a-41e9-947a-e58fa0957f1e</guid>
  <pubDate>Mon, 07 Apr 2025 10:00:00 +1000</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/decc9c1a-f18a-41e9-947a-e58fa0957f1e.mp3" length="76045085" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>What if the cost of writing code dropped to zero — but the cost of understanding it skyrocketed?

In this episode, Hugo sits down with Joe Reis to unpack how AI tooling is reshaping the software development lifecycle — from experimentation and prototyping to deployment, maintainability, and everything in between.</itunes:subtitle>
  <itunes:duration>1:19:12</itunes:duration>
  <itunes:explicit>yes</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>What if the cost of writing code dropped to zero — but the cost of understanding it skyrocketed?
In this episode, Hugo sits down with Joe Reis to unpack how AI tooling is reshaping the software development lifecycle — from experimentation and prototyping to deployment, maintainability, and everything in between.
Joe is the co-author of Fundamentals of Data Engineering and a longtime voice on the systems side of modern software. He’s also one of the sharpest critics of “vibe coding” — the emerging pattern of writing software by feel, with heavy reliance on LLMs and little regard for structure or quality.
We dive into:
    • Why “vibe coding” is more than a meme — and what it says about how we build today
    • How AI tools expand the surface area of software creation — for better and worse
    • What happens to technical debt, testing, and security when generation outpaces understanding
    • The changing definition of “production” in a world of ephemeral, internal, or just-good-enough tools
    • How AI is flattening the learning curve — and threatening the talent pipeline
    • Joe’s view on what real craftsmanship means in an age of disposable code
This conversation isn’t about doom, and it’s not about hype. It’s about mapping the real, messy terrain of what it means to build software today — and how to do it with care.
LINKS
* Joe's Practical Data Modeling Newsletter on Substack (https://practicaldatamodeling.substack.com/)
* Joe's Practical Data Modeling Server on Discord (https://discord.gg/HhSZVvWDBb)
* Vanishing Gradients YouTube Channel (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA)  
* Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
🎓 Want to go deeper?
Check out my course: Building LLM Applications for Data Scientists and Software Engineers.
Learn how to design, test, and deploy production-grade LLM systems — with observability, feedback loops, and structure built in.
This isn’t about vibes or fragile agents. It’s about making LLMs reliable, testable, and actually useful.
Includes over $800 in compute credits and guest lectures from experts at DeepMind, Moderna, and more.
Cohort starts July 8 — Use this link for a 10% discount (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=LLM10) 
</description>
  <itunes:keywords>AI, LLMs, data science, machine learning, data science, GenAI, vibe coding</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>What if the cost of writing code dropped to zero — but the cost of understanding it skyrocketed?</p>

<p>In this episode, Hugo sits down with Joe Reis to unpack how AI tooling is reshaping the software development lifecycle — from experimentation and prototyping to deployment, maintainability, and everything in between.</p>

<p>Joe is the co-author of Fundamentals of Data Engineering and a longtime voice on the systems side of modern software. He’s also one of the sharpest critics of “vibe coding” — the emerging pattern of writing software by feel, with heavy reliance on LLMs and little regard for structure or quality.</p>

<p>We dive into:<br>
    • Why “vibe coding” is more than a meme — and what it says about how we build today<br>
    • How AI tools expand the surface area of software creation — for better and worse<br>
    • What happens to technical debt, testing, and security when generation outpaces understanding<br>
    • The changing definition of “production” in a world of ephemeral, internal, or just-good-enough tools<br>
    • How AI is flattening the learning curve — and threatening the talent pipeline<br>
    • Joe’s view on what real craftsmanship means in an age of disposable code</p>

<p>This conversation isn’t about doom, and it’s not about hype. It’s about mapping the real, messy terrain of what it means to build software today — and how to do it with care.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://practicaldatamodeling.substack.com/" rel="nofollow">Joe&#39;s Practical Data Modeling Newsletter on Substack</a></li>
<li><a href="https://discord.gg/HhSZVvWDBb" rel="nofollow">Joe&#39;s Practical Data Modeling Server on Discord</a></li>
<li><a href="https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA" rel="nofollow">Vanishing Gradients YouTube Channel</a><br></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
</ul>

<p>🎓 Want to go deeper?<br>
Check out my course: <em>Building LLM Applications for Data Scientists and Software Engineers.</em><br>
Learn how to design, test, and deploy production-grade LLM systems — with observability, feedback loops, and structure built in.<br>
This isn’t about vibes or fragile agents. It’s about making LLMs reliable, testable, and actually useful.</p>

<p>Includes over $800 in compute credits and guest lectures from experts at DeepMind, Moderna, and more.<br>
Cohort starts July 8 — <a href="https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=LLM10" rel="nofollow">Use this link for a 10% discount</a></p>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>What if the cost of writing code dropped to zero — but the cost of understanding it skyrocketed?</p>

<p>In this episode, Hugo sits down with Joe Reis to unpack how AI tooling is reshaping the software development lifecycle — from experimentation and prototyping to deployment, maintainability, and everything in between.</p>

<p>Joe is the co-author of Fundamentals of Data Engineering and a longtime voice on the systems side of modern software. He’s also one of the sharpest critics of “vibe coding” — the emerging pattern of writing software by feel, with heavy reliance on LLMs and little regard for structure or quality.</p>

<p>We dive into:<br>
    • Why “vibe coding” is more than a meme — and what it says about how we build today<br>
    • How AI tools expand the surface area of software creation — for better and worse<br>
    • What happens to technical debt, testing, and security when generation outpaces understanding<br>
    • The changing definition of “production” in a world of ephemeral, internal, or just-good-enough tools<br>
    • How AI is flattening the learning curve — and threatening the talent pipeline<br>
    • Joe’s view on what real craftsmanship means in an age of disposable code</p>

<p>This conversation isn’t about doom, and it’s not about hype. It’s about mapping the real, messy terrain of what it means to build software today — and how to do it with care.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://practicaldatamodeling.substack.com/" rel="nofollow">Joe&#39;s Practical Data Modeling Newsletter on Substack</a></li>
<li><a href="https://discord.gg/HhSZVvWDBb" rel="nofollow">Joe&#39;s Practical Data Modeling Server on Discord</a></li>
<li><a href="https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA" rel="nofollow">Vanishing Gradients YouTube Channel</a><br></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a></li>
</ul>

<p>🎓 Want to go deeper?<br>
Check out my course: <em>Building LLM Applications for Data Scientists and Software Engineers.</em><br>
Learn how to design, test, and deploy production-grade LLM systems — with observability, feedback loops, and structure built in.<br>
This isn’t about vibes or fragile agents. It’s about making LLMs reliable, testable, and actually useful.</p>

<p>Includes over $800 in compute credits and guest lectures from experts at DeepMind, Moderna, and more.<br>
Cohort starts July 8 — <a href="https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=LLM10" rel="nofollow">Use this link for a 10% discount</a></p>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 46: Software Composition Is the New Vibe Coding</title>
  <link>https://vanishinggradients.fireside.fm/46</link>
  <guid isPermaLink="false">dcb8396f-ece2-4636-951c-8ad44d698d15</guid>
  <pubDate>Thu, 03 Apr 2025 13:00:00 +1100</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/dcb8396f-ece2-4636-951c-8ad44d698d15.mp3" length="99299288" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>What if building software felt more like composing than coding?

In this episode, Hugo and Greg explore how LLMs are reshaping the way we think about software development—from deterministic programming to a more flexible, prompt-driven, and collaborative style of building. It’s not just hype or grift—it’s a real shift in how we express intent, reason about systems, and collaborate across roles.

Hugo speaks with Greg Ceccarelli—co-founder of SpecStory, former CPO at Pluralsight, and Director of Data Science at GitHub—about the rise of software composition and how it changes the way individuals and teams create with LLMs.</itunes:subtitle>
  <itunes:duration>1:08:57</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>What if building software felt more like composing than coding?
In this episode, Hugo and Greg explore how LLMs are reshaping the way we think about software development—from deterministic programming to a more flexible, prompt-driven, and collaborative style of building. It’s not just hype or grift—it’s a real shift in how we express intent, reason about systems, and collaborate across roles.
Hugo speaks with Greg Ceccarelli—co-founder of SpecStory, former CPO at Pluralsight, and Director of Data Science at GitHub—about the rise of software composition and how it changes the way individuals and teams create with LLMs.
We dive into:
- Why software composition is emerging as a serious alternative to traditional coding
- The real difference between vibe coding and production-minded prototyping
- How LLMs are expanding who gets to build software—and how
- What changes when you focus on intent, not just code
- What Greg is building with SpecStory to support collaborative, traceable AI-native workflows
- The challenges (and joys) of debugging and exploring with agentic tools like Cursor and Claude
We’ve removed the visual demos from the audio—but you can catch our live-coded Chrome extension and JFK document explorer on YouTube. Links below.
JFK Docs Vibe Coding Demo (YouTube) (https://youtu.be/JpXCkuV58QE)  
Chrome Extension Vibe Coding Demo (YouTube) (https://youtu.be/ESVKp37jDwc)  
Meditations on Tech (Greg’s Substack) (https://www.meditationsontech.com/)  
Simon Willison on Vibe Coding (https://simonwillison.net/2025/Mar/19/vibe-coding/)  
Johnno Whitaker: On Vibe Coding (https://johnowhitaker.dev/essays/vibe_coding.html)  
Tim O’Reilly – The End of Programming (https://www.oreilly.com/radar/the-end-of-programming-as-we-know-it/)  
Vanishing Gradients YouTube Channel (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA)  
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)  
Greg Ceccarelli on LinkedIn (https://www.linkedin.com/in/gregceccarelli/)  
Greg’s Hacker News Post on GOOD (https://news.ycombinator.com/item?id=43557698)  
SpecStory: GOOD – Git Companion for AI Workflows (https://github.com/specstoryai/getspecstory/blob/main/GOOD.md)
🎓 Want to go deeper?
Check out my course: Building LLM Applications for Data Scientists and Software Engineers.
Learn how to design, test, and deploy production-grade LLM systems — with observability, feedback loops, and structure built in.
This isn’t about vibes or fragile agents. It’s about making LLMs reliable, testable, and actually useful.
Includes over $2,500 in compute credits and guest lectures from experts at DeepMind, Moderna, and more.
Cohort starts April 7 — Use this link for a 10% discount (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=LLM10)
🔍 Want to help shape the future of SpecStory?
Greg and the team are looking for design partners for their new SpecStory Teams product—built for collaborative, AI-native software development.
If you're working with LLMs in a team setting and want to influence the next wave of developer tools, you can apply here:  
👉 specstory.com/teams (https://specstory.com/teams) 
</description>
  <itunes:keywords>AI, LLMs, data science, machine learning, data science, GenAI, vibe coding</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>What if building software felt more like composing than coding?</p>

<p>In this episode, Hugo and Greg explore how LLMs are reshaping the way we think about software development—from deterministic programming to a more flexible, prompt-driven, and collaborative style of building. It’s not just hype or grift—it’s a real shift in how we express intent, reason about systems, and collaborate across roles.</p>

<p>Hugo speaks with Greg Ceccarelli—co-founder of SpecStory, former CPO at Pluralsight, and Director of Data Science at GitHub—about the rise of software composition and how it changes the way individuals and teams create with LLMs.</p>

<p>We dive into:</p>

<ul>
<li>Why software composition is emerging as a serious alternative to traditional coding</li>
<li>The real difference between vibe coding and production-minded prototyping</li>
<li>How LLMs are expanding who gets to build software—and how</li>
<li>What changes when you focus on intent, not just code</li>
<li>What Greg is building with SpecStory to support collaborative, traceable AI-native workflows</li>
<li>The challenges (and joys) of debugging and exploring with agentic tools like Cursor and Claude</li>
</ul>

<p>We’ve removed the visual demos from the audio—but you can catch our live-coded Chrome extension and JFK document explorer on YouTube. Links below.</p>

<ul>
<li><a href="https://youtu.be/JpXCkuV58QE" rel="nofollow">JFK Docs Vibe Coding Demo (YouTube)</a><br></li>
<li><a href="https://youtu.be/ESVKp37jDwc" rel="nofollow">Chrome Extension Vibe Coding Demo (YouTube)</a><br></li>
<li><a href="https://www.meditationsontech.com/" rel="nofollow">Meditations on Tech (Greg’s Substack)</a><br></li>
<li><a href="https://simonwillison.net/2025/Mar/19/vibe-coding/" rel="nofollow">Simon Willison on Vibe Coding</a><br></li>
<li><a href="https://johnowhitaker.dev/essays/vibe_coding.html" rel="nofollow">Johnno Whitaker: On Vibe Coding</a><br></li>
<li><a href="https://www.oreilly.com/radar/the-end-of-programming-as-we-know-it/" rel="nofollow">Tim O’Reilly – The End of Programming</a><br></li>
<li><a href="https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA" rel="nofollow">Vanishing Gradients YouTube Channel</a><br></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a><br></li>
<li><a href="https://www.linkedin.com/in/gregceccarelli/" rel="nofollow">Greg Ceccarelli on LinkedIn</a><br></li>
<li><a href="https://news.ycombinator.com/item?id=43557698" rel="nofollow">Greg’s Hacker News Post on GOOD</a><br></li>
<li><a href="https://github.com/specstoryai/getspecstory/blob/main/GOOD.md" rel="nofollow">SpecStory: GOOD – Git Companion for AI Workflows</a></li>
</ul>

<p>🎓 Want to go deeper?<br>
Check out my course: <em>Building LLM Applications for Data Scientists and Software Engineers.</em><br>
Learn how to design, test, and deploy production-grade LLM systems — with observability, feedback loops, and structure built in.<br>
This isn’t about vibes or fragile agents. It’s about making LLMs reliable, testable, and actually useful.</p>

<p>Includes over $2,500 in compute credits and guest lectures from experts at DeepMind, Moderna, and more.<br>
Cohort starts April 7 — <a href="https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=LLM10" rel="nofollow">Use this link for a 10% discount</a></p>

<h3>🔍 Want to help shape the future of SpecStory?</h3>

<p>Greg and the team are looking for <strong>design partners</strong> for their new SpecStory Teams product—built for collaborative, AI-native software development.</p>

<p>If you&#39;re working with LLMs in a team setting and want to influence the next wave of developer tools, you can apply here:<br><br>
👉 <a href="https://specstory.com/teams" rel="nofollow">specstory.com/teams</a></p>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>What if building software felt more like composing than coding?</p>

<p>In this episode, Hugo and Greg explore how LLMs are reshaping the way we think about software development—from deterministic programming to a more flexible, prompt-driven, and collaborative style of building. It’s not just hype or grift—it’s a real shift in how we express intent, reason about systems, and collaborate across roles.</p>

<p>Hugo speaks with Greg Ceccarelli—co-founder of SpecStory, former CPO at Pluralsight, and Director of Data Science at GitHub—about the rise of software composition and how it changes the way individuals and teams create with LLMs.</p>

<p>We dive into:</p>

<ul>
<li>Why software composition is emerging as a serious alternative to traditional coding</li>
<li>The real difference between vibe coding and production-minded prototyping</li>
<li>How LLMs are expanding who gets to build software—and how</li>
<li>What changes when you focus on intent, not just code</li>
<li>What Greg is building with SpecStory to support collaborative, traceable AI-native workflows</li>
<li>The challenges (and joys) of debugging and exploring with agentic tools like Cursor and Claude</li>
</ul>

<p>We’ve removed the visual demos from the audio—but you can catch our live-coded Chrome extension and JFK document explorer on YouTube. Links below.</p>

<ul>
<li><a href="https://youtu.be/JpXCkuV58QE" rel="nofollow">JFK Docs Vibe Coding Demo (YouTube)</a><br></li>
<li><a href="https://youtu.be/ESVKp37jDwc" rel="nofollow">Chrome Extension Vibe Coding Demo (YouTube)</a><br></li>
<li><a href="https://www.meditationsontech.com/" rel="nofollow">Meditations on Tech (Greg’s Substack)</a><br></li>
<li><a href="https://simonwillison.net/2025/Mar/19/vibe-coding/" rel="nofollow">Simon Willison on Vibe Coding</a><br></li>
<li><a href="https://johnowhitaker.dev/essays/vibe_coding.html" rel="nofollow">Johnno Whitaker: On Vibe Coding</a><br></li>
<li><a href="https://www.oreilly.com/radar/the-end-of-programming-as-we-know-it/" rel="nofollow">Tim O’Reilly – The End of Programming</a><br></li>
<li><a href="https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA" rel="nofollow">Vanishing Gradients YouTube Channel</a><br></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Upcoming Events on Luma</a><br></li>
<li><a href="https://www.linkedin.com/in/gregceccarelli/" rel="nofollow">Greg Ceccarelli on LinkedIn</a><br></li>
<li><a href="https://news.ycombinator.com/item?id=43557698" rel="nofollow">Greg’s Hacker News Post on GOOD</a><br></li>
<li><a href="https://github.com/specstoryai/getspecstory/blob/main/GOOD.md" rel="nofollow">SpecStory: GOOD – Git Companion for AI Workflows</a></li>
</ul>

<p>🎓 Want to go deeper?<br>
Check out my course: <em>Building LLM Applications for Data Scientists and Software Engineers.</em><br>
Learn how to design, test, and deploy production-grade LLM systems — with observability, feedback loops, and structure built in.<br>
This isn’t about vibes or fragile agents. It’s about making LLMs reliable, testable, and actually useful.</p>

<p>Includes over $2,500 in compute credits and guest lectures from experts at DeepMind, Moderna, and more.<br>
Cohort starts April 7 — <a href="https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=LLM10" rel="nofollow">Use this link for a 10% discount</a></p>

<h3>🔍 Want to help shape the future of SpecStory?</h3>

<p>Greg and the team are looking for <strong>design partners</strong> for their new SpecStory Teams product—built for collaborative, AI-native software development.</p>

<p>If you&#39;re working with LLMs in a team setting and want to influence the next wave of developer tools, you can apply here:<br><br>
👉 <a href="https://specstory.com/teams" rel="nofollow">specstory.com/teams</a></p>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 42: Learning, Teaching, and Building in the Age of AI</title>
  <link>https://vanishinggradients.fireside.fm/42</link>
  <guid isPermaLink="false">6af2e172-b72b-418b-baa6-369299f37b8b</guid>
  <pubDate>Sat, 04 Jan 2025 14:00:00 +1100</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/6af2e172-b72b-418b-baa6-369299f37b8b.mp3" length="76860106" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>The tables turn as Hugo sits down with Alex Andorra, host of Learning Bayesian Statistics. Hugo shares his journey from mathematics to AI, reflecting on how Bayesian inference shapes his approach to data science, teaching, and building AI-powered applications.</itunes:subtitle>
  <itunes:duration>1:20:03</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>In this episode of Vanishing Gradients, the tables turn as Hugo sits down with Alex Andorra, host of Learning Bayesian Statistics. Hugo shares his journey from mathematics to AI, reflecting on how Bayesian inference shapes his approach to data science, teaching, and building AI-powered applications.
They dive into the realities of deploying LLM applications, overcoming “proof-of-concept purgatory,” and why first principles and iteration are critical for success in AI. Whether you’re an educator, software engineer, or data scientist, this episode offers valuable insights into the intersection of AI, product development, and real-world deployment.
LINKS
The podcast on YouTube (https://www.youtube.com/watch?v=BRIYytbqtP0)
The original podcast episode (https://learnbayesstats.com/episode/122-learning-and-teaching-in-the-age-of-ai-hugo-bowne-anderson)
Alex Andorra on LinkedIn (https://www.linkedin.com/in/alex-andorra/)
Hugo on LinkedIn (https://www.linkedin.com/in/hugo-bowne-anderson-045939a5/)
Hugo on Twitter (https://x.com/hugobowne)
Vanishing Gradients on Twitter (https://x.com/vanishingdata)
Hugo's "Building LLM Applications for Data Scientists and Software Engineers" course (https://maven.com/s/course/d56067f338) 
</description>
  <itunes:keywords>data science, machine learning, AI, LLMs</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>In this episode of Vanishing Gradients, the tables turn as Hugo sits down with Alex Andorra, host of Learning Bayesian Statistics. Hugo shares his journey from mathematics to AI, reflecting on how Bayesian inference shapes his approach to data science, teaching, and building AI-powered applications.</p>

<p>They dive into the realities of deploying LLM applications, overcoming “proof-of-concept purgatory,” and why first principles and iteration are critical for success in AI. Whether you’re an educator, software engineer, or data scientist, this episode offers valuable insights into the intersection of AI, product development, and real-world deployment.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://www.youtube.com/watch?v=BRIYytbqtP0" rel="nofollow">The podcast on YouTube</a></li>
<li><a href="https://learnbayesstats.com/episode/122-learning-and-teaching-in-the-age-of-ai-hugo-bowne-anderson" rel="nofollow">The original podcast episode</a></li>
<li><a href="https://www.linkedin.com/in/alex-andorra/" rel="nofollow">Alex Andorra on LinkedIn</a></li>
<li><a href="https://www.linkedin.com/in/hugo-bowne-anderson-045939a5/" rel="nofollow">Hugo on LinkedIn</a></li>
<li><a href="https://x.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
<li><a href="https://x.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://maven.com/s/course/d56067f338" rel="nofollow">Hugo&#39;s &quot;Building LLM Applications for Data Scientists and Software Engineers&quot; course</a></li>
</ul>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>In this episode of Vanishing Gradients, the tables turn as Hugo sits down with Alex Andorra, host of Learning Bayesian Statistics. Hugo shares his journey from mathematics to AI, reflecting on how Bayesian inference shapes his approach to data science, teaching, and building AI-powered applications.</p>

<p>They dive into the realities of deploying LLM applications, overcoming “proof-of-concept purgatory,” and why first principles and iteration are critical for success in AI. Whether you’re an educator, software engineer, or data scientist, this episode offers valuable insights into the intersection of AI, product development, and real-world deployment.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://www.youtube.com/watch?v=BRIYytbqtP0" rel="nofollow">The podcast on YouTube</a></li>
<li><a href="https://learnbayesstats.com/episode/122-learning-and-teaching-in-the-age-of-ai-hugo-bowne-anderson" rel="nofollow">The original podcast episode</a></li>
<li><a href="https://www.linkedin.com/in/alex-andorra/" rel="nofollow">Alex Andorra on LinkedIn</a></li>
<li><a href="https://www.linkedin.com/in/hugo-bowne-anderson-045939a5/" rel="nofollow">Hugo on LinkedIn</a></li>
<li><a href="https://x.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
<li><a href="https://x.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://maven.com/s/course/d56067f338" rel="nofollow">Hugo&#39;s &quot;Building LLM Applications for Data Scientists and Software Engineers&quot; course</a></li>
</ul>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 41: Beyond Prompt Engineering: Can AI Learn to Set Its Own Goals?</title>
  <link>https://vanishinggradients.fireside.fm/41</link>
  <guid isPermaLink="false">695d8cc9-b111-4f1d-9871-82962ae023f4</guid>
  <pubDate>Tue, 31 Dec 2024 10:00:00 +1100</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/695d8cc9-b111-4f1d-9871-82962ae023f4.mp3" length="42114740" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Hugo Bowne-Anderson hosts a panel discussion from the MLOps World and Generative AI Summit in Austin, exploring the long-term growth of AI by distinguishing real problem-solving from trend-based solutions. If you're navigating the evolving landscape of generative AI, productionizing models, or questioning the hype, this episode dives into the tough questions shaping the field.</itunes:subtitle>
  <itunes:duration>43:51</itunes:duration>
  <itunes:explicit>yes</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Hugo Bowne-Anderson hosts a panel discussion from the MLOps World and Generative AI Summit in Austin, exploring the long-term growth of AI by distinguishing real problem-solving from trend-based solutions. If you're navigating the evolving landscape of generative AI, productionizing models, or questioning the hype, this episode dives into the tough questions shaping the field.
The panel features:  
- Ben Taylor (Jepson) (https://www.linkedin.com/in/jepsontaylor/) – CEO and Founder at VEOX Inc., with experience in AI exploration, genetic programming, and deep learning.  
- Joe Reis (https://www.linkedin.com/in/josephreis/) – Co-founder of Ternary Data and author of Fundamentals of Data Engineering.  
- Juan Sequeda (https://www.linkedin.com/in/juansequeda/) – Principal Scientist and Head of AI Lab at Data.World, known for his expertise in knowledge graphs and the semantic web.  
The discussion unpacks essential topics such as:  
- The shift from prompt engineering to goal engineering—letting AI iterate toward well-defined objectives.  
- Whether generative AI is having an electricity moment or more of a blockchain trajectory.  
- The combinatorial power of AI to explore new solutions, drawing parallels to AlphaZero redefining strategy games.  
- The POC-to-production gap and why AI projects stall.  
- Failure modes, hallucinations, and governance risks—and how to mitigate them.  
- The disconnect between executive optimism and employee workload.  
Hugo also mentions his upcoming workshop on escaping Proof-of-Concept Purgatory, which has evolved into a Maven course "Building LLM Applications for Data Scientists and Software Engineers" launching in January (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?utm_campaign=8123d0&amp;utm_medium=partner&amp;utm_source=instructor). Vanishing Gradient listeners can get 25% off the course (use the code VG25), with $1,000 in Modal compute credits included.
A huge thanks to Dave Scharbach and the Toronto Machine Learning Society for organizing the conference and to the audience for their thoughtful questions.
As we head into the new year, this conversation offers a reality check amidst the growing AI agent hype.  
LINKS
Hugo on Twitter (https://x.com/hugobowne)
Hugo on LinkedIn (https://www.linkedin.com/in/hugo-bowne-anderson-045939a5/)
Vanishing Gradients on Twitter (https://x.com/vanishingdata)
"Building LLM Applications for Data Scientists and Software Engineers" course (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?utm_campaign=8123d0&amp;utm_medium=partner&amp;utm_source=instructor).
</description>
  <itunes:keywords>data science, machine learning, AI, LLMs</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Hugo Bowne-Anderson hosts a panel discussion from the MLOps World and Generative AI Summit in Austin, exploring the long-term growth of AI by distinguishing real problem-solving from trend-based solutions. If you&#39;re navigating the evolving landscape of generative AI, productionizing models, or questioning the hype, this episode dives into the tough questions shaping the field.</p>

<p>The panel features:  </p>

<ul>
<li><a href="https://www.linkedin.com/in/jepsontaylor/" rel="nofollow"><strong>Ben Taylor (Jepson)</strong></a> – CEO and Founder at VEOX Inc., with experience in AI exploration, genetic programming, and deep learning.<br></li>
<li><a href="https://www.linkedin.com/in/josephreis/" rel="nofollow"><strong>Joe Reis</strong></a> – Co-founder of Ternary Data and author of <em>Fundamentals of Data Engineering</em>.<br></li>
<li><a href="https://www.linkedin.com/in/juansequeda/" rel="nofollow"><strong>Juan Sequeda</strong></a> – Principal Scientist and Head of AI Lab at Data.World, known for his expertise in knowledge graphs and the semantic web.<br></li>
</ul>

<p>The discussion unpacks essential topics such as:  </p>

<ul>
<li>The shift from <strong>prompt engineering</strong> to <strong>goal engineering</strong>—letting AI iterate toward well-defined objectives.<br></li>
<li>Whether generative AI is having an <strong>electricity moment</strong> or more of a <strong>blockchain trajectory</strong>.<br></li>
<li>The <strong>combinatorial power of AI</strong> to explore new solutions, drawing parallels to AlphaZero redefining strategy games.<br></li>
<li>The <strong>POC-to-production gap</strong> and why AI projects stall.<br></li>
<li><strong>Failure modes, hallucinations, and governance risks</strong>—and how to mitigate them.<br></li>
<li>The disconnect between executive optimism and employee workload.<br></li>
</ul>

<p>Hugo also mentions his upcoming workshop on <strong>escaping Proof-of-Concept Purgatory</strong>, <a href="https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?utm_campaign=8123d0&utm_medium=partner&utm_source=instructor" rel="nofollow">which has evolved into a <strong>Maven course &quot;Building LLM Applications for Data Scientists and Software Engineers&quot; launching in January</strong></a>. Vanishing Gradient listeners can get 25% off the course (use the code VG25), with $1,000 in Modal compute credits included.</p>

<p>A huge thanks to <strong>Dave Scharbach and the Toronto Machine Learning Society</strong> for organizing the conference and to the audience for their thoughtful questions.</p>

<p>As we head into the new year, this conversation offers a reality check amidst the growing AI agent hype.  </p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://x.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
<li><a href="https://www.linkedin.com/in/hugo-bowne-anderson-045939a5/" rel="nofollow">Hugo on LinkedIn</a></li>
<li><a href="https://x.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?utm_campaign=8123d0&utm_medium=partner&utm_source=instructor" rel="nofollow">&quot;Building LLM Applications for Data Scientists and Software Engineers&quot; course</a>.</li>
</ul>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Hugo Bowne-Anderson hosts a panel discussion from the MLOps World and Generative AI Summit in Austin, exploring the long-term growth of AI by distinguishing real problem-solving from trend-based solutions. If you&#39;re navigating the evolving landscape of generative AI, productionizing models, or questioning the hype, this episode dives into the tough questions shaping the field.</p>

<p>The panel features:  </p>

<ul>
<li><a href="https://www.linkedin.com/in/jepsontaylor/" rel="nofollow"><strong>Ben Taylor (Jepson)</strong></a> – CEO and Founder at VEOX Inc., with experience in AI exploration, genetic programming, and deep learning.<br></li>
<li><a href="https://www.linkedin.com/in/josephreis/" rel="nofollow"><strong>Joe Reis</strong></a> – Co-founder of Ternary Data and author of <em>Fundamentals of Data Engineering</em>.<br></li>
<li><a href="https://www.linkedin.com/in/juansequeda/" rel="nofollow"><strong>Juan Sequeda</strong></a> – Principal Scientist and Head of AI Lab at Data.World, known for his expertise in knowledge graphs and the semantic web.<br></li>
</ul>

<p>The discussion unpacks essential topics such as:  </p>

<ul>
<li>The shift from <strong>prompt engineering</strong> to <strong>goal engineering</strong>—letting AI iterate toward well-defined objectives.<br></li>
<li>Whether generative AI is having an <strong>electricity moment</strong> or more of a <strong>blockchain trajectory</strong>.<br></li>
<li>The <strong>combinatorial power of AI</strong> to explore new solutions, drawing parallels to AlphaZero redefining strategy games.<br></li>
<li>The <strong>POC-to-production gap</strong> and why AI projects stall.<br></li>
<li><strong>Failure modes, hallucinations, and governance risks</strong>—and how to mitigate them.<br></li>
<li>The disconnect between executive optimism and employee workload.<br></li>
</ul>

<p>Hugo also mentions his upcoming workshop on <strong>escaping Proof-of-Concept Purgatory</strong>, <a href="https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?utm_campaign=8123d0&utm_medium=partner&utm_source=instructor" rel="nofollow">which has evolved into a <strong>Maven course &quot;Building LLM Applications for Data Scientists and Software Engineers&quot; launching in January</strong></a>. Vanishing Gradient listeners can get 25% off the course (use the code VG25), with $1,000 in Modal compute credits included.</p>

<p>A huge thanks to <strong>Dave Scharbach and the Toronto Machine Learning Society</strong> for organizing the conference and to the audience for their thoughtful questions.</p>

<p>As we head into the new year, this conversation offers a reality check amidst the growing AI agent hype.  </p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://x.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
<li><a href="https://www.linkedin.com/in/hugo-bowne-anderson-045939a5/" rel="nofollow">Hugo on LinkedIn</a></li>
<li><a href="https://x.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?utm_campaign=8123d0&utm_medium=partner&utm_source=instructor" rel="nofollow">&quot;Building LLM Applications for Data Scientists and Software Engineers&quot; course</a>.</li>
</ul>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 38: The Art of Freelance AI Consulting and Products: Data, Dollars, and Deliverables</title>
  <link>https://vanishinggradients.fireside.fm/38</link>
  <guid isPermaLink="false">c1a5c8d1-777a-41b7-a123-6b06861dbc35</guid>
  <pubDate>Tue, 05 Nov 2024 10:00:00 +1100</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/c1a5c8d1-777a-41b7-a123-6b06861dbc35.mp3" length="80443270" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Hugo speaks with Jason Liu, an independent AI consultant with experience at Meta and Stitch Fix. At Stitch Fix, Jason developed impactful AI systems, like a $50 million product similarity search and the widely adopted Flight recommendation framework. Now, he helps startups and enterprises design and deploy production-level AI applications, with a focus on retrieval-augmented generation (RAG) and scalable solutions.</itunes:subtitle>
  <itunes:duration>1:23:47</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Hugo speaks with Jason Liu, an independent AI consultant with experience at Meta and Stitch Fix. At Stitch Fix, Jason developed impactful AI systems, like a $50 million product similarity search and the widely adopted Flight recommendation framework. Now, he helps startups and enterprises design and deploy production-level AI applications, with a focus on retrieval-augmented generation (RAG) and scalable solutions.
This episode is a bit of an experiment. Instead of our usual technical deep dives, we’re focusing on the world of AI consulting and freelancing. We explore Jason’s consulting playbook, covering how he structures contracts to maximize value, strategies for moving from hourly billing to securing larger deals, and the mindset shift needed to align incentives with clients. We’ll also discuss the challenges of moving from deterministic software to probabilistic AI systems and even do a live role-playing session where Jason coaches me on client engagement and pricing pitfalls.
LINKS
The livestream on YouTube (https://youtube.com/live/9CFs06UDbGI?feature=share)
Jason's upcoming course: AI Consultant Accelerator: From Expert to High-Demand Business (https://maven.com/indie-consulting/ai-consultant-accelerator?utm_campaign=9532cc&amp;utm_medium=partner&amp;utm_source=instructor)
Hugo's upcoming course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338)
Jason's website (https://jxnl.co/)
Jason's indie consulting newsletter (https://indieconsulting.podia.com/)
Your AI Product Needs Evals by Hamel Husain (https://hamel.dev/blog/posts/evals/)
What We’ve Learned From A Year of Building with LLMs (https://applied-llms.org/)
Dear Future AI Consultant by Jason (https://jxnl.co/writing/#dear-future-ai-consultant)
Alex Hormozi's books (https://www.acquisition.com/books)
The Burnout Society by Byung-Chul Han (https://www.sup.org/books/theory-and-philosophy/burnout-society)
Jason on Twitter (https://x.com/jxnlco)
Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
Hugo on Twitter (https://twitter.com/hugobowne)
Vanishing Gradients' lu.ma calendar (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Vanishing Gradients on YouTube (https://www.youtube.com/@vanishinggradients) 
</description>
  <itunes:keywords>AI, LLMs, machine learning, data science, GenAI, consulting</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Hugo speaks with Jason Liu, an independent AI consultant with experience at Meta and Stitch Fix. At Stitch Fix, Jason developed impactful AI systems, like a $50 million product similarity search and the widely adopted Flight recommendation framework. Now, he helps startups and enterprises design and deploy production-level AI applications, with a focus on retrieval-augmented generation (RAG) and scalable solutions.</p>

<p>This episode is a bit of an experiment. Instead of our usual technical deep dives, we’re focusing on the world of AI consulting and freelancing. We explore Jason’s consulting playbook, covering how he structures contracts to maximize value, strategies for moving from hourly billing to securing larger deals, and the mindset shift needed to align incentives with clients. We’ll also discuss the challenges of moving from deterministic software to probabilistic AI systems and even do a live role-playing session where Jason coaches me on client engagement and pricing pitfalls.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/9CFs06UDbGI?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://maven.com/indie-consulting/ai-consultant-accelerator?utm_campaign=9532cc&utm_medium=partner&utm_source=instructor" rel="nofollow">Jason&#39;s upcoming course: AI Consultant Accelerator: From Expert to High-Demand Business</a></li>
<li><a href="https://maven.com/s/course/d56067f338" rel="nofollow">Hugo&#39;s upcoming course: Building LLM Applications for Data Scientists and Software Engineers</a></li>
<li><a href="https://jxnl.co/" rel="nofollow">Jason&#39;s website</a></li>
<li><a href="https://indieconsulting.podia.com/" rel="nofollow">Jason&#39;s indie consulting newsletter</a></li>
<li><a href="https://hamel.dev/blog/posts/evals/" rel="nofollow">Your AI Product Needs Evals by Hamel Husain</a></li>
<li><a href="https://applied-llms.org/" rel="nofollow">What We’ve Learned From A Year of Building with LLMs</a></li>
<li><a href="https://jxnl.co/writing/#dear-future-ai-consultant" rel="nofollow">Dear Future AI Consultant by Jason</a></li>
<li><a href="https://www.acquisition.com/books" rel="nofollow">Alex Hormozi&#39;s books</a></li>
<li><a href="https://www.sup.org/books/theory-and-philosophy/burnout-society" rel="nofollow">The Burnout Society by Byung-Chul Han</a></li>
<li><a href="https://x.com/jxnlco" rel="nofollow">Jason on Twitter</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Vanishing Gradients&#39; lu.ma calendar</a></li>
<li><a href="https://www.youtube.com/@vanishinggradients" rel="nofollow">Vanishing Gradients on YouTube</a></li>
</ul>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Hugo speaks with Jason Liu, an independent AI consultant with experience at Meta and Stitch Fix. At Stitch Fix, Jason developed impactful AI systems, like a $50 million product similarity search and the widely adopted Flight recommendation framework. Now, he helps startups and enterprises design and deploy production-level AI applications, with a focus on retrieval-augmented generation (RAG) and scalable solutions.</p>

<p>This episode is a bit of an experiment. Instead of our usual technical deep dives, we’re focusing on the world of AI consulting and freelancing. We explore Jason’s consulting playbook, covering how he structures contracts to maximize value, strategies for moving from hourly billing to securing larger deals, and the mindset shift needed to align incentives with clients. We’ll also discuss the challenges of moving from deterministic software to probabilistic AI systems and even do a live role-playing session where Jason coaches me on client engagement and pricing pitfalls.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/9CFs06UDbGI?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://maven.com/indie-consulting/ai-consultant-accelerator?utm_campaign=9532cc&utm_medium=partner&utm_source=instructor" rel="nofollow">Jason&#39;s upcoming course: AI Consultant Accelerator: From Expert to High-Demand Business</a></li>
<li><a href="https://maven.com/s/course/d56067f338" rel="nofollow">Hugo&#39;s upcoming course: Building LLM Applications for Data Scientists and Software Engineers</a></li>
<li><a href="https://jxnl.co/" rel="nofollow">Jason&#39;s website</a></li>
<li><a href="https://indieconsulting.podia.com/" rel="nofollow">Jason&#39;s indie consulting newsletter</a></li>
<li><a href="https://hamel.dev/blog/posts/evals/" rel="nofollow">Your AI Product Needs Evals by Hamel Husain</a></li>
<li><a href="https://applied-llms.org/" rel="nofollow">What We’ve Learned From A Year of Building with LLMs</a></li>
<li><a href="https://jxnl.co/writing/#dear-future-ai-consultant" rel="nofollow">Dear Future AI Consultant by Jason</a></li>
<li><a href="https://www.acquisition.com/books" rel="nofollow">Alex Hormozi&#39;s books</a></li>
<li><a href="https://www.sup.org/books/theory-and-philosophy/burnout-society" rel="nofollow">The Burnout Society by Byung-Chul Han</a></li>
<li><a href="https://x.com/jxnlco" rel="nofollow">Jason on Twitter</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Vanishing Gradients&#39; lu.ma calendar</a></li>
<li><a href="https://www.youtube.com/@vanishinggradients" rel="nofollow">Vanishing Gradients on YouTube</a></li>
</ul>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 37: Prompt Engineering, Security in Generative AI, and the Future of AI Research Part 2</title>
  <link>https://vanishinggradients.fireside.fm/37</link>
  <guid isPermaLink="false">eadec2c4-f8f9-45b0-ae7e-5867f7201801</guid>
  <pubDate>Tue, 08 Oct 2024 17:00:00 +1100</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/eadec2c4-f8f9-45b0-ae7e-5867f7201801.mp3" length="48585166" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Hugo speaks with three leading figures from the world of AI research: Sander Schulhoff, a recent University of Maryland graduate and lead contributor to the Learn Prompting initiative; Philip Resnik, professor at the University of Maryland, known for his pioneering work in computational linguistics; and Dennis Peskoff, a researcher from Princeton specializing in prompt engineering and its applications in the social sciences.</itunes:subtitle>
  <itunes:duration>50:36</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Hugo speaks with three leading figures from the world of AI research: Sander Schulhoff, a recent University of Maryland graduate and lead contributor to the Learn Prompting initiative; Philip Resnik, professor at the University of Maryland, known for his pioneering work in computational linguistics; and Dennis Peskoff, a researcher from Princeton specializing in prompt engineering and its applications in the social sciences.
This is Part 2 of a special two-part episode, prompted—no pun intended—by these guys being part of a team, led by Sander, that wrote a 76-page survey analyzing prompting techniques, agents, and generative AI. The survey included contributors from OpenAI, Microsoft, the University of Maryland, Princeton, and more.
In this episode, we cover:
The Prompt Report: A comprehensive survey on prompting techniques, agents, and generative AI, including advanced evaluation methods for assessing these techniques.
Security Risks and Prompt Hacking: A detailed exploration of the security concerns surrounding prompt engineering, including Sander’s thoughts on its potential applications in cybersecurity and military contexts.
AI’s Impact Across Fields: A discussion on how generative AI is reshaping various domains, including the social sciences and security.
Multimodal AI: Updates on how large language models (LLMs) are expanding to interact with images, code, and music.
Case Study - Detecting Suicide Risk: A careful examination of how prompting techniques are being used in important areas like detecting suicide risk, showcasing the critical potential of AI in addressing sensitive, real-world challenges.
The episode concludes with a reflection on the evolving landscape of LLMs and multimodal AI, and what might be on the horizon.
If you haven’t yet, make sure to check out Part 1, where we discuss the history of NLP, prompt engineering techniques, and Sander’s development of the Learn Prompting initiative.
LINKS
The livestream on YouTube (https://youtube.com/live/FreXovgG-9A?feature=share)
The Prompt Report: A Systematic Survey of Prompting Techniques (https://arxiv.org/abs/2406.06608)
Learn Prompting: Your Guide to Communicating with AI (https://learnprompting.org/)
Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
Hugo on Twitter (https://twitter.com/hugobowne)
Vanishing Gradients' lu.ma calendar (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Vanishing Gradients on YouTube (https://www.youtube.com/@vanishinggradients)
</description>
  <itunes:keywords>AI, LLMs, machine learning, data science, GenAI, NLP</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Hugo speaks with three leading figures from the world of AI research: Sander Schulhoff, a recent University of Maryland graduate and lead contributor to the Learn Prompting initiative; Philip Resnik, professor at the University of Maryland, known for his pioneering work in computational linguistics; and Dennis Peskoff, a researcher from Princeton specializing in prompt engineering and its applications in the social sciences.</p>

<p>This is Part 2 of a special two-part episode, prompted—no pun intended—by these guys being part of a team, led by Sander, that wrote a 76-page survey analyzing prompting techniques, agents, and generative AI. The survey included contributors from OpenAI, Microsoft, the University of Maryland, Princeton, and more.</p>

<p>In this episode, we cover:</p>

<ul>
<li><p><strong>The Prompt Report:</strong> A comprehensive survey on prompting techniques, agents, and generative AI, including advanced evaluation methods for assessing these techniques.</p></li>
<li><p><strong>Security Risks and Prompt Hacking:</strong> A detailed exploration of the security concerns surrounding prompt engineering, including Sander’s thoughts on its potential applications in cybersecurity and military contexts.</p></li>
<li><p><strong>AI’s Impact Across Fields:</strong> A discussion on how generative AI is reshaping various domains, including the social sciences and security.</p></li>
<li><p><strong>Multimodal AI:</strong> Updates on how large language models (LLMs) are expanding to interact with images, code, and music.</p></li>
<li><p><strong>Case Study - Detecting Suicide Risk:</strong> A careful examination of how prompting techniques are being used in important areas like detecting suicide risk, showcasing the critical potential of AI in addressing sensitive, real-world challenges.</p></li>
</ul>

<p>The episode concludes with a reflection on the evolving landscape of <strong>LLMs</strong> and multimodal AI, and what might be on the horizon.</p>

<p>If you haven’t yet, make sure to check out <strong>Part 1</strong>, where we discuss the history of NLP, prompt engineering techniques, and Sander’s development of the Learn Prompting initiative.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/FreXovgG-9A?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://arxiv.org/abs/2406.06608" rel="nofollow">The Prompt Report: A Systematic Survey of Prompting Techniques</a></li>
<li><a href="https://learnprompting.org/" rel="nofollow">Learn Prompting: Your Guide to Communicating with AI</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Vanishing Gradients&#39; lu.ma calendar</a></li>
<li><a href="https://www.youtube.com/@vanishinggradients" rel="nofollow">Vanishing Gradients on YouTube</a></li>
</ul>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Hugo speaks with three leading figures from the world of AI research: Sander Schulhoff, a recent University of Maryland graduate and lead contributor to the Learn Prompting initiative; Philip Resnik, professor at the University of Maryland, known for his pioneering work in computational linguistics; and Dennis Peskoff, a researcher from Princeton specializing in prompt engineering and its applications in the social sciences.</p>

<p>This is Part 2 of a special two-part episode, prompted—no pun intended—by these guys being part of a team, led by Sander, that wrote a 76-page survey analyzing prompting techniques, agents, and generative AI. The survey included contributors from OpenAI, Microsoft, the University of Maryland, Princeton, and more.</p>

<p>In this episode, we cover:</p>

<ul>
<li><p><strong>The Prompt Report:</strong> A comprehensive survey on prompting techniques, agents, and generative AI, including advanced evaluation methods for assessing these techniques.</p></li>
<li><p><strong>Security Risks and Prompt Hacking:</strong> A detailed exploration of the security concerns surrounding prompt engineering, including Sander’s thoughts on its potential applications in cybersecurity and military contexts.</p></li>
<li><p><strong>AI’s Impact Across Fields:</strong> A discussion on how generative AI is reshaping various domains, including the social sciences and security.</p></li>
<li><p><strong>Multimodal AI:</strong> Updates on how large language models (LLMs) are expanding to interact with images, code, and music.</p></li>
<li><p><strong>Case Study - Detecting Suicide Risk:</strong> A careful examination of how prompting techniques are being used in important areas like detecting suicide risk, showcasing the critical potential of AI in addressing sensitive, real-world challenges.</p></li>
</ul>

<p>The episode concludes with a reflection on the evolving landscape of <strong>LLMs</strong> and multimodal AI, and what might be on the horizon.</p>

<p>If you haven’t yet, make sure to check out <strong>Part 1</strong>, where we discuss the history of NLP, prompt engineering techniques, and Sander’s development of the Learn Prompting initiative.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/FreXovgG-9A?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://arxiv.org/abs/2406.06608" rel="nofollow">The Prompt Report: A Systematic Survey of Prompting Techniques</a></li>
<li><a href="https://learnprompting.org/" rel="nofollow">Learn Prompting: Your Guide to Communicating with AI</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Vanishing Gradients&#39; lu.ma calendar</a></li>
<li><a href="https://www.youtube.com/@vanishinggradients" rel="nofollow">Vanishing Gradients on YouTube</a></li>
</ul>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 36: Prompt Engineering, Security in Generative AI, and the Future of AI Research Part 1</title>
  <link>https://vanishinggradients.fireside.fm/36</link>
  <guid isPermaLink="false">acd8aaec-1788-459d-a4e9-10feae67a19a</guid>
  <pubDate>Mon, 30 Sep 2024 18:00:00 +1000</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/acd8aaec-1788-459d-a4e9-10feae67a19a.mp3" length="61232193" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Hugo speaks with three leading figures from the world of AI research: Sander Schulhoff, a recent University of Maryland graduate and lead contributor to the Learn Prompting initiative; Philip Resnik, professor at the University of Maryland, known for his pioneering work in computational linguistics; and Dennis Peskoff, a researcher from Princeton specializing in prompt engineering and its applications in the social sciences.</itunes:subtitle>
  <itunes:duration>1:03:46</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Hugo speaks with three leading figures from the world of AI research: Sander Schulhoff, a recent University of Maryland graduate and lead contributor to the Learn Prompting initiative; Philip Resnik, professor at the University of Maryland, known for his pioneering work in computational linguistics; and Dennis Peskoff, a researcher from Princeton specializing in prompt engineering and its applications in the social sciences.
This is Part 1 of a special two-part episode, prompted—no pun intended—by these guys being part of a team, led by Sander, that wrote a 76-page survey analyzing prompting techniques, agents, and generative AI. The survey included contributors from OpenAI, Microsoft, the University of Maryland, Princeton, and more.
In this first part, we’ll explore: 
* the critical role of prompt engineering, 
* adversarial techniques like prompt hacking, 
* the challenges of evaluating prompting techniques, 
* the impact of few-shot learning, and 
* the groundbreaking taxonomy of prompting techniques from the Prompt Report.
Along the way, we’ll: 
* uncover the rich history of natural language processing (NLP) and AI, showing how modern prompting techniques evolved from early rule-based systems and statistical methods, 
* hear how Sander’s experimentation with GPT-3 for diplomatic tasks led him to develop Learn Prompting, and 
* learn how Dennis highlights the accessibility of AI through prompting, which allows non-technical users to interact with AI without needing to code.
Finally, we’ll explore the future of multimodal AI, where LLMs interact with images, code, and even music creation. Make sure to tune in to Part 2, where we dive deeper into security risks, prompt hacking, and more.
LINKS
The livestream on YouTube (https://youtube.com/live/FreXovgG-9A?feature=share)
The Prompt Report: A Systematic Survey of Prompting Techniques (https://arxiv.org/abs/2406.06608)
Learn Prompting: Your Guide to Communicating with AI (https://learnprompting.org/)
Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
Hugo on Twitter (https://twitter.com/hugobowne)
Vanishing Gradients' lu.ma calendar (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Vanishing Gradients on YouTube (https://www.youtube.com/@vanishinggradients)
</description>
  <itunes:keywords>AI, LLMs, machine learning, data science, GenAI, prompt engineering</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Hugo speaks with three leading figures from the world of AI research: Sander Schulhoff, a recent University of Maryland graduate and lead contributor to the Learn Prompting initiative; Philip Resnik, professor at the University of Maryland, known for his pioneering work in computational linguistics; and Dennis Peskoff, a researcher from Princeton specializing in prompt engineering and its applications in the social sciences.</p>

<p>This is Part 1 of a special two-part episode, prompted—no pun intended—by these guys being part of a team, led by Sander, that wrote a 76-page survey analyzing prompting techniques, agents, and generative AI. The survey included contributors from OpenAI, Microsoft, the University of Maryland, Princeton, and more.</p>

<p>In this first part, we’ll explore:</p>

<ul>
<li>the critical role of prompt engineering,</li>
<li>adversarial techniques like prompt hacking,</li>
<li>the challenges of evaluating prompting techniques,</li>
<li>the impact of few-shot learning, and</li>
<li>the groundbreaking taxonomy of prompting techniques from the Prompt Report.</li>
</ul>

<p>Along the way, we’ll:</p>

<ul>
<li>uncover the rich history of natural language processing (NLP) and AI, showing how modern prompting techniques evolved from early rule-based systems and statistical methods,</li>
<li>hear how Sander’s experimentation with GPT-3 for diplomatic tasks led him to develop Learn Prompting, and</li>
<li>learn how Dennis highlights the accessibility of AI through prompting, which allows non-technical users to interact with AI without needing to code.</li>
</ul>

<p>Finally, we’ll explore the future of multimodal AI, where LLMs interact with images, code, and even music creation. Make sure to tune in to Part 2, where we dive deeper into security risks, prompt hacking, and more.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/FreXovgG-9A?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://arxiv.org/abs/2406.06608" rel="nofollow">The Prompt Report: A Systematic Survey of Prompting Techniques</a></li>
<li><a href="https://learnprompting.org/" rel="nofollow">Learn Prompting: Your Guide to Communicating with AI</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Vanishing Gradients&#39; lu.ma calendar</a></li>
<li><a href="https://www.youtube.com/@vanishinggradients" rel="nofollow">Vanishing Gradients on YouTube</a></li>
</ul>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Hugo speaks with three leading figures from the world of AI research: Sander Schulhoff, a recent University of Maryland graduate and lead contributor to the Learn Prompting initiative; Philip Resnik, professor at the University of Maryland, known for his pioneering work in computational linguistics; and Dennis Peskoff, a researcher from Princeton specializing in prompt engineering and its applications in the social sciences.</p>

<p>This is Part 1 of a special two-part episode, prompted—no pun intended—by these guys being part of a team, led by Sander, that wrote a 76-page survey analyzing prompting techniques, agents, and generative AI. The survey included contributors from OpenAI, Microsoft, the University of Maryland, Princeton, and more.</p>

<p>In this first part, we’ll explore:</p>

<ul>
<li>the critical role of prompt engineering,</li>
<li>adversarial techniques like prompt hacking,</li>
<li>the challenges of evaluating prompting techniques,</li>
<li>the impact of few-shot learning, and</li>
<li>the groundbreaking taxonomy of prompting techniques from the Prompt Report.</li>
</ul>

<p>Along the way, we’ll:</p>

<ul>
<li>uncover the rich history of natural language processing (NLP) and AI, showing how modern prompting techniques evolved from early rule-based systems and statistical methods,</li>
<li>hear how Sander’s experimentation with GPT-3 for diplomatic tasks led him to develop Learn Prompting, and</li>
<li>learn how Dennis highlights the accessibility of AI through prompting, which allows non-technical users to interact with AI without needing to code.</li>
</ul>

<p>Finally, we’ll explore the future of multimodal AI, where LLMs interact with images, code, and even music creation. Make sure to tune in to Part 2, where we dive deeper into security risks, prompt hacking, and more.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/FreXovgG-9A?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://arxiv.org/abs/2406.06608" rel="nofollow">The Prompt Report: A Systematic Survey of Prompting Techniques</a></li>
<li><a href="https://learnprompting.org/" rel="nofollow">Learn Prompting: Your Guide to Communicating with AI</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
<li><a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">Vanishing Gradients&#39; lu.ma calendar</a></li>
<li><a href="https://www.youtube.com/@vanishinggradients" rel="nofollow">Vanishing Gradients on YouTube</a></li>
</ul>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 35: Open Science at NASA -- Measuring Impact and the Future of AI</title>
  <link>https://vanishinggradients.fireside.fm/35</link>
  <guid isPermaLink="false">feeeecc8-a170-48c7-ae4c-8dd64484c64c</guid>
  <pubDate>Thu, 19 Sep 2024 17:00:00 +1000</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/feeeecc8-a170-48c7-ae4c-8dd64484c64c.mp3" length="55905303" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Hugo speaks with Dr. Chelle Gentemann, Open Science Program Scientist for NASA’s Office of the Chief Science Data Officer, about NASA’s ambitious efforts to integrate AI across the research lifecycle. In this episode, we’ll dive deeper into how AI is transforming NASA’s approach to science, making data more accessible and advancing open science practices.</itunes:subtitle>
  <itunes:duration>58:13</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Hugo speaks with Dr. Chelle Gentemann, Open Science Program Scientist for NASA’s Office of the Chief Science Data Officer, about NASA’s ambitious efforts to integrate AI across the research lifecycle. In this episode, we’ll dive deeper into how AI is transforming NASA’s approach to science, making data more accessible and advancing open science practices. We explore
Measuring the Impact of Open Science: How NASA is developing new metrics to evaluate the effectiveness of open science, moving beyond traditional publication-based assessments.
The Process of Scientific Discovery: Insights into the collaborative nature of research and how breakthroughs are achieved at NASA.
AI Applications in NASA’s Science: From rats in space to exploring the origins of the universe, we cover how AI is being applied across NASA’s divisions to improve data accessibility and analysis.
Addressing Challenges in Open Science: The complexities of implementing open science within government agencies and research environments.
Reforming Incentive Systems: How NASA is reconsidering traditional metrics like publications and citations, and starting to recognize contributions such as software development and data sharing.
The Future of Open Science: How open science is shaping the future of research, fostering interdisciplinary collaboration, and increasing accessibility.
This conversation offers valuable insights for researchers, data scientists, and those interested in the practical applications of AI and open science. Join us as we discuss how NASA is working to make science more collaborative, reproducible, and impactful.
LINKS
The livestream on YouTube (https://youtube.com/live/VJDg3ZbkNOE?feature=share)
NASA's Open Science 101 course &amp;lt;-- do it to learn and also to get NASA Swag! (https://openscience101.org/)
Science Cast (https://sciencecast.org/)
NASA and IBM Openly Release Geospatial AI Foundation Model for NASA Earth Observation Data (https://www.earthdata.nasa.gov/news/impact-ibm-hls-foundation-model)
Jake VanderPlas' daily conundrum tweet from 2013 (https://x.com/jakevdp/status/408678764705378304)
Replit, "an AI-powered software development &amp;amp; deployment platform for building, sharing, and shipping software fast." (https://replit.com/) 
</description>
  <itunes:keywords>AI, LLMs, machine learning, data science, GenAI</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Hugo speaks with Dr. Chelle Gentemann, Open Science Program Scientist for NASA’s Office of the Chief Science Data Officer, about NASA’s ambitious efforts to integrate AI across the research lifecycle. In this episode, we’ll dive deeper into how AI is transforming NASA’s approach to science, making data more accessible and advancing open science practices. We explore</p>

<ul>
<li><strong>Measuring the Impact of Open Science:</strong> How NASA is developing new metrics to evaluate the effectiveness of open science, moving beyond traditional publication-based assessments.</li>
<li><strong>The Process of Scientific Discovery:</strong> Insights into the collaborative nature of research and how breakthroughs are achieved at NASA.</li>
<li><strong>AI Applications in NASA’s Science:</strong> From rats in space to exploring the origins of the universe, we cover how AI is being applied across NASA’s divisions to improve data accessibility and analysis.</li>
<li><strong>Addressing Challenges in Open Science:</strong> The complexities of implementing open science within government agencies and research environments.</li>
<li><strong>Reforming Incentive Systems:</strong> How NASA is reconsidering traditional metrics like publications and citations, and starting to recognize contributions such as software development and data sharing.</li>
<li><strong>The Future of Open Science:</strong> How open science is shaping the future of research, fostering interdisciplinary collaboration, and increasing accessibility.</li>
</ul>

<p>This conversation offers valuable insights for researchers, data scientists, and those interested in the practical applications of AI and open science. Join us as we discuss how NASA is working to make science more collaborative, reproducible, and impactful.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/VJDg3ZbkNOE?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://openscience101.org/" rel="nofollow">NASA&#39;s Open Science 101 course &lt;-- do it to learn and also to get NASA Swag!</a></li>
<li><a href="https://sciencecast.org/" rel="nofollow">Science Cast</a></li>
<li><a href="https://www.earthdata.nasa.gov/news/impact-ibm-hls-foundation-model" rel="nofollow">NASA and IBM Openly Release Geospatial AI Foundation Model for NASA Earth Observation Data</a></li>
<li><a href="https://x.com/jakevdp/status/408678764705378304" rel="nofollow">Jake VanderPlas&#39; daily conundrum tweet from 2013</a></li>
<li><a href="https://replit.com/" rel="nofollow">Replit, &quot;an AI-powered software development &amp; deployment platform for building, sharing, and shipping software fast.&quot;</a></li>
</ul>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Hugo speaks with Dr. Chelle Gentemann, Open Science Program Scientist for NASA’s Office of the Chief Science Data Officer, about NASA’s ambitious efforts to integrate AI across the research lifecycle. In this episode, we’ll dive deeper into how AI is transforming NASA’s approach to science, making data more accessible and advancing open science practices. We explore</p>

<ul>
<li><strong>Measuring the Impact of Open Science:</strong> How NASA is developing new metrics to evaluate the effectiveness of open science, moving beyond traditional publication-based assessments.</li>
<li><strong>The Process of Scientific Discovery:</strong> Insights into the collaborative nature of research and how breakthroughs are achieved at NASA.</li>
<li><strong>AI Applications in NASA’s Science:</strong> From rats in space to exploring the origins of the universe, we cover how AI is being applied across NASA’s divisions to improve data accessibility and analysis.</li>
<li><strong>Addressing Challenges in Open Science:</strong> The complexities of implementing open science within government agencies and research environments.</li>
<li><strong>Reforming Incentive Systems:</strong> How NASA is reconsidering traditional metrics like publications and citations, and starting to recognize contributions such as software development and data sharing.</li>
<li><strong>The Future of Open Science:</strong> How open science is shaping the future of research, fostering interdisciplinary collaboration, and increasing accessibility.</li>
</ul>

<p>This conversation offers valuable insights for researchers, data scientists, and those interested in the practical applications of AI and open science. Join us as we discuss how NASA is working to make science more collaborative, reproducible, and impactful.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/VJDg3ZbkNOE?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://openscience101.org/" rel="nofollow">NASA&#39;s Open Science 101 course &lt;-- do it to learn and also to get NASA Swag!</a></li>
<li><a href="https://sciencecast.org/" rel="nofollow">Science Cast</a></li>
<li><a href="https://www.earthdata.nasa.gov/news/impact-ibm-hls-foundation-model" rel="nofollow">NASA and IBM Openly Release Geospatial AI Foundation Model for NASA Earth Observation Data</a></li>
<li><a href="https://x.com/jakevdp/status/408678764705378304" rel="nofollow">Jake VanderPlas&#39; daily conundrum tweet from 2013</a></li>
<li><a href="https://replit.com/" rel="nofollow">Replit, &quot;an AI-powered software development &amp; deployment platform for building, sharing, and shipping software fast.&quot;</a></li>
</ul>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 34: The AI Revolution Will Not Be Monopolized</title>
  <link>https://vanishinggradients.fireside.fm/34</link>
  <guid isPermaLink="false">8c18d59e-9b79-4682-8e3c-ba682daf1c1c</guid>
  <pubDate>Thu, 22 Aug 2024 17:00:00 +1000</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/8c18d59e-9b79-4682-8e3c-ba682daf1c1c.mp3" length="98751972" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Hugo speaks with Ines Montani and Matthew Honnibal, the creators of spaCy and founders of Explosion AI. Collectively, they've had a huge impact on the fields of industrial natural language processing (NLP), ML, and AI through their widely-used open-source library spaCy and their innovative annotation tool Prodigy.</itunes:subtitle>
  <itunes:duration>1:42:51</itunes:duration>
  <itunes:explicit>yes</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Hugo speaks with Ines Montani and Matthew Honnibal, the creators of spaCy and founders of Explosion AI. Collectively, they've had a huge impact on the fields of industrial natural language processing (NLP), ML, and AI through their widely-used open-source library spaCy and their innovative annotation tool Prodigy. These tools have become essential for many data scientists and NLP practitioners in industry and academia alike.
In this wide-ranging discussion, we dive into:
• The evolution of applied NLP and its role in industry
• The balance between large language models and smaller, specialized models
• Human-in-the-loop distillation for creating faster, more data-private AI systems
• The challenges and opportunities in NLP, including modularity, transparency, and privacy
• The future of AI and software development
• The potential impact of AI regulation on innovation and competition
We also touch on their recent transition back to a smaller, more independent-minded company structure and the lessons learned from their journey in the AI startup world.
Ines and Matt offer invaluable insights for data scientists, machine learning practitioners, and anyone interested in the practical applications of AI. They share their thoughts on how to approach NLP projects, the importance of data quality, and the role of open-source in advancing the field.
Whether you're a seasoned NLP practitioner or just getting started with AI, this episode offers a wealth of knowledge from two of the field's most respected figures. Join us for a discussion that explores the current landscape of AI development, with insights that bridge the gap between cutting-edge research and real-world applications.
LINKS
The livestream on YouTube (https://youtube.com/live/-6o5-3cP0ik?feature=share)
How S&amp;amp;P Global is making markets more transparent with NLP, spaCy and Prodigy (https://explosion.ai/blog/sp-global-commodities)
A practical guide to human-in-the-loop distillation (https://explosion.ai/blog/human-in-the-loop-distillation)
Laws of Tech: Commoditize Your Complement (https://gwern.net/complement)
spaCy: Industrial-Strength Natural Language Processing (https://spacy.io/)
LLMs with spaCy (https://spacy.io/usage/large-language-models)
Explosion, building developer tools for AI, Machine Learning and Natural Language Processing (https://explosion.ai/)
Back to our roots: Company update and future plans, by Matt and Ines (https://explosion.ai/blog/back-to-our-roots-company-update)
Matt's detailed blog post: back to our roots (https://honnibal.dev/blog/back-to-our-roots)
Ines on Twitter (https://x.com/_inesmontani)
Matt on Twitter (https://x.com/honnibal)
Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
Hugo on Twitter (https://twitter.com/hugobowne)
Check out and subscribe to our lu.ma calendar (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) for upcoming livestreams!
</description>
  <itunes:keywords>AI, LLMs, machine learning, data science, GenAI, NLP</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Hugo speaks with Ines Montani and Matthew Honnibal, the creators of spaCy and founders of Explosion AI. Collectively, they&#39;ve had a huge impact on the fields of industrial natural language processing (NLP), ML, and AI through their widely-used open-source library spaCy and their innovative annotation tool Prodigy. These tools have become essential for many data scientists and NLP practitioners in industry and academia alike.</p>

<p>In this wide-ranging discussion, we dive into:</p>

<p>• The evolution of applied NLP and its role in industry<br>
• The balance between large language models and smaller, specialized models<br>
• Human-in-the-loop distillation for creating faster, more data-private AI systems<br>
• The challenges and opportunities in NLP, including modularity, transparency, and privacy<br>
• The future of AI and software development<br>
• The potential impact of AI regulation on innovation and competition</p>

<p>We also touch on their recent transition back to a smaller, more independent-minded company structure and the lessons learned from their journey in the AI startup world.</p>

<p>Ines and Matt offer invaluable insights for data scientists, machine learning practitioners, and anyone interested in the practical applications of AI. They share their thoughts on how to approach NLP projects, the importance of data quality, and the role of open-source in advancing the field.</p>

<p>Whether you&#39;re a seasoned NLP practitioner or just getting started with AI, this episode offers a wealth of knowledge from two of the field&#39;s most respected figures. Join us for a discussion that explores the current landscape of AI development, with insights that bridge the gap between cutting-edge research and real-world applications.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/-6o5-3cP0ik?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://explosion.ai/blog/sp-global-commodities" rel="nofollow">How S&amp;P Global is making markets more transparent with NLP, spaCy and Prodigy</a></li>
<li><a href="https://explosion.ai/blog/human-in-the-loop-distillation" rel="nofollow">A practical guide to human-in-the-loop distillation</a></li>
<li><a href="https://gwern.net/complement" rel="nofollow">Laws of Tech: Commoditize Your Complement</a></li>
<li><a href="https://spacy.io/" rel="nofollow">spaCy: Industrial-Strength Natural Language Processing</a></li>
<li><a href="https://spacy.io/usage/large-language-models" rel="nofollow">LLMs with spaCy</a></li>
<li><a href="https://explosion.ai/" rel="nofollow">Explosion, building developer tools for AI, Machine Learning and Natural Language Processing</a></li>
<li><a href="https://explosion.ai/blog/back-to-our-roots-company-update" rel="nofollow">Back to our roots: Company update and future plans, by Matt and Ines</a></li>
<li><a href="https://honnibal.dev/blog/back-to-our-roots" rel="nofollow">Matt&#39;s detailed blog post: back to our roots</a></li>
<li><a href="https://x.com/_inesmontani" rel="nofollow">Ines on Twitter</a></li>
<li><a href="https://x.com/honnibal" rel="nofollow">Matt on Twitter</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
</ul>

<p>Check out and subscribe to our <a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">lu.ma calendar</a> for upcoming livestreams!</p>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Hugo speaks with Ines Montani and Matthew Honnibal, the creators of spaCy and founders of Explosion AI. Collectively, they&#39;ve had a huge impact on the fields of industrial natural language processing (NLP), ML, and AI through their widely-used open-source library spaCy and their innovative annotation tool Prodigy. These tools have become essential for many data scientists and NLP practitioners in industry and academia alike.</p>

<p>In this wide-ranging discussion, we dive into:</p>

<p>• The evolution of applied NLP and its role in industry<br>
• The balance between large language models and smaller, specialized models<br>
• Human-in-the-loop distillation for creating faster, more data-private AI systems<br>
• The challenges and opportunities in NLP, including modularity, transparency, and privacy<br>
• The future of AI and software development<br>
• The potential impact of AI regulation on innovation and competition</p>

<p>We also touch on their recent transition back to a smaller, more independent-minded company structure and the lessons learned from their journey in the AI startup world.</p>

<p>Ines and Matt offer invaluable insights for data scientists, machine learning practitioners, and anyone interested in the practical applications of AI. They share their thoughts on how to approach NLP projects, the importance of data quality, and the role of open-source in advancing the field.</p>

<p>Whether you&#39;re a seasoned NLP practitioner or just getting started with AI, this episode offers a wealth of knowledge from two of the field&#39;s most respected figures. Join us for a discussion that explores the current landscape of AI development, with insights that bridge the gap between cutting-edge research and real-world applications.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/-6o5-3cP0ik?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://explosion.ai/blog/sp-global-commodities" rel="nofollow">How S&amp;P Global is making markets more transparent with NLP, spaCy and Prodigy</a></li>
<li><a href="https://explosion.ai/blog/human-in-the-loop-distillation" rel="nofollow">A practical guide to human-in-the-loop distillation</a></li>
<li><a href="https://gwern.net/complement" rel="nofollow">Laws of Tech: Commoditize Your Complement</a></li>
<li><a href="https://spacy.io/" rel="nofollow">spaCy: Industrial-Strength Natural Language Processing</a></li>
<li><a href="https://spacy.io/usage/large-language-models" rel="nofollow">LLMs with spaCy</a></li>
<li><a href="https://explosion.ai/" rel="nofollow">Explosion, building developer tools for AI, Machine Learning and Natural Language Processing</a></li>
<li><a href="https://explosion.ai/blog/back-to-our-roots-company-update" rel="nofollow">Back to our roots: Company update and future plans, by Matt and Ines</a></li>
<li><a href="https://honnibal.dev/blog/back-to-our-roots" rel="nofollow">Matt&#39;s detailed blog post: back to our roots</a></li>
<li><a href="https://x.com/_inesmontani" rel="nofollow">Ines on Twitter</a></li>
<li><a href="https://x.com/honnibal" rel="nofollow">Matt on Twitter</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
</ul>

<p>Check out and subscribe to our <a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">lu.ma calendar</a> for upcoming livestreams!</p>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 32: Building Reliable and Robust ML/AI Pipelines</title>
  <link>https://vanishinggradients.fireside.fm/32</link>
  <guid isPermaLink="false">3aa4ba58-30aa-4a85-a139-e9057629171c</guid>
  <pubDate>Sat, 27 Jul 2024 13:00:00 +1000</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/3aa4ba58-30aa-4a85-a139-e9057629171c.mp3" length="72173111" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Hugo speaks with Shreya Shankar, a researcher at UC Berkeley focusing on data management systems with a human-centered approach. Shreya's work is at the cutting edge of human-computer interaction (HCI) and AI, particularly in the realm of large language models (LLMs). Her impressive background includes being the first ML engineer at Viaduct, doing research engineering at Google Brain, and software engineering at Facebook.</itunes:subtitle>
  <itunes:duration>1:15:10</itunes:duration>
  <itunes:explicit>yes</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Hugo speaks with Shreya Shankar, a researcher at UC Berkeley focusing on data management systems with a human-centered approach. Shreya's work is at the cutting edge of human-computer interaction (HCI) and AI, particularly in the realm of large language models (LLMs). Her impressive background includes being the first ML engineer at Viaduct, doing research engineering at Google Brain, and software engineering at Facebook.
In this episode, we dive deep into the world of LLMs and the critical challenges of building reliable AI pipelines. We'll explore:
The fascinating journey from classic machine learning to the current LLM revolution
Why Shreya believes most ML problems are actually data management issues
The concept of "data flywheels" for LLM applications and how to implement them
The intriguing world of evaluating AI systems - who validates the validators?
Shreya's work on SPADE and EvalGen, innovative tools for synthesizing data quality assertions and aligning LLM evaluations with human preferences
The importance of human-in-the-loop processes in AI development
The future of low-code and no-code tools in the AI landscape
We'll also touch on the potential pitfalls of over-relying on LLMs, the concept of "Habsburg AI," and how to avoid disappearing up our own proverbial arseholes in the world of recursive AI processes.
Whether you're a seasoned AI practitioner, a curious data scientist, or someone interested in the human side of AI development, this conversation offers valuable insights into building more robust, reliable, and human-centered AI systems.
LINKS
The livestream on YouTube (https://youtube.com/live/hKV6xSJZkB0?feature=share)
Shreya's website (https://www.sh-reya.com/)
Shreya on Twitter (https://x.com/sh_reya)
Data Flywheels for LLM Applications (https://www.sh-reya.com/blog/ai-engineering-flywheel/)
SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines (https://arxiv.org/abs/2401.03038)
What We’ve Learned From A Year of Building with LLMs (https://applied-llms.org/)
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences (https://arxiv.org/abs/2404.12272)
Operationalizing Machine Learning: An Interview Study (https://arxiv.org/abs/2209.09125)
Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
Hugo on Twitter (https://twitter.com/hugobowne)
In the podcast, Hugo also mentioned that this was the 5th time he and Shreya chatted publicly, which is wild!
If you want to dive deep into Shreya's work and related topics through their chats, you can check them all out here:
Outerbounds' Fireside Chat: Operationalizing ML -- Patterns and Pain Points from MLOps Practitioners (https://www.youtube.com/watch?v=7zB6ESFto_U)
The Past, Present, and Future of Generative AI (https://youtu.be/q0A9CdGWXqc?si=XmaUnQmZiXL2eagS)
LLMs, OpenAI Dev Day, and the Existential Crisis for Machine Learning Engineering (https://www.youtube.com/live/MTJHvgJtynU?si=Ncjqn5YuFBemvOJ0)
Lessons from a Year of Building with LLMs (https://youtube.com/live/c0gcsprsFig?feature=share)
Check out and subscribe to our lu.ma calendar (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) for upcoming livestreams!
</description>
  <itunes:keywords>AI, LLMs, machine learning, data science, GenAI</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Hugo speaks with Shreya Shankar, a researcher at UC Berkeley focusing on data management systems with a human-centered approach. Shreya&#39;s work is at the cutting edge of human-computer interaction (HCI) and AI, particularly in the realm of large language models (LLMs). Her impressive background includes being the first ML engineer at Viaduct, doing research engineering at Google Brain, and software engineering at Facebook.</p>

<p>In this episode, we dive deep into the world of LLMs and the critical challenges of building reliable AI pipelines. We&#39;ll explore:</p>

<ul>
<li>The fascinating journey from classic machine learning to the current LLM revolution</li>
<li>Why Shreya believes most ML problems are actually data management issues</li>
<li>The concept of &quot;data flywheels&quot; for LLM applications and how to implement them</li>
<li>The intriguing world of evaluating AI systems - who validates the validators?</li>
<li>Shreya&#39;s work on SPADE and EvalGen, innovative tools for synthesizing data quality assertions and aligning LLM evaluations with human preferences</li>
<li>The importance of human-in-the-loop processes in AI development</li>
<li>The future of low-code and no-code tools in the AI landscape</li>
</ul>

<p>We&#39;ll also touch on the potential pitfalls of over-relying on LLMs, the concept of &quot;Habsburg AI,&quot; and how to avoid disappearing up our own proverbial arseholes in the world of recursive AI processes.</p>

<p>Whether you&#39;re a seasoned AI practitioner, a curious data scientist, or someone interested in the human side of AI development, this conversation offers valuable insights into building more robust, reliable, and human-centered AI systems.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/hKV6xSJZkB0?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://www.sh-reya.com/" rel="nofollow">Shreya&#39;s website</a></li>
<li><a href="https://x.com/sh_reya" rel="nofollow">Shreya on Twitter</a></li>
<li><a href="https://www.sh-reya.com/blog/ai-engineering-flywheel/" rel="nofollow">Data Flywheels for LLM Applications</a></li>
<li><a href="https://arxiv.org/abs/2401.03038" rel="nofollow">SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines</a></li>
<li><a href="https://applied-llms.org/" rel="nofollow">What We’ve Learned From A Year of Building with LLMs</a></li>
<li><a href="https://arxiv.org/abs/2404.12272" rel="nofollow">Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences</a></li>
<li><a href="https://arxiv.org/abs/2209.09125" rel="nofollow">Operationalizing Machine Learning: An Interview Study</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
</ul>

<p>In the podcast, Hugo also mentioned that this was the 5th time he and Shreya chatted publicly, which is wild!</p>

<p>If you want to dive deep into Shreya&#39;s work and related topics through their chats, you can check them all out here:</p>

<ol>
<li><a href="https://www.youtube.com/watch?v=7zB6ESFto_U" rel="nofollow">Outerbounds&#39; Fireside Chat: Operationalizing ML -- Patterns and Pain Points from MLOps Practitioners</a></li>
<li><a href="https://youtu.be/q0A9CdGWXqc?si=XmaUnQmZiXL2eagS" rel="nofollow">The Past, Present, and Future of Generative AI</a></li>
<li><a href="https://www.youtube.com/live/MTJHvgJtynU?si=Ncjqn5YuFBemvOJ0" rel="nofollow">LLMs, OpenAI Dev Day, and the Existential Crisis for Machine Learning Engineering</a></li>
<li><a href="https://youtube.com/live/c0gcsprsFig?feature=share" rel="nofollow">Lessons from a Year of Building with LLMs</a></li>
</ol>

<p>Check out and subscribe to our <a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">lu.ma calendar</a> for upcoming livestreams!</p>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Hugo speaks with Shreya Shankar, a researcher at UC Berkeley focusing on data management systems with a human-centered approach. Shreya&#39;s work is at the cutting edge of human-computer interaction (HCI) and AI, particularly in the realm of large language models (LLMs). Her impressive background includes being the first ML engineer at Viaduct, doing research engineering at Google Brain, and software engineering at Facebook.</p>

<p>In this episode, we dive deep into the world of LLMs and the critical challenges of building reliable AI pipelines. We&#39;ll explore:</p>

<ul>
<li>The fascinating journey from classic machine learning to the current LLM revolution</li>
<li>Why Shreya believes most ML problems are actually data management issues</li>
<li>The concept of &quot;data flywheels&quot; for LLM applications and how to implement them</li>
<li>The intriguing world of evaluating AI systems - who validates the validators?</li>
<li>Shreya&#39;s work on SPADE and EvalGen, innovative tools for synthesizing data quality assertions and aligning LLM evaluations with human preferences</li>
<li>The importance of human-in-the-loop processes in AI development</li>
<li>The future of low-code and no-code tools in the AI landscape</li>
</ul>

<p>We&#39;ll also touch on the potential pitfalls of over-relying on LLMs, the concept of &quot;Habsburg AI,&quot; and how to avoid disappearing up our own proverbial arseholes in the world of recursive AI processes.</p>

<p>Whether you&#39;re a seasoned AI practitioner, a curious data scientist, or someone interested in the human side of AI development, this conversation offers valuable insights into building more robust, reliable, and human-centered AI systems.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/hKV6xSJZkB0?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://www.sh-reya.com/" rel="nofollow">Shreya&#39;s website</a></li>
<li><a href="https://x.com/sh_reya" rel="nofollow">Shreya on Twitter</a></li>
<li><a href="https://www.sh-reya.com/blog/ai-engineering-flywheel/" rel="nofollow">Data Flywheels for LLM Applications</a></li>
<li><a href="https://arxiv.org/abs/2401.03038" rel="nofollow">SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines</a></li>
<li><a href="https://applied-llms.org/" rel="nofollow">What We’ve Learned From A Year of Building with LLMs</a></li>
<li><a href="https://arxiv.org/abs/2404.12272" rel="nofollow">Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences</a></li>
<li><a href="https://arxiv.org/abs/2209.09125" rel="nofollow">Operationalizing Machine Learning: An Interview Study</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
</ul>

<p>In the podcast, Hugo also mentioned that this was the 5th time he and Shreya chatted publicly, which is wild!</p>

<p>If you want to dive deep into Shreya&#39;s work and related topics through their chats, you can check them all out here:</p>

<ol>
<li><a href="https://www.youtube.com/watch?v=7zB6ESFto_U" rel="nofollow">Outerbounds&#39; Fireside Chat: Operationalizing ML -- Patterns and Pain Points from MLOps Practitioners</a></li>
<li><a href="https://youtu.be/q0A9CdGWXqc?si=XmaUnQmZiXL2eagS" rel="nofollow">The Past, Present, and Future of Generative AI</a></li>
<li><a href="https://www.youtube.com/live/MTJHvgJtynU?si=Ncjqn5YuFBemvOJ0" rel="nofollow">LLMs, OpenAI Dev Day, and the Existential Crisis for Machine Learning Engineering</a></li>
<li><a href="https://youtube.com/live/c0gcsprsFig?feature=share" rel="nofollow">Lessons from a Year of Building with LLMs</a></li>
</ol>

<p>Check out and subscribe to our <a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">lu.ma calendar</a> for upcoming livestreams!</p>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 31: Rethinking Data Science, Machine Learning, and AI</title>
  <link>https://vanishinggradients.fireside.fm/31</link>
  <guid isPermaLink="false">455d1587-7ba6-4850-920e-360d8cbe33d3</guid>
  <pubDate>Tue, 09 Jul 2024 19:00:00 +1000</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/455d1587-7ba6-4850-920e-360d8cbe33d3.mp3" length="92236825" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Hugo speaks with Vincent Warmerdam, a senior data professional and machine learning engineer at :probabl, the exclusive brand operator of scikit-learn. Vincent is known for challenging common assumptions and exploring innovative approaches in data science and machine learning.</itunes:subtitle>
  <itunes:duration>1:36:04</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Hugo speaks with Vincent Warmerdam, a senior data professional and machine learning engineer at :probabl, the exclusive brand operator of scikit-learn. Vincent is known for challenging common assumptions and exploring innovative approaches in data science and machine learning.
In this episode, they dive deep into rethinking established methods in data science, machine learning, and AI. We explore Vincent's principled approach to the field, including:
The critical importance of exposing yourself to real-world problems before applying ML solutions
Framing problems correctly and understanding the data generating process
The power of visualization and human intuition in data analysis
Questioning whether algorithms truly meet the actual problem at hand
The value of simple, interpretable models and when to consider more complex approaches
The importance of UI and user experience in data science tools
Strategies for preventing algorithmic failures by rethinking evaluation metrics and data quality
The potential and limitations of LLMs in the current data science landscape
The benefits of open-source collaboration and knowledge sharing in the community
Throughout the conversation, Vincent illustrates these principles with vivid, real-world examples from his extensive experience in the field. They also discuss Vincent's thoughts on the future of data science and his call to action for more knowledge sharing in the community through blogging and open dialogue.
LINKS
The livestream on YouTube (https://youtube.com/live/-CD66CI1pEo?feature=share)
Vincent's blog (https://koaning.io/)
CalmCode (https://calmcode.io/)
scikit-lego (https://koaning.github.io/scikit-lego/)
Vincent's book Data Science Fiction (WIP) (https://calmcode.io/book)
The Deon Checklist, an ethics checklist for data scientists (https://deon.drivendata.org/)
Of oaths and checklists, by DJ Patil, Hilary Mason and Mike Loukides (https://www.oreilly.com/radar/of-oaths-and-checklists/)
Vincent's Getting Started with NLP and spaCy course on Talk Python (https://training.talkpython.fm/courses/getting-started-with-spacy)
Vincent on Twitter (https://x.com/fishnets88)
:probabl. on Twitter (https://x.com/probabl_ai)
Vincent's PyData Amsterdam Keynote "Natural Intelligence is All You Need [tm]" (https://www.youtube.com/watch?v=C9p7suS-NGk)
Vincent's PyData Amsterdam 2019 talk: The profession of solving (the wrong problem) (https://www.youtube.com/watch?v=kYMfE9u-lMo)
Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
Hugo on Twitter (https://twitter.com/hugobowne)
Check out and subscribe to our lu.ma calendar (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) for upcoming livestreams!
</description>
  <itunes:keywords>AI, LLMs, machine learning, data science, GenAI</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Hugo speaks with Vincent Warmerdam, a senior data professional and machine learning engineer at :probabl, the exclusive brand operator of scikit-learn. Vincent is known for challenging common assumptions and exploring innovative approaches in data science and machine learning.</p>

<p>In this episode, they dive deep into rethinking established methods in data science, machine learning, and AI. We explore Vincent&#39;s principled approach to the field, including:</p>

<ul>
<li>The critical importance of exposing yourself to real-world problems before applying ML solutions</li>
<li>Framing problems correctly and understanding the data generating process</li>
<li>The power of visualization and human intuition in data analysis</li>
<li>Questioning whether algorithms truly meet the actual problem at hand</li>
<li>The value of simple, interpretable models and when to consider more complex approaches</li>
<li>The importance of UI and user experience in data science tools</li>
<li>Strategies for preventing algorithmic failures by rethinking evaluation metrics and data quality</li>
<li>The potential and limitations of LLMs in the current data science landscape</li>
<li>The benefits of open-source collaboration and knowledge sharing in the community</li>
</ul>

<p>Throughout the conversation, Vincent illustrates these principles with vivid, real-world examples from his extensive experience in the field. They also discuss Vincent&#39;s thoughts on the future of data science and his call to action for more knowledge sharing in the community through blogging and open dialogue.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/-CD66CI1pEo?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://koaning.io/" rel="nofollow">Vincent&#39;s blog</a></li>
<li><a href="https://calmcode.io/" rel="nofollow">CalmCode</a></li>
<li><a href="https://koaning.github.io/scikit-lego/" rel="nofollow">scikit-lego</a></li>
<li><a href="https://calmcode.io/book" rel="nofollow">Vincent&#39;s book Data Science Fiction (WIP)</a></li>
<li><a href="https://deon.drivendata.org/" rel="nofollow">The Deon Checklist, an ethics checklist for data scientists</a></li>
<li><a href="https://www.oreilly.com/radar/of-oaths-and-checklists/" rel="nofollow">Of oaths and checklists, by DJ Patil, Hilary Mason and Mike Loukides</a></li>
<li><a href="https://training.talkpython.fm/courses/getting-started-with-spacy" rel="nofollow">Vincent&#39;s Getting Started with NLP and spaCy course on Talk Python</a></li>
<li><a href="https://x.com/fishnets88" rel="nofollow">Vincent on Twitter</a></li>
<li><a href="https://x.com/probabl_ai" rel="nofollow">:probabl. on Twitter</a></li>
<li><a href="https://www.youtube.com/watch?v=C9p7suS-NGk" rel="nofollow">Vincent&#39;s PyData Amsterdam Keynote &quot;Natural Intelligence is All You Need [tm]&quot;</a></li>
<li><a href="https://www.youtube.com/watch?v=kYMfE9u-lMo" rel="nofollow">Vincent&#39;s PyData Amsterdam 2019 talk: The profession of solving (the wrong problem)</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
</ul>

<p>Check out and subscribe to our <a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">lu.ma calendar</a> for upcoming livestreams!</p>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Hugo speaks with Vincent Warmerdam, a senior data professional and machine learning engineer at :probabl, the exclusive brand operator of scikit-learn. Vincent is known for challenging common assumptions and exploring innovative approaches in data science and machine learning.</p>

<p>In this episode, they dive deep into rethinking established methods in data science, machine learning, and AI. We explore Vincent&#39;s principled approach to the field, including:</p>

<ul>
<li>The critical importance of exposing yourself to real-world problems before applying ML solutions</li>
<li>Framing problems correctly and understanding the data generating process</li>
<li>The power of visualization and human intuition in data analysis</li>
<li>Questioning whether algorithms truly meet the actual problem at hand</li>
<li>The value of simple, interpretable models and when to consider more complex approaches</li>
<li>The importance of UI and user experience in data science tools</li>
<li>Strategies for preventing algorithmic failures by rethinking evaluation metrics and data quality</li>
<li>The potential and limitations of LLMs in the current data science landscape</li>
<li>The benefits of open-source collaboration and knowledge sharing in the community</li>
</ul>

<p>Throughout the conversation, Vincent illustrates these principles with vivid, real-world examples from his extensive experience in the field. They also discuss Vincent&#39;s thoughts on the future of data science and his call to action for more knowledge sharing in the community through blogging and open dialogue.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/-CD66CI1pEo?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://koaning.io/" rel="nofollow">Vincent&#39;s blog</a></li>
<li><a href="https://calmcode.io/" rel="nofollow">CalmCode</a></li>
<li><a href="https://koaning.github.io/scikit-lego/" rel="nofollow">scikit-lego</a></li>
<li><a href="https://calmcode.io/book" rel="nofollow">Vincent&#39;s book Data Science Fiction (WIP)</a></li>
<li><a href="https://deon.drivendata.org/" rel="nofollow">The Deon Checklist, an ethics checklist for data scientists</a></li>
<li><a href="https://www.oreilly.com/radar/of-oaths-and-checklists/" rel="nofollow">Of oaths and checklists, by DJ Patil, Hilary Mason and Mike Loukides</a></li>
<li><a href="https://training.talkpython.fm/courses/getting-started-with-spacy" rel="nofollow">Vincent&#39;s Getting Started with NLP and spaCy course on Talk Python</a></li>
<li><a href="https://x.com/fishnets88" rel="nofollow">Vincent on Twitter</a></li>
<li><a href="https://x.com/probabl_ai" rel="nofollow">:probabl. on Twitter</a></li>
<li><a href="https://www.youtube.com/watch?v=C9p7suS-NGk" rel="nofollow">Vincent&#39;s PyData Amsterdam Keynote &quot;Natural Intelligence is All You Need [tm]&quot;</a></li>
<li><a href="https://www.youtube.com/watch?v=kYMfE9u-lMo" rel="nofollow">Vincent&#39;s PyData Amsterdam 2019 talk: The profession of solving (the wrong problem)</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
</ul>

<p>Check out and subscribe to our <a href="https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk" rel="nofollow">lu.ma calendar</a> for upcoming livestreams!</p>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 28: Beyond Supervised Learning: The Rise of In-Context Learning with LLMs</title>
  <link>https://vanishinggradients.fireside.fm/28</link>
  <guid isPermaLink="false">b268a89e-4fc9-4f9f-a2a5-c7636b3fbd70</guid>
  <pubDate>Mon, 10 Jun 2024 08:00:00 +1000</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/b268a89e-4fc9-4f9f-a2a5-c7636b3fbd70.mp3" length="63014789" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Hugo speaks with Alan Nichol, co-founder and CTO of Rasa, where they build software to enable developers to create enterprise-grade conversational AI and chatbot systems across industries like telcos, healthcare, fintech, and government.</itunes:subtitle>
  <itunes:duration>1:05:38</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Hugo speaks with Alan Nichol, co-founder and CTO of Rasa, where they build software to enable developers to create enterprise-grade conversational AI and chatbot systems across industries like telcos, healthcare, fintech, and government.
What's super cool is that Alan and the Rasa team have been doing this type of thing for over a decade, giving them a wealth of wisdom on how to effectively incorporate LLMs into chatbots - and how not to. For example, if you want a chatbot that takes specific and important actions like transferring money, do you want to fully entrust the conversation to one big LLM like ChatGPT, or secure what the LLMs can do inside key business logic?
In this episode, they also dive into the history of conversational AI and explore how the advent of LLMs is reshaping the field. Alan shares his perspective on how supervised learning has failed us in some ways and discusses what he sees as the most overrated and underrated aspects of LLMs.
Alan offers advice for those looking to work with LLMs and conversational AI, emphasizing the importance of not sleeping on proven techniques and looking beyond the latest hype. In a live demo, he showcases Rasa's CALM (Conversational AI with Language Models), which allows developers to define business logic declaratively and separate it from the LLM, enabling reliable execution of conversational flows.
LINKS
The livestream on YouTube (https://www.youtube.com/live/kMFBYC2pB30?si=yV5sGq1iuC47LBSi)
Alan's Rasa CALM Demo: Building Conversational AI with LLMs  (https://youtu.be/4UnxaJ-GcT0?si=6uLY3GD5DkOmWiBW)
Alan on twitter.com (https://x.com/alanmnichol)
Rasa (https://rasa.com/)
CALM, an LLM-native approach to building reliable conversational AI (https://rasa.com/docs/rasa-pro/calm/)
Task-Oriented Dialogue with In-Context Learning (https://arxiv.org/abs/2402.12234)
'We don’t know how to build conversational software yet' by Alan Nichol (https://medium.com/rasa-blog/we-don-t-know-how-to-build-conversational-software-yet-a18301db0e4b)
Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
Hugo on Twitter (https://twitter.com/hugobowne)
Upcoming Livestreams
Lessons from a Year of Building with LLMs (https://lu.ma/e8huz3s6?utm_source=vgan)
VALIDATING THE VALIDATORS with Shreya Shankar (https://lu.ma/zz3qic45?utm_source=vgan)
</description>
  <itunes:keywords>AI, LLMs, machine learning, data science</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Hugo speaks with Alan Nichol, co-founder and CTO of Rasa, where they build software to enable developers to create enterprise-grade conversational AI and chatbot systems across industries like telcos, healthcare, fintech, and government.</p>

<p>What&#39;s super cool is that Alan and the Rasa team have been doing this type of thing for over a decade, giving them a wealth of wisdom on how to effectively incorporate LLMs into chatbots - and how not to. For example, if you want a chatbot that takes specific and important actions like transferring money, do you want to fully entrust the conversation to one big LLM like ChatGPT, or secure what the LLMs can do inside key business logic?</p>

<p>In this episode, they also dive into the history of conversational AI and explore how the advent of LLMs is reshaping the field. Alan shares his perspective on how supervised learning has failed us in some ways and discusses what he sees as the most overrated and underrated aspects of LLMs.</p>

<p>Alan offers advice for those looking to work with LLMs and conversational AI, emphasizing the importance of not sleeping on proven techniques and looking beyond the latest hype. In a live demo, he showcases Rasa&#39;s CALM (Conversational AI with Language Models), which allows developers to define business logic declaratively and separate it from the LLM, enabling reliable execution of conversational flows.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://www.youtube.com/live/kMFBYC2pB30?si=yV5sGq1iuC47LBSi" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://youtu.be/4UnxaJ-GcT0?si=6uLY3GD5DkOmWiBW" rel="nofollow">Alan&#39;s Rasa CALM Demo: Building Conversational AI with LLMs </a></li>
<li><a href="https://x.com/alanmnichol" rel="nofollow">Alan on twitter.com</a></li>
<li><a href="https://rasa.com/" rel="nofollow">Rasa</a></li>
<li><a href="https://rasa.com/docs/rasa-pro/calm/" rel="nofollow">CALM, an LLM-native approach to building reliable conversational AI</a></li>
<li><a href="https://arxiv.org/abs/2402.12234" rel="nofollow">Task-Oriented Dialogue with In-Context Learning</a></li>
<li><a href="https://medium.com/rasa-blog/we-don-t-know-how-to-build-conversational-software-yet-a18301db0e4b" rel="nofollow">&#39;We don’t know how to build conversational software yet&#39; by Alan Nichol</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
</ul>

<p><strong>Upcoming Livestreams</strong></p>

<ul>
<li><a href="https://lu.ma/e8huz3s6?utm_source=vgan" rel="nofollow">Lessons from a Year of Building with LLMs</a></li>
<li><a href="https://lu.ma/zz3qic45?utm_source=vgan" rel="nofollow">VALIDATING THE VALIDATORS with Shreya Shankar</a></li>
</ul>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Hugo speaks with Alan Nichol, co-founder and CTO of Rasa, where they build software to enable developers to create enterprise-grade conversational AI and chatbot systems across industries like telcos, healthcare, fintech, and government.</p>

<p>What&#39;s super cool is that Alan and the Rasa team have been doing this type of thing for over a decade, giving them a wealth of wisdom on how to effectively incorporate LLMs into chatbots - and how not to. For example, if you want a chatbot that takes specific and important actions like transferring money, do you want to fully entrust the conversation to one big LLM like ChatGPT, or secure what the LLMs can do inside key business logic?</p>

<p>In this episode, they also dive into the history of conversational AI and explore how the advent of LLMs is reshaping the field. Alan shares his perspective on how supervised learning has failed us in some ways and discusses what he sees as the most overrated and underrated aspects of LLMs.</p>

<p>Alan offers advice for those looking to work with LLMs and conversational AI, emphasizing the importance of not sleeping on proven techniques and looking beyond the latest hype. In a live demo, he showcases Rasa&#39;s CALM (Conversational AI with Language Models), which allows developers to define business logic declaratively and separate it from the LLM, enabling reliable execution of conversational flows.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://www.youtube.com/live/kMFBYC2pB30?si=yV5sGq1iuC47LBSi" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://youtu.be/4UnxaJ-GcT0?si=6uLY3GD5DkOmWiBW" rel="nofollow">Alan&#39;s Rasa CALM Demo: Building Conversational AI with LLMs </a></li>
<li><a href="https://x.com/alanmnichol" rel="nofollow">Alan on twitter.com</a></li>
<li><a href="https://rasa.com/" rel="nofollow">Rasa</a></li>
<li><a href="https://rasa.com/docs/rasa-pro/calm/" rel="nofollow">CALM, an LLM-native approach to building reliable conversational AI</a></li>
<li><a href="https://arxiv.org/abs/2402.12234" rel="nofollow">Task-Oriented Dialogue with In-Context Learning</a></li>
<li><a href="https://medium.com/rasa-blog/we-don-t-know-how-to-build-conversational-software-yet-a18301db0e4b" rel="nofollow">&#39;We don’t know how to build conversational software yet&#39; by Alan Nichol</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
</ul>

<p><strong>Upcoming Livestreams</strong></p>

<ul>
<li><a href="https://lu.ma/e8huz3s6?utm_source=vgan" rel="nofollow">Lessons from a Year of Building with LLMs</a></li>
<li><a href="https://lu.ma/zz3qic45?utm_source=vgan" rel="nofollow">VALIDATING THE VALIDATORS with Shreya Shankar</a></li>
</ul>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 27: How to Build Terrible AI Systems</title>
  <link>https://vanishinggradients.fireside.fm/27</link>
  <guid isPermaLink="false">d42a2479-a220-4f72-bf48-946c4a393efa</guid>
  <pubDate>Fri, 31 May 2024 10:00:00 +1000</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/d42a2479-a220-4f72-bf48-946c4a393efa.mp3" length="88718026" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Hugo speaks with Jason Liu, an independent consultant who uses his expertise in recommendation systems to help fast-growing startups build out their RAG applications. He was previously at Meta and Stitch Fix, and is also the creator of Instructor and Flight, as well as an ML and data science educator.</itunes:subtitle>
  <itunes:duration>1:32:24</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Hugo speaks with Jason Liu, an independent consultant who uses his expertise in recommendation systems to help fast-growing startups build out their RAG applications. He was previously at Meta and Stitch Fix, and is also the creator of Instructor and Flight, as well as an ML and data science educator.
They talk about how Jason approaches consulting companies across many industries, including construction and sales, in building production LLM apps, his playbook for getting ML and AI up and running to build and maintain such apps, and the future of tooling to do so.
They take an inverted thinking approach, envisaging all the failure modes that would result in building terrible AI systems, and then figure out how to avoid such pitfalls.
LINKS
The livestream on YouTube (https://youtube.com/live/USTG6sQlB6s?feature=share)
Jason's website (https://jxnl.co/)
Pydantic is all you need, Jason's Keynote at AI Engineer Summit, 2023 (https://youtu.be/yj-wSRJwrrc?si=JIGhN0mx0i50dUR9)
How to build a terrible RAG system by Jason (https://jxnl.co/writing/2024/01/07/inverted-thinking-rag/)
To express interest in Jason's Systematically improving RAG Applications course (https://q7gjsgfstrp.typeform.com/ragcourse?typeform-source=vg)
Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
Hugo on Twitter (https://twitter.com/hugobowne)
Upcoming Livestreams
Good Riddance to Supervised Learning with Alan Nichol (CTO and co-founder, Rasa) (https://lu.ma/gphzzyyn?utm_source=vgj)
Lessons from a Year of Building with LLMs (https://lu.ma/e8huz3s6?utm_source=vgj)
</description>
  <itunes:keywords>AI, LLMs, machine learning, data science</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Hugo speaks with Jason Liu, an independent consultant who uses his expertise in recommendation systems to help fast-growing startups build out their RAG applications. He was previously at Meta and Stitch Fix, and is also the creator of Instructor and Flight, as well as an ML and data science educator.</p>

<p>They talk about how Jason approaches consulting companies across many industries, including construction and sales, in building production LLM apps, his playbook for getting ML and AI up and running to build and maintain such apps, and the future of tooling to do so.</p>

<p>They take an inverted thinking approach, envisaging all the failure modes that would result in building terrible AI systems, and then figure out how to avoid such pitfalls.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/USTG6sQlB6s?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://jxnl.co/" rel="nofollow">Jason&#39;s website</a></li>
<li><a href="https://youtu.be/yj-wSRJwrrc?si=JIGhN0mx0i50dUR9" rel="nofollow">Pydantic is all you need, Jason&#39;s Keynote at AI Engineer Summit, 2023</a></li>
<li><a href="https://jxnl.co/writing/2024/01/07/inverted-thinking-rag/" rel="nofollow">How to build a terrible RAG system by Jason</a></li>
<li><a href="https://q7gjsgfstrp.typeform.com/ragcourse?typeform-source=vg" rel="nofollow">To express interest in Jason&#39;s <em>Systematically improving RAG Applications</em> course</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
</ul>

<p><strong>Upcoming Livestreams</strong></p>

<ul>
<li><a href="https://lu.ma/gphzzyyn?utm_source=vgj" rel="nofollow">Good Riddance to Supervised Learning with Alan Nichol (CTO and co-founder, Rasa)</a></li>
<li><a href="https://lu.ma/e8huz3s6?utm_source=vgj" rel="nofollow">Lessons from a Year of Building with LLMs</a></li>
</ul>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Hugo speaks with Jason Liu, an independent consultant who uses his expertise in recommendation systems to help fast-growing startups build out their RAG applications. He was previously at Meta and Stitch Fix, and is also the creator of Instructor and Flight, as well as an ML and data science educator.</p>

<p>They talk about how Jason approaches consulting companies across many industries, including construction and sales, in building production LLM apps, his playbook for getting ML and AI up and running to build and maintain such apps, and the future of tooling to do so.</p>

<p>They take an inverted thinking approach, envisaging all the failure modes that would result in building terrible AI systems, and then figure out how to avoid such pitfalls.</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/USTG6sQlB6s?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://jxnl.co/" rel="nofollow">Jason&#39;s website</a></li>
<li><a href="https://youtu.be/yj-wSRJwrrc?si=JIGhN0mx0i50dUR9" rel="nofollow">Pydantic is all you need, Jason&#39;s Keynote at AI Engineer Summit, 2023</a></li>
<li><a href="https://jxnl.co/writing/2024/01/07/inverted-thinking-rag/" rel="nofollow">How to build a terrible RAG system by Jason</a></li>
<li><a href="https://q7gjsgfstrp.typeform.com/ragcourse?typeform-source=vg" rel="nofollow">To express interest in Jason&#39;s <em>Systematically improving RAG Applications</em> course</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
</ul>

<p><strong>Upcoming Livestreams</strong></p>

<ul>
<li><a href="https://lu.ma/gphzzyyn?utm_source=vgj" rel="nofollow">Good Riddance to Supervised Learning with Alan Nichol (CTO and co-founder, Rasa)</a></li>
<li><a href="https://lu.ma/e8huz3s6?utm_source=vgj" rel="nofollow">Lessons from a Year of Building with LLMs</a></li>
</ul>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 26: Developing and Training LLMs From Scratch</title>
  <link>https://vanishinggradients.fireside.fm/26</link>
  <guid isPermaLink="false">d56cd02b-11cb-4be9-a2a7-31f783ef9c1a</guid>
  <pubDate>Wed, 15 May 2024 13:00:00 +1000</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/d56cd02b-11cb-4be9-a2a7-31f783ef9c1a.mp3" length="53564523" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Hugo speaks with Sebastian Raschka, a machine learning &amp; AI researcher, programmer, and author. They’ll tell you everything you need to know about LLMs, but were too afraid to ask: from covering the entire LLM lifecycle, what type of skills you need to work with them, what type of resources and hardware, prompt engineering vs fine-tuning vs RAG, how to build an LLM from scratch, and much more.</itunes:subtitle>
  <itunes:duration>1:51:35</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Hugo speaks with Sebastian Raschka, a machine learning &amp; AI researcher, programmer, and author. As Staff Research Engineer at Lightning AI, he focuses on the intersection of AI research, software development, and large language models (LLMs).
How do you build LLMs? How can you use them, both in prototype and production settings? What are the building blocks you need to know about?
In this episode, we’ll tell you everything you need to know about LLMs, but were too afraid to ask: from covering the entire LLM lifecycle, what type of skills you need to work with them, what type of resources and hardware, prompt engineering vs fine-tuning vs RAG, how to build an LLM from scratch, and much more.
The idea here is not that you’ll need to use an LLM you’ve built from scratch, but that we’ll learn a lot about LLMs and how to use them in the process.
Near the end we also did some live coding to fine-tune GPT-2 in order to create a spam classifier! 
LINKS
The livestream on YouTube (https://youtube.com/live/qL4JY6Y5pmA)
Sebastian's website (https://sebastianraschka.com/)
Machine Learning Q and AI: 30 Essential Questions and Answers on Machine Learning and AI by Sebastian (https://nostarch.com/machine-learning-q-and-ai)
Build a Large Language Model (From Scratch) by Sebastian (https://www.manning.com/books/build-a-large-language-model-from-scratch)
PyTorch Lightning (https://lightning.ai/docs/pytorch/stable/)
Lightning Fabric (https://lightning.ai/docs/fabric/stable/)
LitGPT (https://github.com/Lightning-AI/litgpt)
Sebastian's notebook for finetuning GPT-2 for spam classification! (https://github.com/rasbt/LLMs-from-scratch/blob/main/ch06/01_main-chapter-code/ch06.ipynb)
The end of fine-tuning: Jeremy Howard on the Latent Space Podcast (https://www.latent.space/p/fastai)
Our next livestream: How to Build Terrible AI Systems with Jason Liu (https://lu.ma/terrible-ai-systems?utm_source=vg)
Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
Hugo on Twitter (https://twitter.com/hugobowne) 
</description>
  <itunes:keywords>AI, LLMs, machine learning, data science, OpenAI</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Hugo speaks with Sebastian Raschka, a machine learning &amp; AI researcher, programmer, and author. As Staff Research Engineer at Lightning AI, he focuses on the intersection of AI research, software development, and large language models (LLMs).</p>

<p>How do you build LLMs? How can you use them, both in prototype and production settings? What are the building blocks you need to know about?</p>

<p>In this episode, we’ll tell you everything you need to know about LLMs, but were too afraid to ask: from covering the entire LLM lifecycle, what type of skills you need to work with them, what type of resources and hardware, prompt engineering vs fine-tuning vs RAG, how to build an LLM from scratch, and much more.</p>

<p>The idea here is not that you’ll need to use an LLM you’ve built from scratch, but that we’ll learn a lot about LLMs and how to use them in the process.</p>

<p>Near the end we also did some live coding to fine-tune GPT-2 in order to create a spam classifier! </p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/qL4JY6Y5pmA" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://sebastianraschka.com/" rel="nofollow">Sebastian&#39;s website</a></li>
<li><a href="https://nostarch.com/machine-learning-q-and-ai" rel="nofollow">Machine Learning Q and AI: 30 Essential Questions and Answers on Machine Learning and AI by Sebastian</a></li>
<li><a href="https://www.manning.com/books/build-a-large-language-model-from-scratch" rel="nofollow">Build a Large Language Model (From Scratch) by Sebastian</a></li>
<li><a href="https://lightning.ai/docs/pytorch/stable/" rel="nofollow">PyTorch Lightning</a></li>
<li><a href="https://lightning.ai/docs/fabric/stable/" rel="nofollow">Lightning Fabric</a></li>
<li><a href="https://github.com/Lightning-AI/litgpt" rel="nofollow">LitGPT</a></li>
<li><a href="https://github.com/rasbt/LLMs-from-scratch/blob/main/ch06/01_main-chapter-code/ch06.ipynb" rel="nofollow">Sebastian&#39;s notebook for finetuning GPT-2 for spam classification!</a></li>
<li><a href="https://www.latent.space/p/fastai" rel="nofollow">The end of fine-tuning: Jeremy Howard on the Latent Space Podcast</a></li>
<li><a href="https://lu.ma/terrible-ai-systems?utm_source=vg" rel="nofollow">Our next livestream: How to Build Terrible AI Systems with Jason Liu</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
</ul>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Hugo speaks with Sebastian Raschka, a machine learning &amp; AI researcher, programmer, and author. As Staff Research Engineer at Lightning AI, he focuses on the intersection of AI research, software development, and large language models (LLMs).</p>

<p>How do you build LLMs? How can you use them, both in prototype and production settings? What are the building blocks you need to know about?</p>

<p>In this episode, we’ll tell you everything you need to know about LLMs, but were too afraid to ask: from covering the entire LLM lifecycle, what type of skills you need to work with them, what type of resources and hardware, prompt engineering vs fine-tuning vs RAG, how to build an LLM from scratch, and much more.</p>

<p>The idea here is not that you’ll need to use an LLM you’ve built from scratch, but that we’ll learn a lot about LLMs and how to use them in the process.</p>

<p>Near the end we also did some live coding to fine-tune GPT-2 in order to create a spam classifier! </p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/qL4JY6Y5pmA" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://sebastianraschka.com/" rel="nofollow">Sebastian&#39;s website</a></li>
<li><a href="https://nostarch.com/machine-learning-q-and-ai" rel="nofollow">Machine Learning Q and AI: 30 Essential Questions and Answers on Machine Learning and AI by Sebastian</a></li>
<li><a href="https://www.manning.com/books/build-a-large-language-model-from-scratch" rel="nofollow">Build a Large Language Model (From Scratch) by Sebastian</a></li>
<li><a href="https://lightning.ai/docs/pytorch/stable/" rel="nofollow">PyTorch Lightning</a></li>
<li><a href="https://lightning.ai/docs/fabric/stable/" rel="nofollow">Lightning Fabric</a></li>
<li><a href="https://github.com/Lightning-AI/litgpt" rel="nofollow">LitGPT</a></li>
<li><a href="https://github.com/rasbt/LLMs-from-scratch/blob/main/ch06/01_main-chapter-code/ch06.ipynb" rel="nofollow">Sebastian&#39;s notebook for finetuning GPT-2 for spam classification!</a></li>
<li><a href="https://www.latent.space/p/fastai" rel="nofollow">The end of fine-tuning: Jeremy Howard on the Latent Space Podcast</a></li>
<li><a href="https://lu.ma/terrible-ai-systems?utm_source=vg" rel="nofollow">Our next livestream: How to Build Terrible AI Systems with Jason Liu</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
</ul>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 25: Fully Reproducible ML &amp; AI Workflows</title>
  <link>https://vanishinggradients.fireside.fm/25</link>
  <guid isPermaLink="false">2e66472b-34f3-4068-b6f9-4942dc757325</guid>
  <pubDate>Mon, 18 Mar 2024 23:00:00 +1100</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/2e66472b-34f3-4068-b6f9-4942dc757325.mp3" length="77423933" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Hugo speaks with Omoju Miller, a machine learning guru and founder and CEO of Fimio, where she is building 21st century dev tooling.</itunes:subtitle>
  <itunes:duration>1:20:38</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Hugo speaks with Omoju Miller, a machine learning guru and founder and CEO of Fimio, where she is building 21st century dev tooling. In the past, she was Technical Advisor to the CEO at GitHub, spent time co-leading non-profit investment in Computer Science Education for Google, and served as a volunteer advisor to the Obama administration’s White House Presidential Innovation Fellows.
We need open tools, open data, provenance, and the ability to build fully reproducible, transparent machine learning workflows. With the advent of closed-source, vendor-based APIs and compute becoming a form of gate-keeping, developer tools are at risk of becoming commoditized and developers becoming consumers.
We’ll talk about ideas for escaping these burgeoning walled gardens. We’ll dive into
What fully reproducible ML workflows would look like, including git for the workflow build process,
The need for loosely coupled and composable tools that embrace a UNIX-like philosophy,
What a much more scientific toolchain would look like,
What a future open source commons for Generative AI could look like,
What an open compute ecosystem could look like,
How to create LLMs and tooling so everyone can use them to build production-ready apps,
And much more!
LINKS
The livestream on YouTube (https://www.youtube.com/live/n81PWNsHSMk?si=pgX2hH5xADATdJMu)
Omoju on Twitter (https://twitter.com/omojumiller)
Hugo on Twitter (https://twitter.com/hugobowne)
Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
Lu.ma Calendar that includes details of Hugo's European Tour for Outerbounds (https://lu.ma/Outerbounds)
Blog post that includes details of Hugo's European Tour for Outerbounds (https://outerbounds.com/blog/ob-on-the-road-2024-h1/)
</description>
  <itunes:keywords>AI, LLMs, machine learning, data science, OpenAI</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Hugo speaks with Omoju Miller, a machine learning guru and founder and CEO of Fimio, where she is building 21st century dev tooling. In the past, she was Technical Advisor to the CEO at GitHub, spent time co-leading non-profit investment in Computer Science Education for Google, and served as a volunteer advisor to the Obama administration’s White House Presidential Innovation Fellows.</p>

<p>We need open tools, open data, provenance, and the ability to build fully reproducible, transparent machine learning workflows. With the advent of closed-source, vendor-based APIs and compute becoming a form of gate-keeping, developer tools are at risk of becoming commoditized and developers becoming consumers.</p>

<p>We’ll talk about ideas for escaping these burgeoning walled gardens. We’ll dive into</p>

<ul>
<li>What fully reproducible ML workflows would look like, including git for the workflow build process,</li>
<li>The need for loosely coupled and composable tools that embrace a UNIX-like philosophy,</li>
<li>What a much more scientific toolchain would look like,</li>
<li>What a future open source commons for Generative AI could look like,</li>
<li>What an open compute ecosystem could look like,</li>
<li>How to create LLMs and tooling so everyone can use them to build production-ready apps,</li>
</ul>

<p>And much more!</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://www.youtube.com/live/n81PWNsHSMk?si=pgX2hH5xADATdJMu" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://twitter.com/omojumiller" rel="nofollow">Omoju on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://lu.ma/Outerbounds" rel="nofollow">Lu.ma Calendar that includes details of Hugo&#39;s European Tour for Outerbounds</a></li>
<li><a href="https://outerbounds.com/blog/ob-on-the-road-2024-h1/" rel="nofollow">Blog post that includes details of Hugo&#39;s European Tour for Outerbounds</a></li>
</ul>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Hugo speaks with Omoju Miller, a machine learning guru and founder and CEO of Fimio, where she is building 21st century dev tooling. In the past, she was Technical Advisor to the CEO at GitHub, spent time co-leading non-profit investment in Computer Science Education for Google, and served as a volunteer advisor to the Obama administration’s White House Presidential Innovation Fellows.</p>

<p>We need open tools, open data, provenance, and the ability to build fully reproducible, transparent machine learning workflows. With the advent of closed-source, vendor-based APIs and compute becoming a form of gate-keeping, developer tools are at risk of becoming commoditized and developers becoming consumers.</p>

<p>We’ll talk about ideas for escaping these burgeoning walled gardens. We’ll dive into</p>

<ul>
<li>What fully reproducible ML workflows would look like, including git for the workflow build process,</li>
<li>The need for loosely coupled and composable tools that embrace a UNIX-like philosophy,</li>
<li>What a much more scientific toolchain would look like,</li>
<li>What a future open source commons for Generative AI could look like,</li>
<li>What an open compute ecosystem could look like,</li>
<li>How to create LLMs and tooling so everyone can use them to build production-ready apps,</li>
</ul>

<p>And much more!</p>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://www.youtube.com/live/n81PWNsHSMk?si=pgX2hH5xADATdJMu" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://twitter.com/omojumiller" rel="nofollow">Omoju on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://lu.ma/Outerbounds" rel="nofollow">Lu.ma Calendar that includes details of Hugo&#39;s European Tour for Outerbounds</a></li>
<li><a href="https://outerbounds.com/blog/ob-on-the-road-2024-h1/" rel="nofollow">Blog post that includes details of Hugo&#39;s European Tour for Outerbounds</a></li>
</ul>]]>
  </itunes:summary>
</item>
<item>
  <title>Episode 24: LLM and GenAI Accessibility</title>
  <link>https://vanishinggradients.fireside.fm/24</link>
  <guid isPermaLink="false">c6ebf900-c625-493a-b4c5-27a7f31da24f</guid>
  <pubDate>Tue, 27 Feb 2024 17:00:00 +1100</pubDate>
  <author>Hugo Bowne-Anderson</author>
  <enclosure url="https://aphid.fireside.fm/d/1437767933/140c3904-8258-4c39-a698-a112b7077bd7/c6ebf900-c625-493a-b4c5-27a7f31da24f.mp3" length="86459792" type="audio/mpeg"/>
  <itunes:episodeType>full</itunes:episodeType>
  <itunes:season>1</itunes:season>
  <itunes:author>Hugo Bowne-Anderson</itunes:author>
  <itunes:subtitle>Hugo speaks with Johno Whitaker, a Data Scientist/AI Researcher doing R&amp;D with answer.ai, about where we’ve come from regarding tooling and accessibility for foundation models, ML, and AI, where we are, and where we’re going.</itunes:subtitle>
  <itunes:duration>1:30:03</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:image href="https://media24.fireside.fm/file/fireside-images-2024/podcasts/images/1/140c3904-8258-4c39-a698-a112b7077bd7/cover.jpg?v=1"/>
  <description>Hugo speaks with Johno Whitaker, a Data Scientist/AI Researcher doing R&amp;amp;D with answer.ai. His current focus is on generative AI, flitting between different modalities. He also likes teaching and making courses, having worked with both Hugging Face and fast.ai in these capacities.
Johno recently reminded Hugo how hard everything was 10 years ago: “Want to install TensorFlow? Good luck. Need data? Perhaps try ImageNet. But now you can use big models from Hugging Face with hi-res satellite data and do all of this in a Colab notebook. Or think ecology and vision models… or medicine and multimodal models!”
We talk about where we’ve come from regarding tooling and accessibility for foundation models, ML, and AI, where we are, and where we’re going. We’ll delve into
What the Generative AI mindset is, in terms of using atomic building blocks, and how it evolved from both the data science and ML mindsets;
How fast.ai democratized access to deep learning, what successes they had, and what was learned;
The moving parts now required to make GenAI and ML as accessible as possible;
The importance of focusing on UX and the application in the world of generative AI and foundation models;
The skillset and toolkit needed to be an LLM and AI guru;
What they’re up to at answer.ai to democratize LLMs and foundation models.
LINKS
The livestream on YouTube (https://youtube.com/live/hxZX6fBi-W8?feature=share)
Zindi, the largest professional network for data scientists in Africa (https://zindi.africa/)
A new old kind of R&amp;amp;D lab: Announcing Answer.AI (http://www.answer.ai/posts/2023-12-12-launch.html)
Why and how I’m shifting focus to LLMs by Johno Whitaker (https://johnowhitaker.dev/dsc/2023-07-01-why-and-how-im-shifting-focus-to-llms.html)
Applying AI to Immune Cell Networks by Rachel Thomas (https://www.fast.ai/posts/2024-01-23-cytokines/)
Replicate -- a cool place to explore GenAI models, among other things (https://replicate.com/explore)
Hands-On Generative AI with Transformers and Diffusion Models (https://www.oreilly.com/library/view/hands-on-generative-ai/9781098149239/)
Johno on Twitter (https://twitter.com/johnowhitaker)
Hugo on Twitter (https://twitter.com/hugobowne)
Vanishing Gradients on Twitter (https://twitter.com/vanishingdata)
SciPy 2024 CFP (https://www.scipy2024.scipy.org/#CFP)
Escaping Generative AI Walled Gardens with Omoju Miller, a Vanishing Gradients Livestream (https://lu.ma/xonnjqe4)
</description>
  <itunes:keywords>genAI, machine learning, data science</itunes:keywords>
  <content:encoded>
    <![CDATA[<p>Hugo speaks with Johno Whitaker, a Data Scientist/AI Researcher doing R&amp;D with answer.ai. His current focus is on generative AI, flitting between different modalities. He also likes teaching and making courses, having worked with both Hugging Face and fast.ai in these capacities.</p>

<p>Johno recently reminded Hugo how hard everything was 10 years ago: “Want to install TensorFlow? Good luck. Need data? Perhaps try ImageNet. But now you can use big models from Hugging Face with hi-res satellite data and do all of this in a Colab notebook. Or think ecology and vision models… or medicine and multimodal models!”</p>

<p>We talk about where we’ve come from regarding tooling and accessibility for foundation models, ML, and AI, where we are, and where we’re going. We’ll delve into</p>

<ul>
<li>What the Generative AI mindset is, in terms of using atomic building blocks, and how it evolved from both the data science and ML mindsets;</li>
<li>How fast.ai democratized access to deep learning, what successes they had, and what was learned;</li>
<li>The moving parts now required to make GenAI and ML as accessible as possible;</li>
<li>The importance of focusing on UX and the application in the world of generative AI and foundation models;</li>
<li>The skillset and toolkit needed to be an LLM and AI guru;</li>
<li>What they’re up to at answer.ai to democratize LLMs and foundation models.</li>
</ul>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/hxZX6fBi-W8?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://zindi.africa/" rel="nofollow">Zindi, the largest professional network for data scientists in Africa</a></li>
<li><a href="http://www.answer.ai/posts/2023-12-12-launch.html" rel="nofollow">A new old kind of R&amp;D lab: Announcing Answer.AI</a></li>
<li><a href="https://johnowhitaker.dev/dsc/2023-07-01-why-and-how-im-shifting-focus-to-llms.html" rel="nofollow">Why and how I’m shifting focus to LLMs by Johno Whitaker</a></li>
<li><a href="https://www.fast.ai/posts/2024-01-23-cytokines/" rel="nofollow">Applying AI to Immune Cell Networks by Rachel Thomas</a></li>
<li><a href="https://replicate.com/explore" rel="nofollow">Replicate -- a cool place to explore GenAI models, among other things</a></li>
<li><a href="https://www.oreilly.com/library/view/hands-on-generative-ai/9781098149239/" rel="nofollow">Hands-On Generative AI with Transformers and Diffusion Models</a></li>
<li><a href="https://twitter.com/johnowhitaker" rel="nofollow">Johno on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://www.scipy2024.scipy.org/#CFP" rel="nofollow">SciPy 2024 CFP</a></li>
<li><a href="https://lu.ma/xonnjqe4" rel="nofollow">Escaping Generative AI Walled Gardens with Omoju Miller, a Vanishing Gradients Livestream</a></li>
</ul>]]>
  </content:encoded>
  <itunes:summary>
    <![CDATA[<p>Hugo speaks with Johno Whitaker, a Data Scientist/AI Researcher doing R&amp;D with answer.ai. His current focus is on generative AI, flitting between different modalities. He also likes teaching and making courses, having worked with both Hugging Face and fast.ai in these capacities.</p>

<p>Johno recently reminded Hugo how hard everything was 10 years ago: “Want to install TensorFlow? Good luck. Need data? Perhaps try ImageNet. But now you can use big models from Hugging Face with hi-res satellite data and do all of this in a Colab notebook. Or think ecology and vision models… or medicine and multimodal models!”</p>

<p>We talk about where we’ve come from regarding tooling and accessibility for foundation models, ML, and AI, where we are, and where we’re going. We’ll delve into</p>

<ul>
<li>What the Generative AI mindset is, in terms of using atomic building blocks, and how it evolved from both the data science and ML mindsets;</li>
<li>How fast.ai democratized access to deep learning, what successes they had, and what was learned;</li>
<li>The moving parts now required to make GenAI and ML as accessible as possible;</li>
<li>The importance of focusing on UX and the application in the world of generative AI and foundation models;</li>
<li>The skillset and toolkit needed to be an LLM and AI guru;</li>
<li>What they’re up to at answer.ai to democratize LLMs and foundation models.</li>
</ul>

<p><strong>LINKS</strong></p>

<ul>
<li><a href="https://youtube.com/live/hxZX6fBi-W8?feature=share" rel="nofollow">The livestream on YouTube</a></li>
<li><a href="https://zindi.africa/" rel="nofollow">Zindi, the largest professional network for data scientists in Africa</a></li>
<li><a href="http://www.answer.ai/posts/2023-12-12-launch.html" rel="nofollow">A new old kind of R&amp;D lab: Announcing Answer.AI</a></li>
<li><a href="https://johnowhitaker.dev/dsc/2023-07-01-why-and-how-im-shifting-focus-to-llms.html" rel="nofollow">Why and how I’m shifting focus to LLMs by Johno Whitaker</a></li>
<li><a href="https://www.fast.ai/posts/2024-01-23-cytokines/" rel="nofollow">Applying AI to Immune Cell Networks by Rachel Thomas</a></li>
<li><a href="https://replicate.com/explore" rel="nofollow">Replicate -- a cool place to explore GenAI models, among other things</a></li>
<li><a href="https://www.oreilly.com/library/view/hands-on-generative-ai/9781098149239/" rel="nofollow">Hands-On Generative AI with Transformers and Diffusion Models</a></li>
<li><a href="https://twitter.com/johnowhitaker" rel="nofollow">Johno on Twitter</a></li>
<li><a href="https://twitter.com/hugobowne" rel="nofollow">Hugo on Twitter</a></li>
<li><a href="https://twitter.com/vanishingdata" rel="nofollow">Vanishing Gradients on Twitter</a></li>
<li><a href="https://www.scipy2024.scipy.org/#CFP" rel="nofollow">SciPy 2024 CFP</a></li>
<li><a href="https://lu.ma/xonnjqe4" rel="nofollow">Escaping Generative AI Walled Gardens with Omoju Miller, a Vanishing Gradients Livestream</a></li>
</ul>]]>
  </itunes:summary>
</item>
  </channel>
</rss>
