Ship Real Agents: Hands-On Evals for Agentic Applications — Laurie Voss, Arize
AI Engineer
Most agents get tested by running a few queries and checking whether the output looks right. Laurie calls this the vibes problem: it doesn't catch regressions, doesn't run in CI, and doesn't tell you whether a prompt fix broke three other things. This workshop builds a complete eval pipeline from scratch on a fi