Interpretability: Understanding how AI models think

Anthropic

What's happening inside an AI model as it thinks? Why are AI models sycophantic, and why do they hallucinate? Are AI models just "glorified autocompletes", or is something more complicated going on? How do we even study these questions scientifically? Join Anthropic's Josh Batson, Emmanuel Ameisen,