Neel Nanda – Mechanistic Interpretability: A Whirlwind Tour

FAR․AI

Neel Nanda from DeepMind presenting 'Mechanistic Interpretability: A Whirlwind Tour' on July 21, 2024 at the Vienna Alignment Workshop.

Key Highlights:
- Grasping AI cognition for alignment
- Reverse engineering neural networks
- Safeguarding against deceptive AI systems

The Alignment Workshop is a ser