Hi, I'm Henry! 👋
I manage and participate in AI safety research programs, with a focus on accelerating AI safety research collaborations. I have contributed to several advances in the field, including work on adversarial robustness, jailbreaking, and AI self-understanding.
Previously, I led MATS London's 30-person AI safety research program, where our teams produced multiple significant research outputs, including an ICML Best Paper on debating with LLMs. As an AI safety consultant, I'm currently helping design Anthropic's Fellows Program, for which I'll also co-supervise research projects.
Before focusing on AI safety, I co-founded the Global Challenges Project, and studied Classics (Philosophy) at Oxford. My hope is to ensure AI systems are beneficial to all current and future sentiences.
Get in touch
Email / Google Scholar / Twitter / LinkedIn
Some recent papers:
Projects I've managed (as people and project manager) include: work on language model jailbreaking, rapid response techniques for addressing urgent safety vulnerabilities, and methods for improving and measuring LLM self-knowledge through introspection.
See more on my Google Scholar.
Research Discussions:
If you'd like more context on the styles of research project I usually work on, here's a conversation with Ethan Perez (Anthropic) and Mikita Balesni (Apollo Research) about developing AI safety research project ideas from scratch.