I am a postdoctoral researcher in Dirk Hovy’s MilaNLP Lab. My work sits at the intersection of computation, language and society. I am particularly interested in evaluating and improving the safety of large language models.
In May 2023, I completed my PhD at the University of Oxford, where I was supervised by Janet Pierrehumbert and Helen Margetts. My doctoral research focused on improving the evaluation and effectiveness of large language models for hate speech detection. The HateCheck project that I led won the Stanford AI Audit Challenge for “Best Holistic Evaluation and Benchmarking”. I was also part of OpenAI’s red team for GPT-4, testing the model’s safety before its public release.
During my PhD, I also co-founded Rewire, a start-up building socially responsible AI for online safety. Over two years as CTO, I led a technical team of 10+ people, working on large projects for Google, Meta and others. In March 2023, we sold Rewire to ActiveFence.