I am a postdoctoral researcher in Dirk Hovy’s MilaNLP Lab. My work sits at the intersection of computation, language and society. I am particularly interested in evaluating and improving the safety of large language models.

In May 2023, I completed my PhD at the University of Oxford, where I was supervised by Janet Pierrehumbert and Helen Margetts. In my PhD, I worked on improving the evaluation and effectiveness of large language models for hate speech detection. The HateCheck project that I led won the Stanford AI Audit Challenge for “Best Holistic Evaluation and Benchmarking”. I was also a part of OpenAI’s red team for GPT-4, testing the model’s safety before its public release.

During my PhD, I also co-founded Rewire, a start-up building socially responsible AI for online safety. Over two years as CTO, I led a technical team of 10+ people, working on large projects for Google, Meta and others. In March 2023, we sold Rewire to ActiveFence.

News

January 2024 – I published SafetyPrompts.com, a collection of open datasets for LLM safety

December 2023 – I am at EMNLP 2023 in Singapore to present co-authored work on human feedback in LLMs

November 2023 – I am in Munich to visit the labs of Barbara Plank and Hinrich Schütze, and give two talks on LLM safety

September 2023 – I am at a workshop in Oxford to give a talk about using LLMs to simulate human samples

July 2023 – The SemEval Task we organised won Best Paper at SemEval 2023

July 2023 – I am at ACL 2023 in Toronto to present co-authored work and organise WOAH

June 2023 – I joined Dirk Hovy’s MilaNLP Lab as a postdoctoral researcher

May 2023 – I defended my PhD thesis in Oxford, assessed by Scott Hale and Maarten Sap

May 2023 – The HateCheck project that I led won the Stanford AI Audit Challenge

March 2023 – My work on OpenAI’s red team for GPT-4 was covered by the FT, the Times, Sifted and others

Research

For a complete record of my publications, please visit Google Scholar or Semantic Scholar.

Mentorship

If you are a student interested in working together, please do reach out. I am always happy to chat or give advice. If you are a student at Bocconi, I am happy to discuss potential thesis supervision.

Press

I enjoy talking about my work, and I have been fortunate to have it featured across many different media outlets. My work on OpenAI’s red team for GPT-4, for example, was covered by the FT, the Times, Sifted and others. Rewire was spotlighted by Forbes and DCMS. The HateCheck project, which I led, was written up in MIT Tech Review, the Wall Street Journal and VentureBeat.

If you want to chat, please get in touch via email or on Twitter.