| Feb 19, 2026 |
Our team led a £25M alignment grant round, engaging leading researchers in CS theory, economics, and ML to develop new alignment research agendas.
|
| May 22, 2025 |
Posted Unexploitable search: blocking malicious use of free parameters to LessWrong.
|
| May 19, 2025 |
Our safety case sketch for debate is up on arXiv. It examines the details of a hypothetical AGI deployment context and works out what we’d need to know about the training data, dynamics, and objective to build a safety case around debate.
|
| Apr 26, 2024 |
Posted Let’s Think Dot By Dot: Hidden Computation in Transformer Language Models to arXiv.
|
| Feb 20, 2024 |
Posted Auditing LMs with counterfactual search: a tool for control and ELK to LessWrong.
|
| Apr 26, 2023 |
Posted LM Situational Awareness, Evaluation Proposal: Violating Imitation to LessWrong.
|