Jacob Pfau
contact: [first].pfau@gmail.com
PhD student at the NYU Alignment Research Group. Current research projects include:
- studying the scaling properties of LM performance as a function of filler (i.e., repeated) tokens in prompts
- latent adversarial training for improving the safety of LMs
I like to post about research on Twitter and LessWrong. I also like to create prediction markets, e.g. “Will an AI produce encyclopedia-worthy philosophy by 2026?” on Manifold and “Will transformer-derived architectures accelerate progress in deep learning?” on Metaculus.
news
| Date | Announcement |
|---|---|
| Apr 26, 2024 | Posted Let’s Think Dot By Dot: Hidden Computation in Transformer Language Models to arXiv. |
| Feb 20, 2024 | Posted Auditing LMs with counterfactual search: a tool for control and ELK to LessWrong. |
| Apr 26, 2023 | Posted LM Situational Awareness, Evaluation Proposal: Violating Imitation to LessWrong. |