Jacob Pfau


contact: [first].pfau@gmail.com

Research lead on the UK AISI Alignment Team. Our team focuses on worlds where alignment is hard, developing theoretically motivated methods to address these challenges. PhD student at NYU CDS. Current research projects include:

  • Debate for LLMs (empirical): training LLMs to debate via RL as an empirical test of scalable oversight.
  • Generative adversarial methods in RL for worst-case diversity guarantees, with applications to alignment.

I like to post about research on Twitter and LessWrong. In the past I’ve created many prediction markets, e.g. “Will an AI produce encyclopedia-worthy philosophy by 2026?” on Manifold, and “Will transformer derived architectures accelerate progress in deep learning?” on Metaculus.

(Last updated in March 2026)

news

Feb 19, 2026 Our team led a £25M alignment grants round, engaging leading researchers in CS theory, economics, and ML to develop new alignment research agendas.
May 22, 2025 Posted Unexploitable search: blocking malicious use of free parameters to LessWrong.
May 19, 2025 Our safety case sketch for debate is up on arXiv. It examines the details of a hypothetical AGI deployment context and works out what we’d need to know about the training data, dynamics, and objective to build a safety case around debate.
Apr 26, 2024 Posted Let’s Think Dot By Dot: Hidden Computation in Transformer Language Models to arXiv.
Feb 20, 2024 Posted Auditing LMs with counterfactual search: a tool for control and ELK to LessWrong.
Apr 26, 2023 Posted LM Situational Awareness, Evaluation Proposal: Violating Imitation to LessWrong.

latest posts

selected publications