PhD student at the NYU Alignment Research Group. Current research projects include:
- applying causal behavioral analysis to problems of introspective truthfulness in LMs
- studying scaling properties of LM performance as a function of empty space in prompts
- formalizing limitations of RL fine-tuning of LMs
I like to post about research on Twitter and LessWrong. I also like to create prediction markets, e.g. “Will an AI produce encyclopedia-worthy philosophy by 2026?” on Manifold and “Will transformer-derived architectures accelerate progress in deep learning?” on Metaculus.
Apr 26, 2023: Posted “LM Situational Awareness, Evaluation Proposal: Violating Imitation” to LessWrong.