news
Date | News |
---|---|
Apr 26, 2024 | Posted "Let’s Think Dot By Dot: Hidden Computation in Transformer Language Models" to arXiv. |
Feb 20, 2024 | Posted "Auditing LMs with counterfactual search: a tool for control and ELK" to LessWrong. |
Apr 26, 2023 | Posted "LM Situational Awareness, Evaluation Proposal: Violating Imitation" to LessWrong. |