LLM Circuit Analyses Are Consistent Across Training and Scale Paper • 2407.10827 • Published 12 days ago
Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs Paper • 2406.20086 • Published 28 days ago
Multi-property Steering of Large Language Models with Dynamic Activation Composition Paper • 2406.17563 • Published Jun 25
Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation Paper • 2406.13663 • Published Jun 19