FlashDecoding++: Faster Large Language Model Inference on GPUs Paper • 2311.01282 • Published Nov 2, 2023 • 35
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs Paper • 2311.02262 • Published Nov 3, 2023 • 10