16 DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models · 7 authors 1
7 Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models · 8 authors 1