When an LLM is apprehensive about its answers -- and when its uncertainty is justified Paper • 2503.01688 • Published 6 days ago • 19
LongRoPE2: Near-Lossless LLM Context Window Scaling Paper • 2502.20082 • Published 10 days ago • 31
How to Get Your LLM to Generate Challenging Problems for Evaluation Paper • 2502.14678 • Published 17 days ago • 16
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published 17 days ago • 160
The Ultimate Collection of Code Classifiers Collection 🔥 15 classifiers, 124M parameters, one per programming language, for assessing the educational value of GitHub code • 15 items • Updated 17 days ago • 10
You Do Not Fully Utilize Transformer's Representation Capacity Paper • 2502.09245 • Published 24 days ago • 34
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity Paper • 2502.13063 • Published 19 days ago • 65
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 21 days ago • 141
Expect the Unexpected: FailSafe Long Context QA for Finance Paper • 2502.06329 • Published 28 days ago • 126
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance Paper • 2502.08127 • Published 26 days ago • 50
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 8 items • Updated 14 days ago • 389
Assisted Generation: a new direction toward low-latency text generation Article • May 11, 2023 • 46
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 341
NeMo Curator - Classifier Models Collection Classifier models that can be used in NeMo Curator for labelling/filtering datasets. • 11 items • Updated 23 days ago • 16