PeppePasti's Collections

- SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding • arXiv:2408.15545 • 34
- Controllable Text Generation for Large Language Models: A Survey • arXiv:2408.12599 • 63
- To Code, or Not To Code? Exploring Impact of Code in Pre-training • arXiv:2408.10914 • 41
- Automated Design of Agentic Systems • arXiv:2408.08435 • 38
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context • arXiv:2403.05530 • 60
- Fast Inference from Transformers via Speculative Decoding • arXiv:2211.17192 • 4
- Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding • arXiv:2401.07851 • 1
- Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding • arXiv:2305.00633
- Chain-of-Verification Reduces Hallucination in Large Language Models • arXiv:2309.11495 • 38
- Fine-Grained Human Feedback Gives Better Rewards for Language Model Training • arXiv:2306.01693 • 3
- Jamba-1.5: Hybrid Transformer-Mamba Models at Scale • arXiv:2408.12570 • 30
- Gemma 2: Improving Open Language Models at a Practical Size • arXiv:2408.00118 • 74
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • arXiv:2402.17764 • 603
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory • arXiv:2312.11514 • 258
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection • arXiv:2403.03507 • 182
- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models • arXiv:2309.03883 • 33
- Textbooks Are All You Need • arXiv:2306.11644 • 142
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4 • arXiv:2306.02707 • 46
- LLM Pruning and Distillation in Practice: The Minitron Approach • arXiv:2408.11796 • 55
- Language Modeling on Tabular Data: A Survey of Foundations, Techniques and Evolution • arXiv:2408.10548
- PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars • arXiv:2408.08869
- The Mamba in the Llama: Distilling and Accelerating Hybrid Models • arXiv:2408.15237 • 37
- Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models • arXiv:2408.15915 • 19
- Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature • arXiv:2408.15836 • 12
- Training Compute-Optimal Large Language Models • arXiv:2203.15556 • 10
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models • arXiv:2402.19427 • 52
- Scaling Law with Learning Rate Annealing • arXiv:2408.11029 • 3
- Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process • arXiv:2407.20311 • 4
- Physics of Language Models: Part 1, Context-Free Grammar • arXiv:2305.13673 • 7
- Physics of Language Models: Part 3.2, Knowledge Manipulation • arXiv:2309.14402 • 6
- Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws • arXiv:2404.05405 • 9
- Physics of Language Models: Part 3.1, Knowledge Storage and Extraction • arXiv:2309.14316 • 7
- Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems • arXiv:2408.16293 • 25
- Language Models are Few-Shot Learners • arXiv:2005.14165 • 11
- ContextCite: Attributing Model Generation to Context • arXiv:2409.00729 • 13
- OLMoE: Open Mixture-of-Experts Language Models • arXiv:2409.02060 • 77
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models • arXiv:2409.00509 • 38
- LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA • arXiv:2409.02897 • 44
- Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining • arXiv:2409.02326 • 18
- Attention Heads of Large Language Models: A Survey • arXiv:2409.03752 • 88
- Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling • arXiv:2408.16737
- Many-Shot In-Context Learning • arXiv:2404.11018 • 4
- How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data • arXiv:2409.03810 • 30
- Configurable Foundation Models: Building LLMs from a Modular Perspective • arXiv:2409.02877 • 27
- Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning • arXiv:2402.10110 • 3
- Making the Most of your Model: Methods for Finetuning and Applying Pretrained Transformers • arXiv:2408.16241
- Towards a Unified View of Preference Learning for Large Language Models: A Survey • arXiv:2409.02795 • 72
- PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation • arXiv:2409.06820 • 63
- Can Large Language Models Unlock Novel Scientific Research Ideas? • arXiv:2409.06185 • 12
- Self-Harmonized Chain of Thought • arXiv:2409.04057 • 16
- RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval • arXiv:2409.10516 • 39
- Kolmogorov-Arnold Transformer • arXiv:2409.10594 • 38
- A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B • arXiv:2409.11055 • 16
- A Controlled Study on Long Context Extension and Generalization in LLMs • arXiv:2409.12181 • 43
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning • arXiv:2409.12183 • 36
- Training Language Models to Self-Correct via Reinforcement Learning • arXiv:2409.12917 • 135
- Language Models Learn to Mislead Humans via RLHF • arXiv:2409.12822 • 9
- A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? • arXiv:2409.15277 • 34
- Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs • arXiv:2409.14988 • 21
- A Case Study of Web App Coding with OpenAI Reasoning Models • arXiv:2409.13773 • 5
- EuroLLM: Multilingual Language Models for Europe • arXiv:2409.16235 • 24
- OmniBench: Towards The Future of Universal Omni-Language Models • arXiv:2409.15272 • 25
- NoTeeline: Supporting Real-Time Notetaking from Keypoints with Large Language Models • arXiv:2409.16493 • 9
- MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models • arXiv:2409.17481 • 46
- Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction • arXiv:2409.17422 • 24