krypticmouse's Collections
LLMs
Language Modeling Is Compression
Paper
•
2309.10668
•
Published
•
82
Baichuan 2: Open Large-scale Language Models
Paper
•
2309.10305
•
Published
•
19
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper
•
2309.11495
•
Published
•
38
LMDX: Language Model-based Document Information Extraction and Localization
Paper
•
2309.10952
•
Published
•
65
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Paper
•
2309.12307
•
Published
•
88
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Paper
•
2309.11998
•
Published
•
24
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper
•
2309.12284
•
Published
•
18
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Paper
•
2309.11568
•
Published
•
10
Contrastive Decoding Improves Reasoning in Large Language Models
Paper
•
2309.09117
•
Published
•
37
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
Paper
•
2309.09958
•
Published
•
18
LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models
Paper
•
2309.09506
•
Published
•
14
Cure the headache of Transformers via Collinear Constrained Attention
Paper
•
2309.08646
•
Published
•
12
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
Paper
•
2309.08963
•
Published
•
9
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Paper
•
2309.06497
•
Published
•
4
Sparse Autoencoders Find Highly Interpretable Features in Language Models
Paper
•
2309.08600
•
Published
•
13
Agents: An Open-source Framework for Autonomous Language Agents
Paper
•
2309.07870
•
Published
•
42
Ambiguity-Aware In-Context Learning with Large Language Models
Paper
•
2309.07900
•
Published
•
4
Large Language Models for Compiler Optimization
Paper
•
2309.07062
•
Published
•
23
Statistical Rejection Sampling Improves Preference Optimization
Paper
•
2309.06657
•
Published
•
13
Efficient Memory Management for Large Language Model Serving with PagedAttention
Paper
•
2309.06180
•
Published
•
25
Large Language Model for Science: A Study on P vs. NP
Paper
•
2309.05689
•
Published
•
20
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Paper
•
2309.08532
•
Published
•
53
Augmenting text for spoken language understanding with Large Language Models
Paper
•
2309.09390
•
Published
•
2
Investigating Answerability of LLMs for Long-Form Question Answering
Paper
•
2309.08210
•
Published
•
12
Replacing softmax with ReLU in Vision Transformers
Paper
•
2309.08586
•
Published
•
17
Uncovering mesa-optimization algorithms in Transformers
Paper
•
2309.05858
•
Published
•
12
Neurons in Large Language Models: Dead, N-gram, Positional
Paper
•
2309.04827
•
Published
•
16
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
Paper
•
2309.04564
•
Published
•
15
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Paper
•
2309.05516
•
Published
•
9
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
Paper
•
2309.04269
•
Published
•
32
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Paper
•
2309.03883
•
Published
•
33
One Wide Feedforward is All You Need
Paper
•
2309.01826
•
Published
•
31
Efficient RLHF: Reducing the Memory Usage of PPO
Paper
•
2309.00754
•
Published
•
13
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Paper
•
2309.00267
•
Published
•
47
YaRN: Efficient Context Window Extension of Large Language Models
Paper
•
2309.00071
•
Published
•
65
RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models
Paper
•
2308.07922
•
Published
•
17
CausalLM is not optimal for in-context learning
Paper
•
2308.06912
•
Published
•
18
Self-Alignment with Instruction Backtranslation
Paper
•
2308.06259
•
Published
•
41
Shepherd: A Critic for Language Model Generation
Paper
•
2308.04592
•
Published
•
30
Accelerating LLM Inference with Staged Speculative Decoding
Paper
•
2308.04623
•
Published
•
23
Adapting Large Language Models via Reading Comprehension
Paper
•
2309.09530
•
Published
•
77