Papers:
- Simple linear attention language models balance the recall-throughput tradeoff (arXiv 2402.18668)
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models (arXiv 2402.10644)
- Repeat After Me: Transformers are Better than State Space Models at Copying (arXiv 2402.01032)
- Zoology: Measuring and Improving Recall in Efficient Language Models (arXiv 2312.04927)
Aneta Melisa Stal (melisa)
AI & ML interests: NLP
Collections: 3
Models (44 total, 10 shown):
- melisa/bi_score_meta-llama_Meta-Llama-3-8B-Instruct-20-9 (Text Generation)
- melisa/bi_score_meta-llama_Meta-Llama-3-8B-Instruct-20-8 (Text Generation)
- melisa/bi_score_meta-llama_Meta-Llama-3-8B-Instruct-22-7 (Text Generation)
- melisa/bi_score_meta-llama_Meta-Llama-3-8B-Instruct-22-6 (Text Generation)
- melisa/bi_score_meta-llama_Meta-Llama-3-8B-Instruct-22-5 (Text Generation)
- melisa/bi_score_meta-llama_Meta-Llama-3-8B-Instruct-23-4 (Text Generation)
- melisa/bi_score_meta-llama_Meta-Llama-3-8B-Instruct-24-3 (Text Generation)
- melisa/bi_score_meta-llama_Meta-Llama-3-8B-Instruct-24-2 (Text Generation)
- melisa/bi_score_meta-llama_Meta-Llama-3-8B-Instruct-24-1 (Text Generation)
- melisa/bi_score_01-ai_Yi-1.5-9B-Chat-16K_cut_20_5
Datasets: none public yet