Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia? Article • By davanstrien • 6 days ago • 6
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated 25 days ago • 504
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck Paper • 2404.07647 • Published Apr 11 • 4
OpenCerebrum-2.0 Collection My open-source take on Aether Research's proprietary Cerebrum dataset. • 3 items • Updated 29 days ago • 1
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders Paper • 2404.05961 • Published Apr 9 • 61
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 99
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper • 2403.09629 • Published Mar 14 • 54
Augmentable Collection A collection of datasets that should be augmented further with GPT-4 • 13 items • Updated Jan 2 • 4
Qwen1.5 Collection Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated about 3 hours ago • 159
Tiny Series Collection Tiny datasets that empower the foundation of Small Language Models! • 11 items • Updated Jan 26 • 31
Pretrained Text-Generation Models Below 250M Parameters Collection Great candidates for fine-tuning targeting Transformers.js, ordered by number of parameters. • 7 items • Updated Feb 26 • 6
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation Paper • 2312.14187 • Published Dec 20, 2023 • 49
smol llama Collection 🚧"raw" pretrained smol_llama checkpoints - WIP 🚧 • 4 items • Updated 14 days ago • 5
Trained Models 🏋️ Collection They may be small, but they're training like giants! • 8 items • Updated Mar 14 • 14
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 252
InstructWise Collection InstructWise is a series of models created to act as helpful virtual assistants while maintaining memory efficiency. • 2 items • Updated Dec 3, 2023 • 2