xLAM models Collection xLAM: A Family of Large Action Models to Empower AI Agent Systems: https://github.com/SalesforceAIResearch/xLAM • 11 items • Updated 22 days ago • 43
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 40 items • Updated 3 days ago • 223
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 8 items • Updated 17 days ago • 171
Useful Pretrain-Datasets Collection pretrain-datasets with (maybe) good quality • 20 items • Updated Jun 12 • 1
GPT-4 generated datasets Collection Collection of some GPT-4 generated datasets. It may be useful for those looking for the best-quality datasets to train competitive LLMs. • 18 items • Updated Apr 16 • 8
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12 • 63
Scalable Pre-training of Large Autoregressive Image Models Paper • 2401.08541 • Published Jan 16 • 36
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models Paper • 2401.06951 • Published Jan 13 • 25
Papers about model merging Collection referenced in the mergekit repo: https://github.com/cg123/mergekit • 4 items • Updated Feb 13 • 14
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models Paper • 2309.14509 • Published Sep 25, 2023 • 17
SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference Paper • 2307.02628 • Published Jul 5, 2023 • 10
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding Paper • 2306.17107 • Published Jun 29, 2023 • 11
Extending Context Window of Large Language Models via Positional Interpolation Paper • 2306.15595 • Published Jun 27, 2023 • 53