SwallowCode Collection: Rewriting Pre-Training Data Boosts LLM Performance in Math and Code • 66 items • Updated May 7 • 3
SwallowMath Collection: Rewriting Pre-Training Data Boosts LLM Performance in Math and Code • 11 items • Updated May 7 • 3
LLM-jp-3 Pre-trained Models Collection: Pre-trained models in the LLM-jp-3 model series • 10 items • Updated 23 days ago • 6
LLM-jp-3 Fine-tuned Models Collection: Fine-tuned models in the LLM-jp-3 model series • 25 items • Updated 23 days ago • 6
Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search Paper • 2503.04412 • Published Mar 6 • 1
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization Paper • 2502.19261 • Published Feb 26 • 7
TinySwallow Collection: Compact Japanese models trained with "TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models" • 5 items • Updated Jan 30 • 17
Agent Skill Acquisition for Large Language Models via CycleQD Paper • 2410.14735 • Published Oct 16, 2024 • 2
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs Paper • 2407.03963 • Published Jul 4, 2024 • 19