Model Summary

Introducing the world's first Llama-3 base model with 6B parameters. This model is a untrained model which was created from Meta-Llama-3-8B using a technique called downcycling .

You can check trained version of this model here: https://huggingface.co/prince-canuma/Llama-3-6B-v0.1

Model Description

Developed by: Prince Canuma
Sponsored by: General
Model type: Llama
License: Llama-3

Model Sources

Repository: https://github.com/Blaizzy/Coding-LLMs-from-scratch/tree/main/Llama-3
Video: https://youtube.com/playlist?list=PLDn_JsyofyfTH5_5V1MNb8UYKxMl6IMNy&si=5Y4cm-6wrMOD1Abr

Citation

BibTeX:

@misc{prince2024downcycling,
      title={Efficient LLM Downcycling: Generating Diverse Model Sizes from Pretrained Giants},
      author={Prince Canuma},
      year={2024},
}

Thank You!

I want to extend my heartfelt thanks to the community for the invaluable expertise and unwavering support.

Additionally, I would like to thank Viet from General Catalyst (GC) for providing me with the much needed compute.

This is my most ambitious project yet, and it wouldn't have been possible without the incredible open-source ML community!

Developers, I am eager to see and hear about the innovative fine-tunes and applications you create.

Users, I am excited to learn about your experiences and use cases.

Thank you for your interest and support!

References:

@misc{komatsuzaki2023sparse,
      title={Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints}, 
      author={Aran Komatsuzaki and Joan Puigcerver and James Lee-Thorp and Carlos Riquelme Ruiz and Basil Mustafa and Joshua Ainslie and Yi Tay and Mostafa Dehghani and Neil Houlsby},
      year={2023},
      eprint={2212.05055},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

@misc{sanyal2024pretraining,
      title={Pre-training Small Base LMs with Fewer Tokens}, 
      author={Sunny Sanyal and Sujay Sanghavi and Alexandros G. Dimakis},
      year={2024},
      eprint={2404.08634},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

prince-canuma
/

Llama-3-6B-v0