
Model Summary

Llama-3-6B

Introducing the world's first Llama-3 base model with 6B parameters. This is an untrained model, created from Meta-Llama-3-8B using a technique called downcycling.
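This card does not spell out how downcycling works, so the following is only a back-of-the-envelope sketch. It assumes that downcycling keeps the embeddings and output head of Meta-Llama-3-8B and drops some of its 32 transformer layers. The layer shapes come from the public Meta-Llama-3-8B configuration; the choice of 24 retained layers is an assumption, picked because it matches the 6.29B parameter count reported for this model.

```python
# Shapes from the public Meta-Llama-3-8B configuration.
HIDDEN = 4096          # hidden size
INTERMEDIATE = 14336   # MLP intermediate size
VOCAB = 128256         # vocabulary size
NUM_HEADS = 32         # query heads
NUM_KV_HEADS = 8       # key/value heads (grouped-query attention)
HEAD_DIM = HIDDEN // NUM_HEADS


def params_per_layer() -> int:
    """Parameter count of one Llama-3-8B decoder layer (no biases)."""
    q_proj = HIDDEN * NUM_HEADS * HEAD_DIM
    kv_proj = 2 * HIDDEN * NUM_KV_HEADS * HEAD_DIM   # k_proj + v_proj
    o_proj = NUM_HEADS * HEAD_DIM * HIDDEN
    mlp = 3 * HIDDEN * INTERMEDIATE                  # gate, up, down
    norms = 2 * HIDDEN                               # two RMSNorm weights
    return q_proj + kv_proj + o_proj + mlp + norms


def total_params(num_layers: int) -> int:
    """Total parameters for a Llama-3-8B-shaped model with `num_layers` layers."""
    embeddings = 2 * VOCAB * HIDDEN   # input embeddings + untied output head
    final_norm = HIDDEN
    return embeddings + final_norm + num_layers * params_per_layer()


# 32 layers is the original model; 24 layers is an ASSUMPTION, not stated in this card.
for layers in (32, 24):
    print(f"{layers} layers: {total_params(layers) / 1e9:.2f}B parameters")
```

Under these assumptions, the 32-layer count reproduces the original model's roughly 8.03B parameters, and 24 layers gives roughly 6.29B. Which layers are kept, and whether anything else is changed, is not stated here.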

You can check out the trained version of this model here: https://huggingface.co/prince-canuma/Llama-3-6B-v0.1
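As a minimal sketch, the trained checkpoint can probably be loaded like any other causal language model on the Hub, assuming the `transformers` and `torch` packages are installed. The repository id is taken from the URL above; the helper name `generate` and its defaults are illustrative, not part of this model's documentation. Half-precision generation may need a GPU or a recent PyTorch build.

```python
MODEL_ID = "prince-canuma/Llama-3-6B-v0.1"  # repo id taken from the URL above


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Load the trained checkpoint and complete `prompt` (downloads the weights)."""
    # Imported lazily so that importing this file does not pull in heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # FP16 matches the tensor type listed for this model family.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The capital of France is")` would download the weights on first use and return the prompt followed by the model's continuation.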

Model Description

Model Sources

Citation

BibTeX:

@misc{prince2024downcycling,
      title={Efficient LLM Downcycling: Generating Diverse Model Sizes from Pretrained Giants},
      author={Prince Canuma},
      year={2024},
}

Thank You!

I want to extend my heartfelt thanks to the community for the invaluable expertise and unwavering support.

Additionally, I would like to thank Viet from General Catalyst (GC) for providing me with the much-needed compute.

This is my most ambitious project yet, and it wouldn't have been possible without the incredible open-source ML community!

Developers, I am eager to see and hear about the innovative fine-tunes and applications you create.

Users, I am excited to learn about your experiences and use cases.

Thank you for your interest and support!

References:

@misc{komatsuzaki2023sparse,
      title={Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints}, 
      author={Aran Komatsuzaki and Joan Puigcerver and James Lee-Thorp and Carlos Riquelme Ruiz and Basil Mustafa and Joshua Ainslie and Yi Tay and Mostafa Dehghani and Neil Houlsby},
      year={2023},
      eprint={2212.05055},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
@misc{sanyal2024pretraining,
      title={Pre-training Small Base LMs with Fewer Tokens}, 
      author={Sunny Sanyal and Sujay Sanghavi and Alexandros G. Dimakis},
      year={2024},
      eprint={2404.08634},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Safetensors · Model size: 6.29B params · Tensor type: FP16