
Depth Up-Scaling

#7
by mrfakename - opened

Hi,
Amazing model! Do you plan to open-source your methods?

This is a fantastic model. Could you please share how you built it, maybe in a post or a paper?

Thanks

upstage org

We will be submitting them to arXiv shortly. Thank you for your interest!

hunkim changed discussion status to closed

Thank you! Excited to see it!

Hi @hunkim, is this basically a merge of Mistral and Llama, trained on more tokens? Were the original Llama weights used, and if so, does the license apply?

upstage org

@mrfakename

We take the first 24 layers and the last 24 layers, initialized with the Mistral weights. (Yes, the license applies.) Then we continue pre-training the depth-upscaled model.

In some sense, this can be considered depth upscaling, while we see MoE as width upscaling.
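For the curious, here is a minimal sketch of that construction using `transformers`. The checkpoint name, the 24/24 split, and the layer re-indexing step are assumptions pieced together from this thread, not Upstage's released code:

```python
import copy
import torch
from torch import nn
from transformers import AutoModelForCausalLM

# Load a 32-layer Mistral-style base model (assumed checkpoint name).
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)

layers = base.model.layers                        # 32 decoder layers
first = list(layers[:24])                         # layers 0-23
last = [copy.deepcopy(l) for l in layers[-24:]]   # layers 8-31, duplicated

# Concatenate into a 48-layer stack; the middle 16 layers appear twice.
base.model.layers = nn.ModuleList(first + last)
base.config.num_hidden_layers = len(base.model.layers)

# Re-number layer indices so attention/KV-cache bookkeeping stays consistent.
for i, layer in enumerate(base.model.layers):
    layer.self_attn.layer_idx = i

base.save_pretrained("depth-upscaled-48L")  # then continue pre-training
```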

@hunkim Thanks a lot for all this info. I have two questions:

  1. Did you use the first 24 layers from Llama and the last 24 layers from Mistral in the final merged model? Is there any logic behind selecting the order?
  2. What dataset did you use?

Thanks in advance :).
