Base

#8
opened by ehartford

Base model please

Instruct is not useful; I can't tune it.

@ehartford I was actually going to make bases for all 3 of them (11B, 13B, and 16B) and start fine-tuning them.

Are you going to fine-tune as well? (once I finish my other fine-tunes I'll make the base ones)

I will not tune them directly

My intent is to use it to initialize the expert weights of an MoE, then pretrain and fine-tune on top of that to produce a Dolphin MoE.
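A minimal sketch of that kind of expert initialization (copying a dense model's FFN weights into every expert before continued training, sometimes called "upcycling"), with purely illustrative PyTorch module names rather than any actual Dolphin code:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseMLP(nn.Module):
    """Stand-in for a dense transformer FFN block (names are hypothetical)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.up(x)))

class MoELayer(nn.Module):
    """Toy MoE layer: a router plus N expert MLPs."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            DenseMLP(d_model, d_ff) for _ in range(num_experts)
        )

def init_experts_from_dense(moe: MoELayer, dense: DenseMLP) -> None:
    # Copy the dense FFN weights into every expert; continued
    # pretraining then lets the experts diverge from each other.
    state = copy.deepcopy(dense.state_dict())
    for expert in moe.experts:
        expert.load_state_dict(state)

# Example with Llama-3-8B-like FFN dimensions (4096 hidden, 14336 intermediate).
dense = DenseMLP(d_model=4096, d_ff=14336)
moe = MoELayer(d_model=4096, d_ff=14336, num_experts=8)
init_experts_from_dense(moe, dense)
```

In practice the per-layer FFN weights would come from the actual checkpoint rather than fresh initialization; the point is only that every expert starts from the same dense weights.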

I think proving your process by doing it with an Instruct model first is a great strategy to show that the output is coherent and the method is sound.

Thank you!

@ehartford Sounds really interesting! I'd love to see how they work as experts. Here are the merges based on the base Llama-3-8B:
