Base

#8
opened by ehartford

Base model please

Instruct is not useful; I can't tune it.

@ehartford I was actually going to make bases for all 3 of them (11B, 13B, and 16B) and start fine-tuning them.

Are you going to fine-tune as well? (once I finish my other fine-tunes I'll make the base ones)

I will not tune them directly

My intent is to use it to initialize the expert weights of an MoE, then pretrain and fine-tune on top of that to produce a Dolphin MoE.
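A minimal sketch of that kind of expert initialization (copying a dense model's FFN weights into every expert before continued training, sometimes called "upcycling"), with purely illustrative PyTorch module names rather than any actual Dolphin code:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseMLP(nn.Module):
    """Stand-in for a dense transformer FFN block (names are hypothetical)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.up(x)))

class MoELayer(nn.Module):
    """Toy MoE layer: a router plus N expert MLPs."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            DenseMLP(d_model, d_ff) for _ in range(num_experts)
        )

def init_experts_from_dense(moe: MoELayer, dense: DenseMLP) -> None:
    # Copy the dense FFN weights into every expert; continued
    # pretraining then lets the experts diverge from each other.
    state = copy.deepcopy(dense.state_dict())
    for expert in moe.experts:
        expert.load_state_dict(state)

# Example with Llama-3-8B-like FFN dimensions (4096 hidden, 14336 intermediate).
dense = DenseMLP(d_model=4096, d_ff=14336)
moe = MoELayer(d_model=4096, d_ff=14336, num_experts=8)
init_experts_from_dense(moe, dense)
```

In practice the per-layer FFN weights would come from the actual checkpoint rather than fresh initialization; the point is only that every expert starts from the same dense weights.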

I think proving your process by doing it with an Instruct model first is a great strategy to show that the output is coherent and the method is sound.

Thank you!

@ehartford Sounds really interesting! I'd love to see how they work as experts. Here are the merges based on the base Llama-3-8B:
