200K Version?

#3 opened by brucethemoose

Could we get a version of this model trained on Yi 200K instead?

It doesn't have to be trained at the full 200K context size; finetunes trained at smaller context sizes still seem to retain long-context performance.
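
To illustrate what I mean (a rough, untested sketch; the model name and sequence cap are just placeholders, not Bagel's actual recipe):

```python
# Rough sketch: finetune on top of the 200K base, but cap training
# sequences far below 200K. The RoPE scaling baked into the base weights
# is what carries the long-context ability; the finetune just needs to
# not destroy it.
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "01-ai/Yi-34B-200K"   # the 200K base, not the 4K one
MAX_LEN = 8192               # train well under 200K to keep VRAM sane

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Training batches are truncated/packed to MAX_LEN; nothing about the
# model's position handling is changed, so inference can still run at
# contexts far beyond MAX_LEN.
batch = tokenizer(
    ["example training sample"],
    truncation=True,
    max_length=MAX_LEN,
    return_tensors="pt",
)
```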

Would love to see this

Yeah, 4096 is too short for most tasks that require text summarization.

> It doesn't have to be trained at the full 200K context size; finetunes trained at smaller context sizes still seem to retain long-context performance.

I may have spoken too soon on this. It seems that Bagel's 4K training actually hurt Yi 200K's long-context performance considerably.

I suspect it's because it was partially a full finetune, not a LoRA?
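
If that's right, a LoRA run would have left the base weights untouched. A minimal sketch, assuming the usual peft API (the rank and target modules are just placeholders):

```python
# Attaching LoRA adapters freezes the base weights, so whatever the 200K
# base learned about long contexts can't be overwritten the way a full
# finetune can overwrite it; only the small adapter matrices train.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("01-ai/Yi-34B-200K")

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the adapter weights are trainable
```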
