200K Version

#2, opened by brucethemoose

This LoRA seems to work on the 200K version of Yi, but if you ever revisit it, would you consider using that as the base model instead?

The long context is hugely useful (and seems to work well).
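For reference, here is a minimal sketch of what "works on the 200K version" means in practice: loading the adapter on top of the 200K base with PEFT. The repo ids below are placeholders, not this repo's actual ids.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "01-ai/Yi-34B-200K"        # 200K-context base (placeholder id)
adapter_id = "someuser/yi-34b-lora"  # placeholder for this LoRA repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# PEFT applies the adapter weights to whatever base you hand it; it does not
# require the base to be the exact checkpoint the adapter was trained on.
model = PeftModel.from_pretrained(base, adapter_id)
```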

It would be possible, but I chose to train on the base version to maintain (hopefully) better compatibility for merging with other models, especially since the base 34B is already usable up to 32K ctx at inference. I would, however, like to try training on the dataset at its full length rather than 4K, but that was a bit compute-prohibitive.
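For anyone curious, one common way to stretch the 4K-trained base toward 32K at inference is RoPE scaling via the transformers config. This is just a rough sketch, assuming the llama-architecture Yi checkpoint and an illustrative scaling factor, not necessarily how the author runs it.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

base_id = "01-ai/Yi-34B"  # the non-200K base (assumed repo id)

config = AutoConfig.from_pretrained(base_id)
# Linear RoPE scaling: 4K trained length * factor 8 ~= 32K positions.
config.rope_scaling = {"type": "linear", "factor": 8.0}

model = AutoModelForCausalLM.from_pretrained(
    base_id, config=config, torch_dtype=torch.bfloat16, device_map="auto"
)
```

The 200K model skips this step entirely, which is the "no RoPE stretching" point raised below.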

> maintain (hopefully) better compatibility for merging with other models

You mean most other trainers will use the base 4K model as well?

IDK, long context with no RoPE stretching is super appealing to me. I figured everyone would default to the 200K model.

Also, I believe at least one other trainer is doing the 200K model: https://old.reddit.com/r/LocalLLaMA/comments/17rzed4/yi34b_vs_yi34b200k_on_sequences_32k_and_4k/
