How is this different from v1?

#2 opened by amgadhasan


It seems they changed `rope_theta` to 1e6 for all their models.

32k context

@Yuuru What is the source of this information?

> It seems they changed `rope_theta` to 1e6 for all their models.

They also set "sliding_window" to null for some reason.

> @Yuuru What is the source of this information?

The config.json file. (it's the same context size as the previous version)
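If you want to check it yourself, here's a minimal sketch using transformers. I'm assuming the thread is about mistralai/Mistral-7B-Instruct-v0.2 (adjust the repo id if not), and the v0.1 comparison values in the comments are as I recall from that model's config.json:

```python
from transformers import AutoConfig

# Pull the config straight from the Hub and read the fields discussed above.
cfg = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
print(cfg.rope_theta)               # 1000000.0, up from 10000.0 in v0.1
print(cfg.sliding_window)           # None (v0.1 set this to 4096)
print(cfg.max_position_embeddings)  # 32768, same as v0.1
```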

@mrfakename, vLLM says it when loading the model, btw:

 […] max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, seed=0
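For reference, a minimal sketch of triggering that startup log with vLLM; the model id here is an assumption, point it at whatever repo you're loading:

```python
from vllm import LLM

# Constructing the engine prints its arguments at startup,
# including max_seq_len=32768 for this config.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
```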

Yeah, it would be interesting to understand how it's actually different from the first one.

It is a lot less obedient, for one: v0.1 refuses to answer a sixth of my test prompts, while v0.2 refuses to answer three-quarters of them.
