Quantization calibration dataset


Lzlv long context is the holy grail for me, so if you have solved this, I tip my hat to you.

I have a question: to quantize this to EXL2, can I just use one of the usual calibration datasets (e.g., PIPPA), or do I need something with longer token lengths?

I wouldn't say it is "solved", but it seems to not be broken :) It is probably "solvable" with either further 32K fine-tuning or more merging, if it isn't solved already.

I also experimented with fine-tuning a Goliath-style interleaved merge of lzlv and Euryale, each with the LongLoRA merge applied, using the same training regimen as Aurelian. That seems to be working too... so there is hope for this either way.

For EXL2, I experimented with 32K calibration datasets, and at least at 4 bits and above there was no benefit from measuring over the full 32K context length (2048 or 4096 seemed to saturate the benefit). The full 32K measurement also gave me headaches with numerical issues and .safetensors saving limits that were a pain to deal with, in addition to being super slow. It might matter more at lower bit depths, I'm not sure. The important thing is to use the correct rope scaling while measuring, in all cases (here, linear scaling of 8).
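As a rough illustration of why that scaling factor matters during measurement, here is a minimal sketch of linear rope scaling in plain PyTorch (not exllamav2's internals): positions are divided by the factor, so a 32K sequence lands in the same angle range a 4K-context base model was trained on.

```python
import torch

def rope_angles(seq_len: int, head_dim: int, base: float = 10000.0,
                linear_scale: float = 1.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies, one per pair of channels.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Linear rope scaling: divide the position index by the scale factor.
    positions = torch.arange(seq_len).float() / linear_scale
    return torch.outer(positions, inv_freq)  # shape: (seq_len, head_dim // 2)

# With a linear scale of 8, position 8*k in the scaled model sees the same
# rotation angles as position k in the unscaled 4K-context base model.
unscaled = rope_angles(4096, 128)
scaled = rope_angles(32768, 128, linear_scale=8.0)
assert torch.allclose(unscaled[4095], scaled[4095 * 8])
```

If you measure without the factor set, the calibration pass sees angles the base model never produces at long positions, which is (presumably) where the garbage measurements come from.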

I didn't try PIPPA specifically (I ran into some robustness issues with it), but I did try the default EXL2 calibration dataset and a subset of the Aurelian training set (which goes to 32K), for example.
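For anyone rolling their own calibration set: a hedged sketch of packing long samples into a parquet file, assuming convert.py's -c flag accepts a parquet with a "text" column (as with the wikitext parquets commonly passed to it). The threshold, variable names, and file name below are illustrative, not from this thread.

```python
# Build a .parquet calibration file from long-context text samples.
# Assumption: exllamav2's convert.py takes this via -c, with one sample
# per row in a "text" column. Requires pandas + pyarrow.
import pandas as pd

def build_calibration_parquet(texts: list[str], out_path: str,
                              min_chars: int = 8000) -> None:
    # Keep only long documents so each calibration row can fill a long context.
    long_texts = [t for t in texts if len(t) >= min_chars]
    pd.DataFrame({"text": long_texts}).to_parquet(out_path)

# Hypothetical usage:
# build_calibration_parquet(my_long_stories, "calibration_long.parquet")
```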

Thanks! I've successfully quantized to 3.3 bpw using the PIPPA dataset, and it all seems to be working well, picking up the long context beautifully.
