
Buy me a Ko-Fi | Support my work using Patreon

OpenChat-3.5-0106_32K-PoSE

Description

This model is OpenChat-3.5-0106 with its context length extended from 8192 tokens to 32768 tokens using PoSE (Positional Skip-wise Training).
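To illustrate the idea behind PoSE, here is a minimal sketch of how position ids could be sampled so that a training window of 8192 tokens covers positions up to 32768. It follows the scheme described in the cited paper (the window is split into chunks and each chunk's positions are shifted by a non-decreasing random skip), but it is not the training code used for this model. The function name `pose_position_ids`, the default of two chunks, and the way chunk boundaries are drawn are assumptions made for this example.

```python
import random


def pose_position_ids(train_len, target_len, num_chunks=2, rng=None):
    """Sample position ids for one training window, PoSE-style (illustrative sketch).

    The window of `train_len` tokens is split into `num_chunks` contiguous chunks.
    Each chunk after the first is shifted by a random skip, so the ids seen during
    training span the whole target range [0, target_len).
    """
    rng = rng or random.Random()

    # Assumption: chunk boundaries are drawn uniformly at random inside the window.
    cuts = sorted(rng.sample(range(1, train_len), num_chunks - 1))
    bounds = [0] + cuts + [train_len]

    max_skip = target_len - train_len  # largest shift that keeps ids below target_len
    position_ids = []
    skip = 0  # the first chunk is never shifted
    for i in range(num_chunks):
        if i > 0:
            # Skips never decrease, so position ids stay strictly increasing.
            skip = rng.randint(skip, max_skip)
        position_ids.extend(p + skip for p in range(bounds[i], bounds[i + 1]))
    return position_ids


ids = pose_position_ids(train_len=8192, target_len=32768, rng=random.Random(0))
print(len(ids), ids[0], ids[-1])
```

The model still only attends over 8192 tokens per training step, which is why this approach is cheaper than fine-tuning on full 32768-token sequences.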

The model was fine-tuned using Rank-Stabilized LoRA on the LongAlpaca-12K dataset. I hope to keep extending the context in future versions and then apply the same method to my upscaled versions of OpenChat-3.5, which were created using Block Expansion rather than Depth Up-Scaling.
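For context, Rank-Stabilized LoRA differs from standard LoRA in how the low-rank update is scaled: it uses alpha / sqrt(r) instead of alpha / r, which keeps the update from shrinking as the rank grows. The sketch below shows only that difference. The helper name `lora_scale` and the values alpha = 16 and r = 64 are illustrative assumptions, not the hyperparameters used for this model.

```python
import math


def lora_scale(alpha, rank, rank_stabilized=False):
    """Scaling factor applied to the low-rank update B @ A (illustrative helper)."""
    if rank_stabilized:
        return alpha / math.sqrt(rank)  # rsLoRA scaling
    return alpha / rank  # standard LoRA scaling


# Illustrative values only; the model card does not state alpha or rank.
print(lora_scale(16, 64))                        # 0.25
print(lora_scale(16, 64, rank_stabilized=True))  # 2.0
```

In Hugging Face PEFT, this scaling is selected with `use_rslora=True` on `LoraConfig`. The card does not say which library was used for training, so treat that as one possible setup.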

After fine-tuning, the model was tested using passkey retrieval and achieved a score of 100%. Below you can also find the results of the Open LLM Leaderboard evaluations, and I am a bit disappointed with those. Performance dropped significantly relative to the original model on every test except MuSR. I expected it to do better than the original model on MuSR, since that test benefits from long-context understanding, but I didn't expect such a negative impact on the other tasks.

I will address this in a future version. I used the LongAlpaca-12K dataset because it is small and I have limited computational resources, but I might have to try a larger dataset for the next attempt. If you would like to help me, there are links at the top of the model card for my Patreon and Ko-Fi.
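The card does not describe the exact passkey retrieval setup, so the following is only a sketch of how such a test is commonly built: a random number is hidden inside repetitive filler text and the model is asked to repeat it. The filler sentence, the prompt wording, and the names `build_passkey_prompt` and `passkey_accuracy` are all assumptions made for this example.

```python
import random

# Assumption: repetitive filler text used to pad the prompt to the desired length.
FILLER = "The grass is green. The sky is blue. The sun is yellow. Here we go. There and back again. "


def build_passkey_prompt(n_filler, rng):
    """Return (prompt, passkey) with the passkey hidden at a random depth."""
    passkey = rng.randint(10000, 99999)
    insert_at = rng.randint(0, n_filler)
    needle = f"The pass key is {passkey}. Remember it. {passkey} is the pass key. "
    body = [FILLER] * insert_at + [needle] + [FILLER] * (n_filler - insert_at)
    prompt = (
        "There is important info hidden inside a lot of irrelevant text. "
        "Find it and memorize it.\n"
        + "".join(body)
        + "\nWhat is the pass key? The pass key is"
    )
    return prompt, passkey


def passkey_accuracy(answers, passkeys):
    """Percentage of model answers that contain the correct passkey."""
    hits = sum(str(key) in answer for answer, key in zip(answers, passkeys))
    return 100.0 * hits / len(passkeys)


prompt, key = build_passkey_prompt(n_filler=50, rng=random.Random(1))
# `answers` would come from the model under test; here the reply is faked.
print(passkey_accuracy([f" {key}."], [key]))
```

Increasing `n_filler` until the prompt approaches 32768 tokens, and varying `insert_at`, checks retrieval across the whole extended window.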

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|------:|
| Avg.                | 12.70 |
| IFEval (0-Shot)     | 39.69 |
| BBH (3-Shot)        |  8.83 |
| MATH Lvl 5 (4-Shot) |  1.44 |
| GPQA (0-shot)       |  3.47 |
| MuSR (0-shot)       | 11.33 |
| MMLU-PRO (5-shot)   | 11.46 |
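As a quick consistency check, the reported average matches the unweighted mean of the six benchmark scores above. The short snippet below reproduces it; the variable names are chosen for this example.

```python
# Scores copied from the results table above.
scores = {
    "IFEval (0-Shot)": 39.69,
    "BBH (3-Shot)": 8.83,
    "MATH Lvl 5 (4-Shot)": 1.44,
    "GPQA (0-shot)": 3.47,
    "MuSR (0-shot)": 11.33,
    "MMLU-PRO (5-shot)": 11.46,
}

average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 12.70
```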

Citation

@misc{zhu2024poseefficientcontextwindow,
      title={PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training}, 
      author={Dawei Zhu and Nan Yang and Liang Wang and Yifan Song and Wenhao Wu and Furu Wei and Sujian Li},
      year={2024},
      eprint={2309.10400},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2309.10400}, 
}

GGUF
Model size: 7.24B params
Architecture: llama
Quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
