---
license: apache-2.0
library_name: transformers
base_model:
- openchat/openchat-3.5-0106
datasets:
- Yukang/LongAlpaca-12k
model-index:
- name: OpenChat-3.5-0106_32K-PoSE
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 39.69
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Pretergeek/OpenChat-3.5-0106_32K-PoSE
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 8.83
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Pretergeek/OpenChat-3.5-0106_32K-PoSE
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 1.44
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Pretergeek/OpenChat-3.5-0106_32K-PoSE
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 3.47
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Pretergeek/OpenChat-3.5-0106_32K-PoSE
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 11.33
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Pretergeek/OpenChat-3.5-0106_32K-PoSE
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 11.46
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Pretergeek/OpenChat-3.5-0106_32K-PoSE
      name: Open LLM Leaderboard
---
<p align="center">
<a href="https://ko-fi.com/pretergeek">Buy me a Ko-Fi</a> •
<a href="https://patreon.com/Pretergeek">Support my work using Patreon</a>
</p>
# OpenChat-3.5-0106_32K-PoSE
## Description
This model is [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) with the context length extended from 8192 tokens to 32768 tokens using [PoSE](https://huggingface.co/papers/2309.10400).
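For readers unfamiliar with PoSE, the sketch below illustrates the core idea under illustrative assumptions (the chunk count, skip sampling, and function name `pose_position_ids` are mine, not the exact training code used for this model): each training example still contains only 8,192 real tokens, but its position IDs are spread across the 32,768-token target window by random skips, so the model learns 32K-scale relative positions at 8K training cost.

```python
import random
import torch

def pose_position_ids(seq_len: int = 8192, target_len: int = 32768,
                      num_chunks: int = 2) -> torch.Tensor:
    """Skip-wise position IDs for one training example (PoSE, arXiv:2309.10400).

    Illustrative sketch: the real training sequence is split into contiguous
    chunks, and each chunk's position IDs are shifted by a random skip so the
    IDs cover the full target window while the token count stays at seq_len.
    """
    # Split the real sequence into contiguous chunks.
    cuts = sorted(random.sample(range(1, seq_len), num_chunks - 1))
    bounds = [0, *cuts, seq_len]
    # Distribute the "missing" positions as non-decreasing skip offsets,
    # one per chunk, so the resulting position IDs stay increasing.
    total_skip = target_len - seq_len
    skips = sorted(random.choices(range(total_skip + 1), k=num_chunks))
    pieces = [torch.arange(bounds[i], bounds[i + 1]) + skips[i]
              for i in range(num_chunks)]
    return torch.cat(pieces)  # shape (seq_len,), values within [0, target_len)
```

These skip-wise position IDs replace the default `torch.arange(seq_len)` during fine-tuning only; at inference time the model runs normally over the full 32K window.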
The model was fine-tuned using [Rank-Stabilized LoRA](https://huggingface.co/blog/damjan-k/rslora) and the [LongAlpaca-12K](https://huggingface.co/datasets/Yukang/LongAlpaca-12k) dataset. I hope to extend the context further in future versions and then apply the same methods to my [upscaled versions of OpenChat-3.5](https://huggingface.co/collections/Pretergeek/openchat-35-0106-with-additional-layers-66a8d3262c7c3ebdd7783a29), which were created using Block Expansion instead of Depth Up-Scaling.
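As a rough illustration of the fine-tuning recipe, Rank-Stabilized LoRA can be enabled in the `peft` library with a single flag; the rank, alpha, and target modules below are placeholder values, not the hyperparameters actually used here:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("openchat/openchat-3.5-0106")

config = LoraConfig(
    r=64,                      # illustrative rank, not the one used for this model
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_rslora=True,           # scale adapters by alpha/sqrt(r) instead of alpha/r
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```

With `use_rslora=True`, adapter outputs are scaled by `lora_alpha / sqrt(r)` instead of `lora_alpha / r`, which keeps the adapters' effective magnitude stable as the rank grows.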
After fine-tuning, the model was tested with passkey retrieval and achieved a score of 100%. Below you will also find the Open LLM Leaderboard evaluation results, and I am a bit disappointed with those: compared to the original model, performance dropped significantly on every task except MuSR. I expected the model to beat the original on MuSR, since that benchmark benefits from long-context understanding, but I did not expect such a negative impact on the other tasks. I will address this in a future version. I chose the LongAlpaca-12K dataset because it is small and my computational resources are limited, but I may have to try a larger dataset for the next attempt. If you would like to help me, links to my Patreon and Ko-Fi are at the top of this model card.
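For context, passkey retrieval tests generally follow the pattern sketched below: a random passkey is hidden inside long filler text and the model is asked to repeat it back. This is a generic sketch of that style of test (the filler text, prompt wording, and function name are mine), not the exact harness used for the 100% score above:

```python
import random

def make_passkey_prompt(num_filler: int = 400) -> tuple[str, str]:
    """Build a long-context passkey retrieval prompt (generic sketch).

    Returns the full prompt and the hidden passkey; a model passes if its
    continuation contains the passkey.
    """
    passkey = str(random.randint(10000, 99999))
    filler = "The grass is green. The sky is blue. The sun is yellow. " * num_filler
    # Insert the passkey sentence at a random depth in the filler.
    insert_at = random.randint(0, len(filler))
    doc = (
        filler[:insert_at]
        + f" The pass key is {passkey}. Remember it. {passkey} is the pass key. "
        + filler[insert_at:]
    )
    question = "What is the pass key? The pass key is"
    return doc + "\n" + question, passkey
```

Scoring is simply checking whether the generated continuation contains the hidden passkey, repeated at several insertion depths and context lengths up to the 32K target.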
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Pretergeek__OpenChat-3.5-0106_32K-PoSE).
| Metric |Value|
|-------------------|----:|
|Avg. |12.70|
|IFEval (0-Shot) |39.69|
|BBH (3-Shot) | 8.83|
|MATH Lvl 5 (4-Shot)| 1.44|
|GPQA (0-shot) | 3.47|
|MuSR (0-shot) |11.33|
|MMLU-PRO (5-shot) |11.46|
# Citation
```
@misc{zhu2024poseefficientcontextwindow,
title={PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training},
author={Dawei Zhu and Nan Yang and Liang Wang and Yifan Song and Wenhao Wu and Furu Wei and Sujian Li},
year={2024},
eprint={2309.10400},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2309.10400},
}
```