---
license: llama2
tags:
- mergekit
- merge
model-index:
- name: WinterGoddess-1.4x-70b-32k
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 71.16
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/WinterGoddess-1.4x-70b-32k
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 89.12
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/WinterGoddess-1.4x-70b-32k
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 66.42
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/WinterGoddess-1.4x-70b-32k
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 63.87
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/WinterGoddess-1.4x-70b-32k
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 82.56
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/WinterGoddess-1.4x-70b-32k
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 43.29
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/WinterGoddess-1.4x-70b-32k
      name: Open LLM Leaderboard
---

This is a 32k-context version of Sao10K/WinterGoddess-1.4x-70B-L2, extended using the method discussed [here](https://huggingface.co/grimulkan/aurelian-v0.5-70b-rope8-32K-fp16/discussions/2).

# Quants
Thanks to [@Nexesenex](https://huggingface.co/Nexesenex) for the GGUF quants!
- [GGUF](https://huggingface.co/Nexesenex/ChuckMcSneed_WinterGoddess-1.4x-70b-32k-iMat.GGUF)


# Benchmarks
### NeoEvalPlusN_benchmark
[My meme benchmark.](https://huggingface.co/datasets/ChuckMcSneed/NeoEvalPlusN_benchmark)

| Test name  | WinterGoddess | WinterGoddess-32k |
| ---------- | ---------- | -------  |
| B | 2 | 2.5 |
| C | 1.5 | 2 |
| D | 3 | 0 |
| S | 2.75 | 1.5 |
| P | 5.5 | 2.25 |
| Total | 14.75 | 8.25 |

### [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
[Leaderboard on Hugging Face](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

|Model                                  |Average|ARC  |HellaSwag|MMLU |TruthfulQA|Winogrande|GSM8K|
|---------------------------------------|-------|-----|---------|-----|----------|----------|-----|
|Sao10K/WinterGoddess-1.4x-70B-L2       |73.23  |72.78|90.11    |71.12|65.76     |85        |54.59|
|ChuckMcSneed/WinterGoddess-1.4x-70b-32k|69.4   |71.16|89.12    |66.42|63.87     |82.56     |43.29|
|Difference                             |3.83   |1.62 |0.99     |4.7  |1.89      |2.44      |11.3 |

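Here the losses are far less brutal than on my bench: the average drops by 3.83 points, while my bench total falls from 14.75 to 8.25. Extending with LongLoRA seems to hurt GSM8K (-11.3) and MMLU (-4.7) the most; the other four benchmarks each lose less than 2.5 points.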
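
The comparison can be recomputed from the two tables in this card. This is just a small sketch: the scores are copied from the tables, and the helper names (`score_drops`, `relative_drop`) are made up for illustration and are not part of any leaderboard tooling.

```python
# Scores copied from the Open LLM Leaderboard table in this card.
BENCHMARKS = ["ARC", "HellaSwag", "MMLU", "TruthfulQA", "Winogrande", "GSM8K"]
base = [72.78, 90.11, 71.12, 65.76, 85.0, 54.59]       # Sao10K/WinterGoddess-1.4x-70B-L2
extended = [71.16, 89.12, 66.42, 63.87, 82.56, 43.29]  # ChuckMcSneed/WinterGoddess-1.4x-70b-32k


def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


def score_drops(base_scores, extended_scores):
    """Absolute point drop per benchmark, rounded to two decimals."""
    return [round(b - e, 2) for b, e in zip(base_scores, extended_scores)]


def relative_drop(before, after):
    """Loss as a percentage of the original score, rounded to one decimal."""
    return round(100 * (before - after) / before, 1)


# Per-benchmark drops and the two benchmarks that lose the most.
drops = dict(zip(BENCHMARKS, score_drops(base, extended)))
largest = sorted(drops, key=drops.get, reverse=True)[:2]

# Relative loss on the leaderboard average vs. the NeoEvalPlusN total.
leaderboard_loss = relative_drop(mean(base), mean(extended))
neoeval_loss = relative_drop(14.75, 8.25)

print(drops)
print(largest)
print(leaderboard_loss, neoeval_loss)
```

In relative terms the leaderboard average loses about 5% while the NeoEvalPlusN total loses about 44%, which is why the losses look so much milder here.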

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ChuckMcSneed__WinterGoddess-1.4x-70b-32k).

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |69.40|
|AI2 Reasoning Challenge (25-Shot)|71.16|
|HellaSwag (10-Shot)              |89.12|
|MMLU (5-Shot)                    |66.42|
|TruthfulQA (0-shot)              |63.87|
|Winogrande (5-shot)              |82.56|
|GSM8k (5-shot)                   |43.29|