--- license: other tags: - merge license_name: yi-license license_link: https://huggingface.co/01-ai/Yi-34B-200K/blob/main/LICENSE model-index: - name: CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-ExtremeDensity results: - task: type: text-generation name: Text Generation dataset: name: AI2 Reasoning Challenge (25-Shot) type: ai2_arc config: ARC-Challenge split: test args: num_few_shot: 25 metrics: - type: acc_norm value: 66.89 name: normalized accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-ExtremeDensity name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: HellaSwag (10-Shot) type: hellaswag split: validation args: num_few_shot: 10 metrics: - type: acc_norm value: 85.69 name: normalized accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-ExtremeDensity name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MMLU (5-Shot) type: cais/mmlu config: all split: test args: num_few_shot: 5 metrics: - type: acc value: 77.35 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-ExtremeDensity name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: TruthfulQA (0-shot) type: truthful_qa config: multiple_choice split: validation args: num_few_shot: 0 metrics: - type: mc2 value: 57.63 source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-ExtremeDensity name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: Winogrande (5-shot) type: winogrande config: winogrande_xl split: validation args: num_few_shot: 5 metrics: - type: acc value: 82.0 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-ExtremeDensity name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: GSM8k (5-shot) type: gsm8k config: main split: test args: num_few_shot: 5 metrics: - type: acc value: 59.82 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-ExtremeDensity name: Open LLM Leaderboard --- Just a test of a very high density DARE ties merge, for benchmarking on the open llm leaderboard. You probably shouldn't use this model, use this one instead: https://huggingface.co/brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-HighDensity mergekit config: ``` models: - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama # no parameters necessary for base model - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-34B-v1.4 parameters: weight: 0.19 density: 0.83 - model: /home/alpha//Storage/Models/Raw/bhenrym14_airoboros-3_1-yi-34b-200k parameters: weight: 0.14 density: 0.6 - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B parameters: weight: 0.19 density: 0.83 - model: /home/alpha/Storage/Models/Raw/kyujinpy_PlatYi-34B-200K-Q parameters: weight: 0.14 density: 0.6 - model: /home/alpha/FastModels/ehartford_dolphin-2.2-yi-34b-200k parameters: weight: 0.19 density: 0.83 - model: /home/alpha/FastModels/fblgit_una-xaberius-34b-v1beta parameters: weight: 0.15 density: 0.08 merge_method: dare_ties base_model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama parameters: int8_mask: true dtype: bfloat16 ``` # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_brucethemoose__CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-ExtremeDensity) | Metric |Value| |---------------------------------|----:| |Avg. |71.57| |AI2 Reasoning Challenge (25-Shot)|66.89| |HellaSwag (10-Shot) |85.69| |MMLU (5-Shot) |77.35| |TruthfulQA (0-shot) |57.63| |Winogrande (5-shot) |82.00| |GSM8k (5-shot) |59.82|