---
language:
  - en
license: llama2
library_name: transformers
tags:
  - mistral
  - merge
datasets:
  - stingning/ultrachat
  - garage-bAInd/Open-Platypus
  - Open-Orca/OpenOrca
  - TIGER-Lab/MathInstruct
  - OpenAssistant/oasst_top1_2023-08-25
  - teknium/openhermes
  - meta-math/MetaMathQA
  - Open-Orca/SlimOrca
pipeline_tag: text-generation
base_model:
  - Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp
  - ehartford/dolphin-2.1-mistral-7b
  - Open-Orca/Mistral-7B-OpenOrca
  - bhenrym14/mistral-7b-platypus-fp16
  - ehartford/samantha-1.2-mistral-7b
  - teknium/CollectiveCognition-v1.1-Mistral-7B
  - HuggingFaceH4/zephyr-7b-alpha
model-index:
  - name: sethuiyer/SynthIQ-7b
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 65.87
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 85.82
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 64.75
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 57
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 78.69
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 64.06
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/SynthIQ-7b
          name: Open LLM Leaderboard
---

# SynthIQ

This is SynthIQ, rated 92.23/100 by GPT-4 across a varied set of complex prompts. I used mergekit to merge the models.

| Benchmark  | Score |
|------------|-------|
| ARC        | 65.87 |
| HellaSwag  | 85.82 |
| MMLU       | 64.75 |
| TruthfulQA | 57.00 |
| Winogrande | 78.69 |
| GSM8K      | 64.06 |
| AGIEval    | 42.67 |
| GPT4All    | 73.71 |
| Bigbench   | 44.59 |

**Update - 19/01/2024**: Tested to work well with AutoGen and CrewAI; see the sketch below.
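
For reference, here is a minimal AutoGen sketch, assuming SynthIQ is served behind an OpenAI-compatible endpoint (for example a local Ollama or llama.cpp server); the `base_url`, `api_key`, and model name below are placeholders, not values from this card:

```python
# Hedged sketch: point AutoGen at an OpenAI-compatible server hosting SynthIQ.
from autogen import AssistantAgent, UserProxyAgent

config_list = [{
    "model": "stuehieyr/synthiq",             # placeholder model name
    "base_url": "http://localhost:11434/v1",  # placeholder local endpoint
    "api_key": "unused",                      # local servers usually ignore this
}]

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user = UserProxyAgent("user", human_input_mode="NEVER", code_execution_config=False)
user.initiate_chat(assistant, message="Summarize what a SLERP model merge does.")
```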

## GGUF Files

- **Q4_K_M** - medium, balanced quality (recommended)
- **Q6_K** - very large, extremely low quality loss
- **Q8_0** - very large, extremely low quality loss (not recommended)
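
As an illustration, the Q4_K_M quant can be run locally with llama-cpp-python; the GGUF path below is a hypothetical local file name, and `chat_format="chatml"` matches the prompt template documented later in this card:

```python
# Sketch: run a GGUF quant of SynthIQ with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./synthiq-7b.Q4_K_M.gguf",  # hypothetical local path
    n_ctx=4096,                             # context window
    chat_format="chatml",                   # matches the ChatML template below
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain model merging in one paragraph."},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```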

**Important Update**: SynthIQ is now available on Ollama. You can use it by running the command `ollama run stuehieyr/synthiq` in your terminal. If you have limited computing resources, check out this video to learn how to run it on a Google Colab backend.
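
For scripting against the Ollama model, the official `ollama` Python client (a separate `pip install ollama`, assuming a running Ollama server with the model pulled) exposes a simple chat call:

```python
# Sketch: chat with the Ollama-hosted SynthIQ from Python.
import ollama

response = ollama.chat(
    model="stuehieyr/synthiq",
    messages=[{"role": "user", "content": "Write a two-line poem about model merging."}],
)
print(response["message"]["content"])
```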

## YAML Config


```yaml
slices:
  - sources:
      - model: Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp
        layer_range: [0, 32]
      - model: uukuguy/speechless-mistral-six-in-one-7b
        layer_range: [0, 32]
merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5 # fallback for rest of tensors
tokenizer_source: union
dtype: bfloat16
```
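
This config can be executed with mergekit's `mergekit-yaml` CLI. For intuition, the `slerp` merge method interpolates each pair of tensors along the arc between them rather than along a straight line; below is a generic NumPy sketch of that idea (not mergekit's exact implementation), where `t` is the per-layer interpolation weight set by the filters above:

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors."""
    # Measure the angle between the two tensors, treated as flat vectors.
    u0 = v0 / (np.linalg.norm(v0) + eps)
    u1 = v1 / (np.linalg.norm(v1) + eps)
    omega = np.arccos(np.clip(np.sum(u0 * u1), -1.0, 1.0))
    if np.abs(np.sin(omega)) < eps:
        # Nearly (anti)parallel tensors: fall back to linear interpolation.
        return (1.0 - t) * v0 + t * v1
    so = np.sin(omega)
    # Weight each endpoint so the interpolant stays on the arc between them.
    return (np.sin((1.0 - t) * omega) / so) * v0 + (np.sin(t * omega) / so) * v1
```

At `t = 0` this returns the first model's tensor and at `t = 1` the second's; the `value` lists above sweep `t` across layer depth, separately for self-attention and MLP weights.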

## Prompt Template: ChatML

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
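
Below is a minimal sketch of chatting with the full-precision weights via transformers, assuming the repo's tokenizer config ships this ChatML template (otherwise, format the prompt manually as shown above):

```python
# Sketch: load the merged model and generate with the ChatML chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sethuiyer/SynthIQ-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are SynthIQ, a helpful assistant."},
    {"role": "user", "content": "What does a 5-shot GSM8K score measure?"},
]
# apply_chat_template renders the ChatML turns shown above and, with
# add_generation_prompt=True, appends the trailing <|im_start|>assistant tag.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```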

The license is the Llama 2 license, since uukuguy/speechless-mistral-six-in-one-7b is distributed under the Llama 2 license.

## Nous Benchmark Evaluation Results

Detailed results can be found here

## Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric                            | Value |
|-----------------------------------|-------|
| Avg.                              | 69.37 |
| AI2 Reasoning Challenge (25-Shot) | 65.87 |
| HellaSwag (10-Shot)               | 85.82 |
| MMLU (5-Shot)                     | 64.75 |
| TruthfulQA (0-shot)               | 57.00 |
| Winogrande (5-shot)               | 78.69 |
| GSM8k (5-shot)                    | 64.06 |