language:
  - en
license: other
library_name: transformers
datasets:
  - teknium/OpenHermes-2.5
  - LDJnr/Capybara
  - Intel/orca_dpo_pairs
  - argilla/distilabel-capybara-dpo-7k-binarized
pipeline_tag: text-generation
model-index:
  - name: Quyen-Plus-v0.1
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 55.72
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vilm/Quyen-Plus-v0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 78.52
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vilm/Quyen-Plus-v0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 60.45
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vilm/Quyen-Plus-v0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 53.6
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vilm/Quyen-Plus-v0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 71.27
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vilm/Quyen-Plus-v0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 60.05
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vilm/Quyen-Plus-v0.1
          name: Open LLM Leaderboard

Quyen

Model Description

Quyen is our first flagship LLM series based on the Qwen1.5 family. We introduced 6 different versions:

  • Quyen-SE (0.5B)
  • Quyen-Mini (1.8B)
  • Quyen (4B)
  • Quyen-Plus (7B)
  • Quyen-Pro (14B)
  • Quyen-Pro-Max (72B)

All models were trained with SFT and DPO using the following datasets:

  • OpenHermes-2.5 by Teknium
  • Capybara by LDJ
  • argilla/distilabel-capybara-dpo-7k-binarized by argilla
  • orca_dpo_pairs by Intel
  • and Private Data by Ontocord & BEE-spoke-data

Prompt Template

  • All Quyen models use ChatML as the default template:
<|im_start|>system
You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.<|im_end|>
<|im_start|>user
Hello world.<|im_end|>
<|im_start|>assistant
  • You can also use apply_chat_template:
messages = [
    {"role": "system", "content": "You are a sentient, superintelligent artificial general intelligence, here to teach and assist me."},
    {"role": "user", "content": "Hello world."}
]
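# add_generation_prompt=True appends the opening assistant turn;
# return_dict=True returns a dict that can be unpacked into generate()
gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt")
model.generate(**gen_input)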
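
To make the ChatML layout above concrete, here is a minimal sketch that renders a list of messages into that format by hand. The helper name `to_chatml` and the shortened system prompt are illustrative assumptions, not part of the Quyen release. The tokenizer's own chat template remains the authoritative source for the exact format.

```python
# Hypothetical helper (not part of the Quyen release): renders chat
# messages in the ChatML layout shown in the prompt template above.
def to_chatml(messages, add_generation_prompt=True):
    parts = []
    for m in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello world."},
]
prompt = to_chatml(messages)
print(prompt)
```

A string built this way can be compared with the output of `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` to check that a hand-written prompt matches what the tokenizer produces.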

Benchmarks:

  • Coming Soon! We will update the benchmarks later

Acknowledgement

  • We're incredibly grateful to Tensoic and Ontocord for their generous support with compute and data preparation.
  • Special thanks to the Qwen team for letting us access the models early for these amazing finetunes.

Open LLM Leaderboard Evaluation Results

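Detailed results can be found on the Open LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vilm/Quyen-Plus-v0.1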

| Metric | Value |
|---|---|
| Avg. | 63.27 |
| AI2 Reasoning Challenge (25-Shot) | 55.72 |
| HellaSwag (10-Shot) | 78.52 |
| MMLU (5-Shot) | 60.45 |
| TruthfulQA (0-shot) | 53.60 |
| Winogrande (5-shot) | 71.27 |
| GSM8k (5-shot) | 60.05 |
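
The reported average appears to be the unweighted mean of the six benchmark scores. The short check below reproduces it from the values listed above. The variable names are illustrative, and the equal-weight assumption is inferred from the numbers rather than stated in the source.

```python
# Benchmark scores copied from the results listed above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 55.72,
    "HellaSwag (10-Shot)": 78.52,
    "MMLU (5-Shot)": 60.45,
    "TruthfulQA (0-shot)": 53.60,
    "Winogrande (5-shot)": 71.27,
    "GSM8k (5-shot)": 60.05,
}

# Assumption: every benchmark carries equal weight in the average.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # prints "Avg. 63.27"
```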