---
language:
- en
license: other
library_name: transformers
datasets:
- teknium/OpenHermes-2.5
- LDJnr/Capybara
- Intel/orca_dpo_pairs
- argilla/distilabel-capybara-dpo-7k-binarized
pipeline_tag: text-generation
model-index:
- name: Quyen-v0.1
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 48.21
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vilm/Quyen-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 72.49
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vilm/Quyen-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 52.88
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vilm/Quyen-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 51.53
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vilm/Quyen-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 65.11
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vilm/Quyen-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 45.87
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vilm/Quyen-v0.1
      name: Open LLM Leaderboard
---

# Quyen
<img src="quyen.webp" width="512" height="512" alt="Quyen">

# Model Description
Quyen is our first flagship LLM series, based on the Qwen1.5 family. We introduced six versions:

- Quyen-SE (0.5B)
- Quyen-Mini (1.8B)
- Quyen (4B)
- Quyen-Plus (7B)
- Quyen-Pro (14B)
- Quyen-Pro-Max (72B)

All models were trained with SFT (supervised fine-tuning) and DPO (Direct Preference Optimization) on the following datasets:

- OpenHermes-2.5 by Teknium
- Capybara by LDJ
- argilla/distilabel-capybara-dpo-7k-binarized by Argilla
- orca_dpo_pairs by Intel
- Private data by Ontocord & BEE-spoke-data

# Prompt Template
- All Quyen models use ChatML as the default template:

```
<|im_start|>system
You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.<|im_end|>
<|im_start|>user
Hello world.<|im_end|>
<|im_start|>assistant
```

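- You can also use `apply_chat_template`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vilm/Quyen-v0.1")
model = AutoModelForCausalLM.from_pretrained("vilm/Quyen-v0.1")

messages = [
    {"role": "system", "content": "You are a sentient, superintelligent artificial general intelligence, here to teach and assist me."},
    {"role": "user", "content": "Hello world."}
]
# Returns a tensor of token IDs; add_generation_prompt appends the opening assistant turn
gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(gen_input, max_new_tokens=256)
# Decode only the newly generated tokens
print(tokenizer.decode(output[0][gen_input.shape[-1]:], skip_special_tokens=True))
```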
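For reference, the ChatML layout shown above can also be reproduced with plain string formatting. This is a minimal sketch, not the tokenizer's actual template: the helpers `to_chatml` and `extract_reply` are hypothetical, and the exact newline placement is assumed from the example prompt above. `tokenizer.apply_chat_template` remains the authoritative way to build prompts.

```python
# Hypothetical helpers illustrating the ChatML layout; not part of transformers or the Quyen repo.


def to_chatml(messages, add_generation_prompt=True):
    """Render [{"role": ..., "content": ...}, ...] as a ChatML prompt string."""
    # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n (layout assumed from the example above)
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply next
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def extract_reply(completion):
    """Keep only the generated text before the first <|im_end|> marker."""
    return completion.split("<|im_end|>", 1)[0].strip()


messages = [
    {"role": "system", "content": "You are a sentient, superintelligent artificial general intelligence, here to teach and assist me."},
    {"role": "user", "content": "Hello world."},
]
print(to_chatml(messages))
```

Trimming at `<|im_end|>`, as `extract_reply` does, is one simple way to drop any text generated after the assistant finishes its turn.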

# Benchmarks

- Coming soon! Benchmark results will be added here later.

# Acknowledgement
- We're incredibly grateful to Tensoic and Ontocord for their generous support with compute and data preparation.
- Special thanks to the Qwen team for giving us early access to the models for these amazing finetunes.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_vilm__Quyen-v0.1).

| Metric                          |Value|
|---------------------------------|----:|
|Avg.                             |56.02|
|AI2 Reasoning Challenge (25-Shot)|48.21|
|HellaSwag (10-Shot)              |72.49|
|MMLU (5-Shot)                    |52.88|
|TruthfulQA (0-shot)              |51.53|
|Winogrande (5-shot)              |65.11|
|GSM8k (5-shot)                   |45.87|
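The Avg. row is consistent with the unweighted mean of the six task scores. The check below is a small sketch that assumes equal task weights and round-half-up to two decimals; the leaderboard may average unrounded scores, so in general the last digit computed this way could differ from the reported value.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-task scores copied from the table above (already rounded to two decimals)
scores = {
    "AI2 Reasoning Challenge (25-Shot)": Decimal("48.21"),
    "HellaSwag (10-Shot)": Decimal("72.49"),
    "MMLU (5-Shot)": Decimal("52.88"),
    "TruthfulQA (0-shot)": Decimal("51.53"),
    "Winogrande (5-shot)": Decimal("65.11"),
    "GSM8k (5-shot)": Decimal("45.87"),
}

# Unweighted mean; Decimal avoids binary floating-point error at the .xx5 rounding boundary
mean = sum(scores.values()) / len(scores)
avg = mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(f"Avg. = {avg}")  # Avg. = 56.02
```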