	---
	language:
	- ja
	- en
	- zh
	license: apache-2.0
	model-index:
	- name: laser-polyglot-4x7b
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: AI2 Reasoning Challenge (25-Shot)
	      type: ai2_arc
	      config: ARC-Challenge
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: acc_norm
	      value: 64.16
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/laser-polyglot-4x7b
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HellaSwag (10-Shot)
	      type: hellaswag
	      split: validation
	      args:
	        num_few_shot: 10
	    metrics:
	    - type: acc_norm
	      value: 84.98
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/laser-polyglot-4x7b
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU (5-Shot)
	      type: cais/mmlu
	      config: all
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 63.88
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/laser-polyglot-4x7b
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: TruthfulQA (0-shot)
	      type: truthful_qa
	      config: multiple_choice
	      split: validation
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: mc2
	      value: 55.47
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/laser-polyglot-4x7b
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Winogrande (5-shot)
	      type: winogrande
	      config: winogrande_xl
	      split: validation
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 77.82
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/laser-polyglot-4x7b
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GSM8k (5-shot)
	      type: gsm8k
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 48.45
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/laser-polyglot-4x7b
	      name: Open LLM Leaderboard
	---
	# Polyglot-4x7b-24b

	![polyglot](polyglot.png)

	Polyglot-4x7b is a Mixture of Experts approach to a multilingual model.

	This project is an experiment to test whether each expert can cover a different language. The answer is yes.

	The model merges the following models, which between them can produce Chinese and Japanese output:

	+ teknium/OpenHermes-2.5-Mistral-7B
	+ oshizo/japanese-e5-mistral-7b_slerp
	+ cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser
	+ s3nh/Mistral-7B-Evol-Instruct-Chinese
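
To make the Mixture of Experts idea concrete, here is a toy sketch of how a router might combine four experts for a single token. It is not this model's actual implementation: the stand-in expert functions, the router scores, and the choice of `k=2` active experts are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Convert raw router scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_output(token, experts, router_logits, k=2):
    """Weighted sum of the outputs of the selected experts."""
    weights = route_top_k(router_logits, k)
    return sum(w * experts[i](token) for i, w in weights.items())

# Hypothetical stand-ins: plain functions in place of the experts' real layers.
experts = [
    lambda x: x + 1.0,
    lambda x: x * 2.0,
    lambda x: x - 3.0,
    lambda x: x ** 2,
]
router_logits = [0.1, 2.0, -1.0, 2.0]  # hypothetical router scores for one token

print(route_top_k(router_logits))             # experts 1 and 3 share the weight
print(moe_output(3.0, experts, router_logits))
```

Only the selected experts run for a given token, so in a layout like this the cost per token depends on `k` rather than on the total number of experts.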

	TODO:
	1. [ ] polyglot tokenizer

	## Other polyglot models

	+ [macadeliccc/Polyglot-8x7b-v0.1](https://huggingface.co/macadeliccc/Polyglot-8x7b-v0.1) (adds 3 more languages)
	# Code Example

	Inference [Colab](https://colab.research.google.com/drive/1tYSb63IKZDsiQ5BIJU8Oc92phxugAmB3?usp=sharing)

	Live demo available on [Spaces](https://huggingface.co/spaces/macadeliccc/polyglot-4x7b-chat?logs=build)

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

	def generate_response(prompt):
	    """
	    Generate a response from the model based on the input prompt.

	    Args:
	        prompt (str): Prompt for the model.

	    Returns:
	        str: The generated response from the model.
	    """
	    # Tokenize the input prompt and move the tensors to the model's device
	    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

	    # Generate output tokens
	    outputs = model.generate(
	        **inputs,
	        max_new_tokens=256,
	        eos_token_id=tokenizer.eos_token_id,
	        pad_token_id=tokenizer.pad_token_id,
	    )

	    # Decode the generated tokens to a string
	    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

	    return response

	# Load the model and tokenizer.
	# 4-bit loading requires the bitsandbytes and accelerate packages.
	model_id = "macadeliccc/laser-polyglot-4x7b"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
	    device_map="auto",
	)

	# Example prompts in different languages
	english_prompt = "Write a quicksort algorithm in python"
	chinese_prompt = "用Python写一个快速排序算法"
	japanese_prompt = "Pythonでクイックソートアルゴリズムを書いてください"

	# Generate and print responses for each language
	print("English Response:")
	print(generate_response(english_prompt), "\n")

	print("Chinese Response:")
	print(generate_response(chinese_prompt), "\n")

	print("Japanese Response:")
	print(generate_response(japanese_prompt), "\n")
	```
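
Because `generate` on a decoder-only model returns the prompt tokens followed by the new tokens, decoding `outputs[0]` repeats the prompt at the start of each response. A minimal sketch of keeping only the completion, using plain lists in place of tensors; the helper name `completion_only` and the token ids are hypothetical:

```python
def completion_only(output_ids, prompt_length):
    """Drop the first prompt_length ids, which are a copy of the prompt."""
    return output_ids[prompt_length:]

prompt_ids = [101, 7, 42]              # hypothetical ids for the prompt
output_ids = prompt_ids + [9, 15, 2]   # prompt followed by generated ids

print(completion_only(output_ids, len(prompt_ids)))  # only the generated ids
```

Inside `generate_response`, the same slice would be `outputs[0][inputs["input_ids"].shape[1]:]`, applied before calling `tokenizer.decode`.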

	## Example Output
	English Response:

	Write a quicksort algorithm in python.

	```python
	def quicksort(arr):
	    if len(arr) <= 1:
	        return arr
	    else:
	        pivot = arr[0]
	        less = [i for i in arr[1:] if i <= pivot]
	        greater = [i for i in arr[1:] if i > pivot]
	        return quicksort(less) + [pivot] + quicksort(greater)

	arr = [5, 2, 9, 1, 5, 7, 4, 8, 6, 3]
	print(quicksort(arr))
	```

	This is a simple implementation of the quicksort algorithm in python. The function `quicksort` takes an array as input and returns a sorted array. The algorithm works by selecting a pivot element from the array and partitioning the other elements into two sub-arrays, according to whether they are less than or greater than the pivot. The process is then repeated recursively on the sub-arrays until the entire array is sorted.

	Chinese Response:

	用Python写一个快速排序算法

	```python
	def quick_sort(arr):
	    if len(arr) <= 1:
	        return arr
	    else:
	        pivot = arr[0]
	        less = [i for i in arr[1:] if i <= pivot]
	        greater = [i for i in arr[1:] if i > pivot]
	        return quick_sort(less) + [pivot] + quick_sort(greater)

	arr = [3, 5, 2, 1, 4, 6, 8, 7]
	print(quick_sort(arr))
	```
	这个程序的时间复杂度为O(nlogn)，空间复杂度为O(n)。

	Japanese Response:

	Pythonでクイックソートアルゴリズムを書いてください。

	```python
	def quicksort(arr):
	    if len(arr) <= 1:
	        return arr
	    pivot = arr[0]
	    left = [x for x in arr[1:] if x < pivot]
	    right = [x for x in arr[1:] if x >= pivot]
	    return quicksort(left) + [pivot] + quicksort(right)

	print(quicksort([3,6,8,10,1,5,9,2,4,7]))
	```

	このコードはクイックソートアルゴリズムを実装しています。クイックソートは一種の分割と conquers アルゴリズムで、配列を分割し、それぞれの部分配列を再帰的にソートします。

	この実装では、配列の最初の要素をピボットとして使用します。そして、配列を2つの



	# Evaluations

	| Tasks       |Version|Filter|n-shot| Metric |Value |   |Stderr|
	|-------------|-------|------|-----:|--------|-----:|---|-----:|
	|arc_challenge|Yaml   |none  |     0|acc     |0.5495|±  |0.0145|
	|             |       |none  |     0|acc_norm|0.5794|±  |0.0144|
	|arc_easy     |Yaml   |none  |     0|acc     |0.8304|±  |0.0077|
	|             |       |none  |     0|acc_norm|0.8068|±  |0.0081|
	|boolq        |Yaml   |none  |     0|acc     |0.8749|±  |0.0058|
	|hellaswag    |Yaml   |none  |     0|acc     |0.6276|±  |0.0048|
	|             |       |none  |     0|acc_norm|0.8157|±  |0.0039|
	|openbookqa   |Yaml   |none  |     0|acc     |0.3180|±  |0.0208|
	|             |       |none  |     0|acc_norm|0.4460|±  |0.0223|
	|piqa         |Yaml   |none  |     0|acc     |0.8139|±  |0.0091|
	|             |       |none  |     0|acc_norm|0.8237|±  |0.0089|
	|winogrande   |Yaml   |none  |     0|acc     |0.7419|±  |0.0123|
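
The Stderr column can be turned into an approximate 95% confidence interval as value ± 1.96 × stderr. The sketch below does this for three rows of the table; the normal approximation and the helper name `confidence_interval` are assumptions, not part of the evaluation output.

```python
def confidence_interval(value, stderr, z=1.96):
    """Return (low, high) for a normal-approximation confidence interval."""
    half_width = z * stderr
    return value - half_width, value + half_width

# (value, stderr) pairs copied from the table above
results = {
    "arc_challenge acc": (0.5495, 0.0145),
    "hellaswag acc_norm": (0.8157, 0.0039),
    "winogrande acc": (0.7419, 0.0123),
}

for name, (value, stderr) in results.items():
    low, high = confidence_interval(value, stderr)
    print(f"{name}: {value:.4f} (95% CI {low:.4f} to {high:.4f})")
```
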
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_macadeliccc__laser-polyglot-4x7b)

	|             Metric              |Value|
	|---------------------------------|----:|
	|Avg.                             |65.79|
	|AI2 Reasoning Challenge (25-Shot)|64.16|
	|HellaSwag (10-Shot)              |84.98|
	|MMLU (5-Shot)                    |63.88|
	|TruthfulQA (0-shot)              |55.47|
	|Winogrande (5-shot)              |77.82|
	|GSM8k (5-shot)                   |48.45|
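
The Avg. row is consistent with an unweighted mean of the six benchmark scores. A quick check, assuming that is how the average is computed:

```python
# Scores copied from the leaderboard table above
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 64.16,
    "HellaSwag (10-Shot)": 84.98,
    "MMLU (5-Shot)": 63.88,
    "TruthfulQA (0-shot)": 55.47,
    "Winogrande (5-shot)": 77.82,
    "GSM8k (5-shot)": 48.45,
}

average = sum(scores.values()) / len(scores)
print(round(average, 2))  # matches the reported 65.79
```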