	---
	license: llama2
	datasets:
	- Open-Orca/OpenOrca
	---
	# Model Card for Model ID

	<!-- Provide a quick summary of what the model is/does. -->

	Llama-2-13b fine-tuned on the first 200,000 examples of the Open-Orca/OpenOrca dataset.

	# Fine-Tuning Information
	- GPU: RTX 4090 (single GPU / 24564 MiB)
	- model: meta-llama/Llama-2-13b-hf
	- dataset: Open-Orca/OpenOrca (first 200,000 examples of the training set)
	- peft_type: LoRA
	- lora_rank: 8
	- lora_target: q_proj, v_proj
	- per_device_train_batch_size: 8
	- gradient_accumulation_steps: 8
	- learning_rate: 5e-5
	- epoch: 1
	- precision: bf16
	- quantization: load_in_4bit
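
As a rough check of what these settings imply, the sketch below works out the effective batch size and the number of optimizer steps in one epoch. It assumes a single GPU, that all 200,000 examples are used for training (no held-out validation split), and that a final partial batch would still trigger an update. The variable names are illustrative and are not taken from the training script.

```python
import math

per_device_train_batch_size = 8
gradient_accumulation_steps = 8
num_gpus = 1            # assumption: the single RTX4090 listed above
num_examples = 200_000  # assumption: every extracted example is used for training

# Examples consumed per optimizer update
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Optimizer updates needed to see every example once
steps_per_epoch = math.ceil(num_examples / effective_batch_size)

print(f"effective batch size: {effective_batch_size}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
```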

	# Fine-Tuning Detail
	- train_loss: 0.8616
	- train_runtime: 29:18:07 (using DeepSpeed)
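
To put the runtime in perspective, the following sketch converts it to seconds and estimates training throughput. It assumes the runtime is written as HH:MM:SS and that all 200,000 examples were processed exactly once; `hms_to_seconds` is a hypothetical helper.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

train_runtime = "29:18:07"
num_examples = 200_000  # assumption: one pass over every extracted example

runtime_seconds = hms_to_seconds(train_runtime)
examples_per_second = num_examples / runtime_seconds

print(f"runtime: {runtime_seconds} s")
print(f"throughput: {examples_per_second:.2f} examples/s")
```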

	# Evaluation
	- Evaluation results are taken from HuggingFaceH4/open_llm_leaderboard.
	- Four benchmarks are compared against Llama-2-13b and other models fine-tuned on Open-Orca data.
	- The benchmarks are ARC, HellaSwag, MMLU, and TruthfulQA.

	| Model                                    | Average | ARC   | HellaSwag | MMLU  | TruthfulQA |
	|------------------------------------------|---------|-------|-----------|-------|------------|
	| meta-llama/Llama-2-13b-hf                | 56.9    | 58.11 | 80.97     | 54.34 | 34.17      |
	| meta-llama/Llama-2-13b-chat-hf           | 59.93   | 59.04 | 81.94     | 54.64 | 44.12      |
	| Open-Orca/OpenOrca-Platypus2-13B         | 64.6    | 62.8  | 83.15     | 59.39 | 53.08      |
	| Open-Orca/OpenOrcaxOpenChat-Preview2-13B | 63.81   | 62.37 | 82.96     | 58.68 | 51.23      |
	| circulus/Llama-2-13b-orca-v1             | 62.91   | 62.03 | 82.27     | 57.71 | 49.61      |
	| CHIH-HUNG/llama-2-13b-OpenOrca_5w        | 61.2    | 61.01 | 82.82     | 56.09 | 44.87      |
	| CHIH-HUNG/llama-2-13b-open_orca_20w      | 60.46   | 59.9  | 82.51     | 56.3  | 43.14      |
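
The Average column appears to be the unweighted mean of the four benchmark scores. The sketch below recomputes it from the table values and reports how far this model moved from the base model on each benchmark. The mean-of-four interpretation is an assumption that the numbers are consistent with, not something the leaderboard text above states.

```python
# (reported average, [ARC, HellaSwag, MMLU, TruthfulQA]) copied from the table
scores = {
    "meta-llama/Llama-2-13b-hf": (56.9, [58.11, 80.97, 54.34, 34.17]),
    "meta-llama/Llama-2-13b-chat-hf": (59.93, [59.04, 81.94, 54.64, 44.12]),
    "Open-Orca/OpenOrca-Platypus2-13B": (64.6, [62.8, 83.15, 59.39, 53.08]),
    "Open-Orca/OpenOrcaxOpenChat-Preview2-13B": (63.81, [62.37, 82.96, 58.68, 51.23]),
    "circulus/Llama-2-13b-orca-v1": (62.91, [62.03, 82.27, 57.71, 49.61]),
    "CHIH-HUNG/llama-2-13b-OpenOrca_5w": (61.2, [61.01, 82.82, 56.09, 44.87]),
    "CHIH-HUNG/llama-2-13b-open_orca_20w": (60.46, [59.9, 82.51, 56.3, 43.14]),
}

# Largest gap between a reported average and the recomputed mean
max_gap = max(abs(sum(b) / len(b) - reported) for reported, b in scores.values())
print(f"largest gap between reported and recomputed average: {max_gap:.4f}")

# Per-benchmark change of this model relative to the base model
names = ["ARC", "HellaSwag", "MMLU", "TruthfulQA"]
base = scores["meta-llama/Llama-2-13b-hf"][1]
tuned = scores["CHIH-HUNG/llama-2-13b-open_orca_20w"][1]
gains = {name: t - b for name, b, t in zip(names, base, tuned)}
for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}")
```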

	# How to convert dataset to json

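	- Pass the dataset name to `load_dataset` and the number of leading examples to keep to `take`.
	- Check the dataset's column names and use them as the keys read from `example` (for example system_prompt, question, response).
	- Finally, set the path of the output JSON file (`json_filename`).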

	```py
	import json
	from datasets import load_dataset

	# Load the dataset in streaming mode; take(n) yields only the first n examples
	dataset = load_dataset("Open-Orca/OpenOrca", split="train", streaming=True).take(200000)

	# Extract the required fields into a new list of dicts
	extracted_data = []
	for example in dataset:
	    extracted_example = {
	        # Open-Orca fields
	        "system_prompt": example["system_prompt"],
	        "question": example["question"],
	        "response": example["response"],
	    }
	    extracted_data.append(extracted_example)

	# Name of the output JSON file
	json_filename = "open_orca.json"

	# Write the JSON file
	with open(json_filename, "w", encoding="utf-8") as json_file:
	    json.dump(extracted_data, json_file, indent=4)

	print(f"Data extracted and saved to {json_filename}")
	```
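
One way to sanity-check the output is to read it back and confirm that every record has the three expected fields. The sketch below uses a small hand-written sample and a temporary file so that it runs without downloading the dataset; the sample records and the `validate_records` helper are hypothetical.

```python
import json
import os
import tempfile

EXPECTED_FIELDS = {"system_prompt", "question", "response"}

def validate_records(path):
    """Return the record count; raise ValueError if a record's fields differ from the expected set."""
    with open(path, encoding="utf-8") as json_file:
        records = json.load(json_file)
    for index, record in enumerate(records):
        if set(record) != EXPECTED_FIELDS:
            raise ValueError(f"record {index} has fields {sorted(record)}")
    return len(records)

# Hypothetical sample standing in for the extracted data
sample = [
    {"system_prompt": "You are a helpful assistant.", "question": "What is 2 + 2?", "response": "4"},
    {"system_prompt": "", "question": "Name a primary color.", "response": "Red"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "open_orca_sample.json")
    with open(sample_path, "w", encoding="utf-8") as json_file:
        json.dump(sample, json_file, indent=4)
    record_count = validate_records(sample_path)

print(f"{record_count} records validated")
```

For the real file, calling `validate_records("open_orca.json")` would be expected to return 200000 if the extraction ran to completion.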

	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_CHIH-HUNG__llama-2-13b-OpenOrca_20w)

	| Metric              | Value |
	|---------------------|-------|
	| Avg.                | 50.38 |
	| ARC (25-shot)       | 59.9  |
	| HellaSwag (10-shot) | 82.51 |
	| MMLU (5-shot)       | 56.3  |
	| TruthfulQA (0-shot) | 43.14 |
	| Winogrande (5-shot) | 77.19 |
	| GSM8K (5-shot)      | 12.66 |
	| DROP (3-shot)       | 20.98 |
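
The Avg. of 50.38 here is lower than the 60.46 reported in the Evaluation table because it covers seven metrics rather than four. The sketch below recomputes both figures from the values in this card, assuming each average is an unweighted mean.

```python
metrics = {
    "ARC": 59.9,
    "HellaSwag": 82.51,
    "MMLU": 56.3,
    "TruthfulQA": 43.14,
    "Winogrande": 77.19,
    "GSM8K": 12.66,
    "DROP": 20.98,
}

# Mean over all seven leaderboard metrics
seven_metric_average = sum(metrics.values()) / len(metrics)

# Mean over the four benchmarks used in the Evaluation table
four_benchmarks = ["ARC", "HellaSwag", "MMLU", "TruthfulQA"]
four_benchmark_average = sum(metrics[name] for name in four_benchmarks) / len(four_benchmarks)

print(f"seven-metric average: {seven_metric_average:.2f}")
print(f"four-benchmark average: {four_benchmark_average:.2f}")
```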
