Adding Evaluation Results

af2d236 verified 9 months ago

8.13 kB

	---
	language:
	- bg
	- ca
	- cs
	- da
	- de
	- en
	- es
	- fr
	- hr
	- hu
	- it
	- nl
	- pl
	- pt
	- ro
	- ru
	- sl
	- sr
	- sv
	- uk
	license: apache-2.0
	library_name: transformers
	datasets:
	- Open-Orca/OpenOrca
	- OpenAssistant/oasst_top1_2023-08-25
	model-index:
	- name: Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: AI2 Reasoning Challenge (25-Shot)
	type: ai2_arc
	config: ARC-Challenge
	split: test
	args:
	num_few_shot: 25
	metrics:
	- type: acc_norm
	value: 60.49
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: HellaSwag (10-Shot)
	type: hellaswag
	split: validation
	args:
	num_few_shot: 10
	metrics:
	- type: acc_norm
	value: 82.07
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MMLU (5-Shot)
	type: cais/mmlu
	config: all
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 62.34
	name: accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: TruthfulQA (0-shot)
	type: truthful_qa
	config: multiple_choice
	split: validation
	args:
	num_few_shot: 0
	metrics:
	- type: mc2
	value: 46.38
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: Winogrande (5-shot)
	type: winogrande
	config: winogrande_xl
	split: validation
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 78.45
	name: accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: GSM8k (5-shot)
	type: gsm8k
	config: main
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 40.18
	name: accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2
	name: Open LLM Leaderboard
	---


	![image/png](https://cdn-uploads.huggingface.co/production/uploads/641b435ba5f876fe30c5ae0a/rJ1RxzuE-3gzgCppx-T8f.png)

	```
	reference-data-model:

	datasets:
	- OpenAssistant/oasst_top1_2023-08-25:
	lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk"
	link: https://huggingface.co/datasets/OpenAssistant/oasst_top1_2023-08-25

	model:
	- Open-Orca/Mistral-7B-OpenOrca
	Link:
	https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca

	100 examples of generating:
	- Link:
	https://huggingface.co/NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2/blob/main/output.xlsx

	Activated training with:
	- Link:
	https://huggingface.co/blog/tomaarsen/attention-sinks
	https://github.com/tomaarsen/attention_sinks
	https://arxiv.org/abs/2309.17453

	Version:
	- Link:
	https://huggingface.co/NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v1
	https://huggingface.co/NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v3

	Eval model:
	- link:
	https://huggingface.co/datasets/open-llm-leaderboard/details_NickyNicky__Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2

	```


	##


	```py
	# attention-sinks
	pip install attention_sinks

	# flash-attn
	!export CUDA_HOME=/usr/local/cuda-11.8
	!MAX_JOBS=4 pip install flash-attn --no-build-isolation -qqq
	!pip install git+"https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary" -qqq
	```


	## Version
	```py
	import torch, transformers,torchvision
	torch.__version__,transformers.__version__, torchvision.__version__
	#OUTPUTS: ('2.0.1+cu118', '4.34.0.dev0', '0.15.2+cu118')
	```

	## How to use
	```py

	from transformers import (
	AutoModelForCausalLM,
	AutoTokenizer,
	BitsAndBytesConfig,
	HfArgumentParser,
	TrainingArguments,
	pipeline,
	logging,
	GenerationConfig,
	TextIteratorStreamer,
	)

	from attention_sinks import AutoModelForCausalLM

	import torch

	# model_id = 'Open-Orca/Mistral-7B-OpenOrca'
	model_id='NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2'

	model = AutoModelForCausalLM.from_pretrained(model_id,
	device_map="auto",
	trust_remote_code=True,
	torch_dtype=torch.bfloat16,
	load_in_4bit=True,
	low_cpu_mem_usage= True,

	attention_sink_size=4,
	attention_sink_window_size=1024, #512, # <- Low for the sake of faster generation
	)

	max_length=2048
	print("max_length",max_length)


	tokenizer = AutoTokenizer.from_pretrained(model_id,
	# use_fast = False,
	max_length=max_length,)

	tokenizer.pad_token = tokenizer.eos_token
	tokenizer.padding_side = 'right'

	#EXAMPLE #1
	txt="""<\|im_start\|>user
	I'm looking for an efficient Python script to output prime numbers. Can you help me out? I'm interested in a script that can handle large numbers and output them quickly. Also, it would be great if the script could take a range of numbers as input and output all the prime numbers within that range. Can you generate a script that fits these requirements? Thanks!<\|im_end\|>
	<\|im_start\|>assistant
	"""

	#EXAMPLE #2
	txt="""<\|im_start\|>user
	Estoy desarrollando una REST API con Nodejs, y estoy tratando de aplicar algún sistema de seguridad, ya sea con tokens o algo similar, me puedes ayudar?<\|im_end\|>
	<\|im_start\|>assistant
	"""

	inputs = tokenizer.encode(txt, return_tensors="pt").to("cuda")

	generation_config = GenerationConfig(
	max_new_tokens=max_new_tokens,
	temperature=0.7,
	top_p=0.9,
	top_k=len_tokens,
	repetition_penalty=1.11,
	do_sample=True,
	# pad_token_id=tokenizer.eos_token_id,
	# eos_token_id=tokenizer.eos_token_id,
	# use_cache=True,
	# stopping_criteria= StoppingCriteriaList([stopping_criteria]),
	)
	outputs = model.generate(generation_config=generation_config,
	input_ids=inputs,)
	tokenizer.decode(outputs[0], skip_special_tokens=False) #True
	```

	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_NickyNicky__Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2)

	\| Metric \|Value\|
	\|---------------------------------\|----:\|
	\|Avg. \|61.65\|
	\|AI2 Reasoning Challenge (25-Shot)\|60.49\|
	\|HellaSwag (10-Shot) \|82.07\|
	\|MMLU (5-Shot) \|62.34\|
	\|TruthfulQA (0-shot) \|46.38\|
	\|Winogrande (5-shot) \|78.45\|
	\|GSM8k (5-shot) \|40.18\|