	---
	language:
	- en
	license: apache-2.0
	library_name: transformers
	tags:
	- code
	- moe
	datasets:
	- andersonbcdefg/synthetic_retrieval_tasks
	- ise-uiuc/Magicoder-Evol-Instruct-110K
	metrics:
	- code_eval
	model-index:
	- name: moe-x33
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: AI2 Reasoning Challenge (25-Shot)
	      type: ai2_arc
	      config: ARC-Challenge
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: acc_norm
	      value: 26.19
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=senseable/moe-x33
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HellaSwag (10-Shot)
	      type: hellaswag
	      split: validation
	      args:
	        num_few_shot: 10
	    metrics:
	    - type: acc_norm
	      value: 26.44
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=senseable/moe-x33
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU (5-Shot)
	      type: cais/mmlu
	      config: all
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 24.93
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=senseable/moe-x33
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: TruthfulQA (0-shot)
	      type: truthful_qa
	      config: multiple_choice
	      split: validation
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: mc2
	      value: 51.14
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=senseable/moe-x33
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Winogrande (5-shot)
	      type: winogrande
	      config: winogrande_xl
	      split: validation
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 50.99
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=senseable/moe-x33
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GSM8k (5-shot)
	      type: gsm8k
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 0.0
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=senseable/moe-x33
	      name: Open LLM Leaderboard
	---


	# 33x Coding Model

	33x-coder is a Llama-based model available on Hugging Face, designed to assist with coding tasks. It specializes in understanding and generating code and is trained on a range of programming languages and coding scenarios. Typical uses include debugging, coding advice, and generating complete scripts, with the aim of reducing development time and improving code quality.

	## Importing necessary libraries from transformers
	```
	from transformers import AutoTokenizer, AutoModelForCausalLM
	```

	## Initialize the tokenizer and model
	```
	tokenizer = AutoTokenizer.from_pretrained("senseable/33x-coder")
	model = AutoModelForCausalLM.from_pretrained("senseable/33x-coder").cuda()
	```

	## User's request for a Python function that checks whether a number is prime
	```
	messages = [
	    {'role': 'user', 'content': "Write a Python function to check if a number is prime."}
	]
	```

	## Preparing the input for the model by encoding the messages and sending them to the same device as the model
	```
	# add_generation_prompt=True appends the assistant turn marker so the model begins its reply
	inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
	```

	## Generating responses from the model with specific parameters for text generation
	```
	outputs = model.generate(
	    inputs,
	    max_new_tokens=512,  # Maximum number of new tokens to generate
	    do_sample=False,  # Greedy decoding: always pick the most likely next token
	    top_k=50,  # Top-k filtering; only takes effect when do_sample=True
	    top_p=0.95,  # Nucleus (top-p) sampling; only takes effect when do_sample=True
	    num_return_sequences=1,  # Number of sequences returned per input
	    eos_token_id=32021,  # End-of-sequence token id
	)
	```
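
The generation call above combines greedy decoding (`do_sample=False`) with the sampling filters `top_k` and `top_p`. As a rough illustration of what those filters do, here is a minimal pure-Python sketch over a toy next-token distribution. The function names and the probabilities are hypothetical and chosen for illustration only. This is not the `transformers` implementation, which operates on logits and differs in detail.

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Return the tokens that survive top-k and then top-p filtering.

    probs: dict mapping token -> probability (a toy stand-in for a model's
    next-token distribution; the values are made up for illustration).
    """
    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # Top-k: keep only the k most likely tokens.
    ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


def greedy_pick(probs):
    """Greedy decoding: always take the single most likely token."""
    return max(probs, key=probs.get)


toy_probs = {"def": 0.6, "import": 0.25, "class": 0.1, "print": 0.05}
candidates = top_k_top_p_filter(toy_probs, top_k=3, top_p=0.8)
choice = greedy_pick(toy_probs)
print("Sampling candidates:", candidates)
print("Greedy choice:", choice)
```

In this sketch the greedy choice is always the top-ranked candidate, so the filters cannot change the result when sampling is disabled.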

	## Decoding and printing the generated response

	```
	start_index = len(inputs[0])
	generated_output_tokens = outputs[0][start_index:]
	decoded_output = tokenizer.decode(generated_output_tokens, skip_special_tokens=True)
	print("Generated Code:\n", decoded_output)
	```
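
For a decoder-only model, `generate` returns the prompt ids followed by the newly generated ids, which is why the decoding step slices at the prompt length. The toy sketch below shows that slicing on plain lists. The token ids and the `strip_prompt` helper are hypothetical and stand in for real tokenizer output.

```python
def strip_prompt(output_ids, prompt_length):
    """Drop the first prompt_length ids, leaving only the newly generated ones."""
    return output_ids[prompt_length:]


prompt_ids = [1, 15, 27, 99]          # hypothetical ids for the encoded chat prompt
output_ids = prompt_ids + [42, 7, 3]  # generated sequence = prompt + new tokens
new_ids = strip_prompt(output_ids, len(prompt_ids))
print(new_ids)
```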

	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_senseable__moe-x33)

	|             Metric              |Value|
	|---------------------------------|----:|
	|Avg.                             |29.95|
	|AI2 Reasoning Challenge (25-Shot)|26.19|
	|HellaSwag (10-Shot)              |26.44|
	|MMLU (5-Shot)                    |24.93|
	|TruthfulQA (0-shot)              |51.14|
	|Winogrande (5-shot)              |50.99|
	|GSM8k (5-shot)                   | 0.00|
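
The Avg. row is consistent with an unweighted mean of the six benchmark scores. A quick check, assuming that is how the average is computed (the `scores` dictionary simply copies the values from the table):

```python
# Benchmark scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 26.19,
    "HellaSwag (10-Shot)": 26.44,
    "MMLU (5-Shot)": 24.93,
    "TruthfulQA (0-shot)": 51.14,
    "Winogrande (5-shot)": 50.99,
    "GSM8k (5-shot)": 0.00,
}

# Unweighted mean, rounded to two decimals like the table.
average = round(sum(scores.values()) / len(scores), 2)
print(f"Avg. = {average:.2f}")
```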