	---
	license: apache-2.0
	tags:
	- moe
	- frankenmoe
	- merge
	- mergekit
	- lazymergekit
	- mlabonne/NeuralBeagle14-7B
	- fblgit/UNA-dolphin-2.6-mistral-7b-dpo-laser
	base_model:
	- mlabonne/NeuralBeagle14-7B
	- fblgit/UNA-dolphin-2.6-mistral-7b-dpo-laser
	model-index:
	- name: CultriX-MoE-Model
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: AI2 Reasoning Challenge (25-Shot)
	      type: ai2_arc
	      config: ARC-Challenge
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: acc_norm
	      value: 70.05
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CultriX/CultriX-MoE-Model
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HellaSwag (10-Shot)
	      type: hellaswag
	      split: validation
	      args:
	        num_few_shot: 10
	    metrics:
	    - type: acc_norm
	      value: 87.22
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CultriX/CultriX-MoE-Model
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU (5-Shot)
	      type: cais/mmlu
	      config: all
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 64.95
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CultriX/CultriX-MoE-Model
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: TruthfulQA (0-shot)
	      type: truthful_qa
	      config: multiple_choice
	      split: validation
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: mc2
	      value: 68.04
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CultriX/CultriX-MoE-Model
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Winogrande (5-shot)
	      type: winogrande
	      config: winogrande_xl
	      split: validation
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 80.9
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CultriX/CultriX-MoE-Model
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GSM8k (5-shot)
	      type: gsm8k
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 62.09
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CultriX/CultriX-MoE-Model
	      name: Open LLM Leaderboard
	---

	# CultriX-MoE-Model

	CultriX-MoE-Model is a Mixture of Experts (MoE) made with the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
	* [mlabonne/NeuralBeagle14-7B](https://huggingface.co/mlabonne/NeuralBeagle14-7B)
	* [fblgit/UNA-dolphin-2.6-mistral-7b-dpo-laser](https://huggingface.co/fblgit/UNA-dolphin-2.6-mistral-7b-dpo-laser)

	## 🧩 Configuration

	```yaml
	base_model: "mlabonne/Marcoro14-7B-slerp"
	gate_mode: hidden
	dtype: bfloat16
	experts:
	- source_model: "mlabonne/NeuralBeagle14-7B"
	  positive_prompts:
	  - "Create a story based on"
	  - "Debate the topic of"
	  - "Come up with some arguments"
	  - "Provide me with instructions on"
	  - "Interpret the sentiment"
	  - "Interpret and execute these cooking instructions"
	  - "Craft a persuasive argument"
	  - "Analyze the motivations"
	  - "Construct a detailed plan for"
	  - "Narrate an event from multiple perspectives."
	  - "Formulate a response"
	  - "Write a script for a short play"
	  - "Generate a sequence of instructions to teach a skill."
	  - "Solve this riddle"
	  - "Create an engaging story"
	  - "Write a fictional"
	  - "Propose a solution to a social issue"
	  - "Develop a dialogue"
	  - "Create a step-by-step guide"
	  - "Devise a strategy"
	  - "Write a narrative"
	  - "Tell me how to"
	  - "Explain the concept of"
	  - "Give an overview of"
	  - "Compare and contrast between"
	  - "Provide information about"
	  - "Help me understand"
	  - "Summarize"
	  - "Make a recommendation on"
	  - "Answer this question"
	  - "How do you approach"
	  - "Explain the concept of"
	  - "Give an overview of"
	  - "Provide information about"
	  - "Help me understand the principles of"
	  - "Summarize the key components of"
	  - "Make a recommendation on how to"
	  - "Answer this question:"
	  negative_prompts:
	  - "Provide in-depth information about quantum computing."
	  - "Explain the inner workings of an internal combustion engine."
	  - "Give a detailed tutorial on advanced calculus."
	  - "Summarize the latest research in genetic engineering."
	  - "Interpret financial markets and stock trends."
	  - "Analyze the chemical composition of"
	  - "Develop a blueprint for."
	  - "Offer a critique of a modern art piece."
	  - "Provide a technical review of"
	  - "Conduct a linguistic analysis of an ancient language."
	  - "Write a user manual for advanced medical equipment."
	  - "Give a step-by-step guide on piloting an aircraft."
	  - "Conduct an in-depth analysis of this code"
	  - "Explain the physics behind black holes."
	  - "Provide a strategy for managing a cyber attack"
	  - "Develop an algorithm for predictive analytics in finance."
	  - "Provide information about advanced programming algorithms."
	  - "Help me understand the details of this code"
	  - "Summarize the process of cellular respiration."
	  - "Improve the security of"
	  - "What are the latest advancements in artificial intelligence?"
	  - "Provide detailed technical coding solutions."
	  - "Analyze complex scientific data and statistics."
	  - "Offer medical diagnoses based on symptoms."
	  - "Conduct a detailed financial audit of a company."
	  - "Perform real-time translation of multiple languages."
	  - "Create high-resolution graphic designs."
	  - "Develop complex mathematical proofs."
	  - "Offer legal advice on specific cases."
	  - "Write a detailed manual on advanced mechanical engineering."
	  - "Conduct an in-depth psychological assessment."
	  - "Perform a security analysis of a computer network."
	  - "Compose an original piece of music."
	  - "Plan and execute a scientific experiment."
	  - "Provide professional career counseling."
	  - "Develop a complex database management system."
	  - "Write a software program for data analysis."
	  - "Give expert advice on cyber"
	  - "Conduct a pentesting security audit"
	- source_model: "fblgit/UNA-dolphin-2.6-mistral-7b-dpo-laser"
	  positive_prompts:
	  - "Provide step-by-step coding instructions for..."
	  - "Draft a function with detailed steps in [language]"
	  - "Guide me through coding a simple [type of application or script]"
	  - "Recommend best practices for code implementation in [context]"
	  - "Generate a regex pattern for extracting [specific data]"
	  - "Create a regex for matching [pattern]"
	  - "Explain the purpose of this regex pattern"
	  - "Compose regex for [specific use case]"
	  - "Annotate this code with detailed comments for each line"
	  - "Add explanatory comments to this script"
	  - "Comment on each part of this code for clarity"
	  - "Develop a script to [accomplish task]"
	  - "Design a database schema for [specific use case]"
	  - "Outline secure methods for [specific operation]"
	  - "Guide on optimizing [specific aspect] in this code"
	  - "Refactor this code for better readability and efficiency"
	  - "Compare and contrast these code snippets"
	  - "Identify the programming language of this snippet"
	  - "Demonstrate the usage of [specific tool/library/API]"
	  - "Show implementation steps for this [feature/concept]"
	  - "Teach how to use [specific tool/library/framework]"
	  - "Generate a README file for this project"
	  - "Create a manual page for [specific tool/command]"
	  - "Produce comprehensive documentation for this code"
	  - "Build detailed documentation for [specific module]"
	  - "Explain the underlying concept of this code snippet"
	  - "Propose enhancements for this script"
	  - "Suggest improvements for this API call integration"
	  - "Diagnose and solve this coding issue"
	  - "Demonstrate robust error handling in this code"
	  - "Debug and resolve issues in this script"
	  - "Design a user-friendly GUI for this script's functionality"
	  - "Detail the deployment process for this application"
	  - "Deploy an app designed to [perform function]"
	  - "Set up a web service for [specific purpose]"
	  - "Develop a website with [specific features]"
	  - "Craft a webpage showcasing [specific content]"
	  - "Illustrate data flow in this code architecture"
	  - "Convert this code from [language A] to [language B]"
	  - "Translate this script into [different programming language]"
	  - "Explain resource management techniques in [context]"
	  - "Build a basic API endpoint for [functionality]"
	  - "Strategies to enhance scalability in [context]"
	  - "Conduct a security review for this code"
	  - "Enhance security measures in [application/module]"
	  - "Set up a development environment for [language/framework]"
	  - "Visualize data from [specific dataset]"
	  - "Generate a dataset for [specific use case]"
	  - "Scripting guide for automating [task/process]"
	  - "Utilize this code for [specific purpose]"
	  - "Principles of object-oriented programming in [language]"
	  - "Create a mobile-responsive layout for this web app"
	  - "Explain the debugging process for this code"
	  - "Compose code to accomplish [task]"
	  - "Guidance on writing code for [specific purpose]"
	  - "I need a script for [specific function]"
	  - "Clarify the functionality of this code"
	  - "What is the purpose of this code segment?"
	  - "Enhance this code for [specific improvement]"
	  - "Develop a program that [solves problem]"
	  - "Code needed for [specific task]"
	  - "Program a solution for [problem statement]"
	  - "Enhance this function's performance by..."
	  - "Refactor code for better readability in [context]"
	  - "Craft a custom function for [specific requirement]"
	  - "Reduce computational complexity in this algorithm by..."
	  - "Extend the codebase to include [new feature]"
	  - "Incorporate this API into an existing application"
	  - "Assist in troubleshooting and bug fixing for [issue]"
	  - "Review and prep this code for deployment"
	  - "Analyze error logs for potential issues in [context]"
	  - "Create unit tests for [module/component]"
	  - "Evaluate methodologies for [problem-solving]"
	  - "Research [topic] online"
	  - "Utilize the [plugin/tool] to achieve [result]"
	  - "Design an efficient search algorithm for [data type]"
	  - "Create a web crawler for [specific data extraction]"
	  - "Application of web sockets in [real-time scenario]"
	  - "Guide to integrating a third-party library in [framework]"
	  - "Best practices in API design for [application type]"
	  negative_prompts:
	  - "Provide a detailed analysis of historical events."
	  - "Give medical advice for treating a specific illness."
	  - "Write a comprehensive review of a novel."
	  - "Explain legal implications of a contract."
	  - "Develop a marketing strategy for a new product."
	  - "Offer financial advice for stock investments."
	  - "Create a recipe for a gourmet dish."
	  - "Teach a foreign language lesson."
	  - "Compose a symphony or musical piece."
	  - "Provide workout plans and fitness coaching."
	  - "Conduct a psychological analysis of a character."
	  - "Write a script for a movie or play."
	  - "Design a blueprint for architectural structures."
	  - "Give a tutorial on how to paint a landscape."
	  - "Explain quantum physics theories."
	  - "Offer career counseling and resume writing tips."
	  - "Teach how to repair a car engine."
	  - "Plan a travel itinerary for a world tour."
	  - "Guide on how to grow organic vegetables."
	  - "Discuss political strategies for an election campaign."
	```
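
With `gate_mode: hidden`, the positive and negative prompts serve to initialize each expert's router gate from hidden-state representations rather than to train it. The sketch below illustrates that idea under stated assumptions and is not mergekit's actual code: the helper names (`mean_vector`, `gate_vector`, `route`), the toy 2-dimensional hidden states, and the "mean of positive minus mean of negative" rule are all hypothetical. It shows how a token's hidden state could be scored against one gate vector per expert and turned into mixing weights with a softmax.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def gate_vector(positive_states, negative_states):
    """Hypothetical gate init: mean positive state minus mean negative state."""
    pos = mean_vector(positive_states)
    neg = mean_vector(negative_states)
    return [p - q for p, q in zip(pos, neg)]


def route(hidden, gates, top_k=2):
    """Score a hidden state against each gate and softmax over the top_k experts."""
    logits = [sum(h * g for h, g in zip(hidden, gate)) for gate in gates]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Toy hidden states standing in for prompt representations (made-up values).
general_gate = gate_vector([[1.0, 0.0], [0.8, 0.2]], [[0.0, 1.0]])
code_gate = gate_vector([[0.0, 1.0], [0.2, 0.8]], [[1.0, 0.0]])

# A "code-like" token state: expert 1 (the coding expert) gets the larger weight.
routing = route([0.1, 0.9], [general_gate, code_gate])
for expert_index, weight in routing:
    print(f"expert {expert_index}: weight {weight:.3f}")
```

With only two experts and `top_k=2`, both experts are always active and the gate only decides how their outputs are weighted.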

	## 💻 Usage

	```python
	# Notebook-only shell escape; in a terminal, run the pip command without the leading "!".
	!pip install -qU transformers bitsandbytes accelerate

	from transformers import AutoTokenizer
	import transformers
	import torch

	model = "CultriX/CultriX-MoE-Model"

	# Load the tokenizer and build a 4-bit quantized text-generation pipeline.
	tokenizer = AutoTokenizer.from_pretrained(model)
	pipeline = transformers.pipeline(
	    "text-generation",
	    model=model,
	    tokenizer=tokenizer,
	    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
	)

	# Format the conversation with the model's chat template, then sample a reply.
	messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
	prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
	print(outputs[0]["generated_text"])
	```
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_CultriX__CultriX-MoE-Model)

	| Metric                          |Value|
	|---------------------------------|----:|
	|Avg.                             |72.21|
	|AI2 Reasoning Challenge (25-Shot)|70.05|
	|HellaSwag (10-Shot)              |87.22|
	|MMLU (5-Shot)                    |64.95|
	|TruthfulQA (0-shot)              |68.04|
	|Winogrande (5-shot)              |80.90|
	|GSM8k (5-shot)                   |62.09|
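
The `Avg.` row matches the unweighted mean of the six benchmark scores. This is an inference from the numbers in the table, not something the card states. A minimal check, with the `scores` dictionary copied by hand from the table above:

```python
# Benchmark scores copied from the results table.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 70.05,
    "HellaSwag (10-Shot)": 87.22,
    "MMLU (5-Shot)": 64.95,
    "TruthfulQA (0-shot)": 68.04,
    "Winogrande (5-shot)": 80.90,
    "GSM8k (5-shot)": 62.09,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # prints: Avg. 72.21
```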