metadata
license: apache-2.0
tags:
  - moe
  - frankenmoe
  - merge
  - mergekit
  - lazymergekit
  - mlabonne/NeuralBeagle14-7B
  - fblgit/UNA-dolphin-2.6-mistral-7b-dpo-laser
  - mlabonne/Marcoro14-7B-slerp
base_model:
  - mlabonne/NeuralBeagle14-7B
  - fblgit/UNA-dolphin-2.6-mistral-7b-dpo-laser
  - mlabonne/Marcoro14-7B-slerp
model-index:
  - name: CultriX-MoE-BF16
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 68.94
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CultriX/CultriX-MoE-BF16
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 86.96
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CultriX/CultriX-MoE-BF16
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 65.2
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CultriX/CultriX-MoE-BF16
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 63.47
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CultriX/CultriX-MoE-BF16
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 81.06
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CultriX/CultriX-MoE-BF16
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 69.98
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CultriX/CultriX-MoE-BF16
          name: Open LLM Leaderboard

CultriX-MoE-BF16

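CultriX-MoE-BF16 is a Mixture of Experts (MoE) made with the following models using LazyMergekit:

- mlabonne/NeuralBeagle14-7B
- fblgit/UNA-dolphin-2.6-mistral-7b-dpo-laser
- mlabonne/Marcoro14-7B-slerp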

🧩 Configuration

base_model: "EmbeddedLLM/Mistral-7B-Merge-14-v0.2"
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: "mlabonne/NeuralBeagle14-7B"
    positive_prompts:
      - "Create a story based on"
      - "Debate the topic of"
      - "Come up with some arguments"
      - "Provide me with instructions on"
      - "Interpret the sentiment"
      - "Interpret and execute these cooking instructions"
      - "Craft a persuasive argument"
      - "Analyze the motivations"
      - "Construct a detailed plan for"
      - "Narrate an event from multiple perspectives."
      - "Formulate a response"
      - "Write a script for a short play"
      - "Generate a sequence of instructions to teach a skill."
      - "Solve this riddle"
      - "Create an engaging story"
      - "Write a fictional"
      - "Propose a solution to a social issue"
      - "Develop a dialogue"
      - "Create a step-by-step guide"
      - "Devise a strategy"
      - "Write a narrative"
      - "Tell me how to"
      - "Explain the concept of"
      - "Give an overview of"
      - "Compare and contrast between"
      - "Provide information about"
      - "Help me understand"
      - "Summarize"
      - "Make a recommendation on"
      - "Answer this question"
      - "How do you approach"
      - "Explain the concept of"
      - "Give an overview of"
      - "Provide information about"
      - "Help me understand the principles of"
      - "Summarize the key components of"
      - "Make a recommendation on how to"
      - "Answer this question:"
    negative_prompts:
      - "Provide in-depth information about quantum computing."
      - "Explain the inner workings of an internal combustion engine."
      - "Give a detailed tutorial on advanced calculus."
      - "Summarize the latest research in genetic engineering."
      - "Interpret financial markets and stock trends."
      - "Analyze the chemical composition of"
      - "Develop a blueprint for."
      - "Offer a critique of a modern art piece."
      - "Provide a technical review of"
      - "Conduct a linguistic analysis of an ancient language."
      - "Write a user manual for advanced medical equipment."
      - "Give a step-by-step guide on piloting an aircraft."
      - "Conduct an in-depth analysis of this code"
      - "Explain the physics behind black holes."
      - "Provide a strategy for managing a cyber attack"
      - "Develop an algorithm for predictive analytics in finance."
      - "Provide information about advanced programming algorithms."
      - "Help me understand the details of this code"
      - "Summarize the process of cellular respiration."
      - "Improve the security of"
      - "What are the latest advancements in artificial intelligence?"
      - "Provide detailed technical coding solutions."
      - "Analyze complex scientific data and statistics."
      - "Offer medical diagnoses based on symptoms."
      - "Conduct a detailed financial audit of a company."
      - "Perform real-time translation of multiple languages."
      - "Create high-resolution graphic designs."
      - "Develop complex mathematical proofs."
      - "Offer legal advice on specific cases."
      - "Write a detailed manual on advanced mechanical engineering."
      - "Conduct an in-depth psychological assessment."
      - "Perform a security analysis of a computer network."
      - "Compose an original piece of music."
      - "Plan and execute a scientific experiment."
      - "Provide professional career counseling."
      - "Develop a complex database management system."
      - "Write a software program for data analysis."
      - "Give expert advice on cyber"
      - "Conduct a pentesting security audit"
  - source_model: "fblgit/UNA-dolphin-2.6-mistral-7b-dpo-laser"
    positive_prompts:
      - "Provide step-by-step coding instructions for..."
      - "Draft a function with detailed steps in [language]"
      - "Guide me through coding a simple [type of application or script]"
      - "Recommend best practices for code implementation in [context]"
      - "Generate a regex pattern for extracting [specific data]"
      - "Create a regex for matching [pattern]"
      - "Explain the purpose of this regex pattern"
      - "Compose regex for [specific use case]"
      - "Annotate this code with detailed comments for each line"
      - "Add explanatory comments to this script"
      - "Comment on each part of this code for clarity"
      - "Develop a script to [accomplish task]"
      - "Design a database schema for [specific use case]"
      - "Outline secure methods for [specific operation]"
      - "Guide on optimizing [specific aspect] in this code"
      - "Refactor this code for better readability and efficiency"
      - "Compare and contrast these code snippets"
      - "Identify the programming language of this snippet"
      - "Demonstrate the usage of [specific tool/library/API]"
      - "Show implementation steps for this [feature/concept]"
      - "Teach how to use [specific tool/library/framework]"
      - "Generate a README file for this project"
      - "Create a manual page for [specific tool/command]"
      - "Produce comprehensive documentation for this code"
      - "Build detailed documentation for [specific module]"
      - "Explain the underlying concept of this code snippet"
      - "Propose enhancements for this script"
      - "Suggest improvements for this API call integration"
      - "Diagnose and solve this coding issue"
      - "Demonstrate robust error handling in this code"
      - "Debug and resolve issues in this script"
      - "Design a user-friendly GUI for this script's functionality"
      - "Detail the deployment process for this application"
      - "Deploy an app designed to [perform function]"
      - "Set up a web service for [specific purpose]"
      - "Develop a website with [specific features]"
      - "Craft a webpage showcasing [specific content]"
      - "Illustrate data flow in this code architecture"
      - "Convert this code from [language A] to [language B]"
      - "Translate this script into [different programming language]"
      - "Explain resource management techniques in [context]"
      - "Build a basic API endpoint for [functionality]"
      - "Strategies to enhance scalability in [context]"
      - "Conduct a security review for this code"
      - "Enhance security measures in [application/module]"
      - "Set up a development environment for [language/framework]"
      - "Visualize data from [specific dataset]"
      - "Generate a dataset for [specific use case]"
      - "Scripting guide for automating [task/process]"
      - "Utilize this code for [specific purpose]"
      - "Principles of object-oriented programming in [language]"
      - "Create a mobile-responsive layout for this web app"
      - "Explain the debugging process for this code"
      - "Compose code to accomplish [task]"
      - "Guidance on writing code for [specific purpose]"
      - "I need a script for [specific function]"
      - "Clarify the functionality of this code"
      - "What is the purpose of this code segment?"
      - "Enhance this code for [specific improvement]"
      - "Develop a program that [solves problem]"
      - "Code needed for [specific task]"
      - "Program a solution for [problem statement]"
      - "Enhance this function's performance by..."
      - "Refactor code for better readability in [context]"
      - "Craft a custom function for [specific requirement]"
      - "Reduce computational complexity in this algorithm by..."
      - "Extend the codebase to include [new feature]"
      - "Incorporate this API into an existing application"
      - "Assist in troubleshooting and bug fixing for [issue]"
      - "Review and prep this code for deployment"
      - "Analyze error logs for potential issues in [context]"
      - "Create unit tests for [module/component]"
      - "Evaluate methodologies for [problem-solving]"
      - "Research [topic] online"
      - "Utilize the [plugin/tool] to achieve [result]"
      - "Design an efficient search algorithm for [data type]"
      - "Create a web crawler for [specific data extraction]"
      - "Application of web sockets in [real-time scenario]"
      - "Guide to integrating a third-party library in [framework]"
      - "Best practices in API design for [application type]"
    negative_prompts:
      - "Provide a detailed analysis of historical events."
      - "Give medical advice for treating a specific illness."
      - "Write a comprehensive review of a novel."
      - "Explain legal implications of a contract."
      - "Develop a marketing strategy for a new product."
      - "Offer financial advice for stock investments."
      - "Create a recipe for a gourmet dish."
      - "Teach a foreign language lesson."
      - "Compose a symphony or musical piece."
      - "Provide workout plans and fitness coaching."
      - "Conduct a psychological analysis of a character."
      - "Write a script for a movie or play."
      - "Design a blueprint for architectural structures."
      - "Give a tutorial on how to paint a landscape."
      - "Explain quantum physics theories."
      - "Offer career counseling and resume writing tips."
      - "Teach how to repair a car engine."
      - "Plan a travel itinerary for a world tour."
      - "Guide on how to grow organic vegetables."
      - "Discuss political strategies for an election campaign."
  - source_model: "mlabonne/Marcoro14-7B-slerp"
    positive_prompts:
      - "Generate a creative story based on these keywords."
      - "Explain a complex topic in simple terms"
      - "Provide a detailed summary of"
      - "Answer this question with factual accuracy"
      - "Explain the historical significance of"
      - "Provide a truthful and detailed account of"
      - "Develop a strategy for solving a practical problem."
      - "Explain the reasoning behind"
      - "Provide an analysis of a moral dilemma with possible solutions."
    negative_prompts:
      - "imathematical problem-solving."
      - "scientific theory explanations."
      - "high-level abstract reasoning tasks."
      - "professional advice in specialized fields like law or medicine."
      - "provide me with a coding solution for"
      - "Academic research"
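
The configuration above gives each expert a set of positive and negative prompts, which a prompt-based gate uses to decide where to route input. The sketch below is a toy illustration of that idea only, not mergekit's implementation: a bag-of-words vector stands in for the model's real hidden states, subtracting the negative-prompt mean from the positive-prompt mean is an assumption about how the two lists might be combined, and the expert labels (`chat`, `code`, `reasoning`), the shortened prompt lists, and the top-2 routing are all hypothetical choices made for the example.

```python
import math
from collections import Counter

# Hypothetical, heavily shortened prompt lists in the spirit of the config above.
EXPERT_PROMPTS = {
    "chat": {
        "positive": ["Create a story based on", "Explain the concept of"],
        "negative": ["Provide detailed technical coding solutions."],
    },
    "code": {
        "positive": [
            "Refactor this code for better readability and efficiency",
            "Debug and resolve issues in this script",
        ],
        "negative": ["Write a script for a movie or play."],
    },
    "reasoning": {
        "positive": [
            "Explain the reasoning behind",
            "Answer this question with factual accuracy",
        ],
        "negative": ["provide me with a coding solution for"],
    },
}


def embed(text):
    """Toy stand-in for a hidden state: L2-normalised bag-of-words counts."""
    counts = Counter(text.lower().replace(".", "").split())
    if not counts:
        return {}
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()}


def mean_vector(texts):
    """Average the toy embeddings of several prompts."""
    total = Counter()
    for text in texts:
        for word, value in embed(text).items():
            total[word] += value
    return {word: value / len(texts) for word, value in total.items()}


def gate_vector(positive, negative):
    """Assumed combination rule: positive mean minus negative mean."""
    pos, neg = mean_vector(positive), mean_vector(negative)
    return {w: pos.get(w, 0.0) - neg.get(w, 0.0) for w in set(pos) | set(neg)}


def route(prompt, gates, top_k=2):
    """Score every expert by dot product, keep the top_k, softmax their scores."""
    query = embed(prompt)
    scores = {
        name: sum(query.get(w, 0.0) * v for w, v in gate.items())
        for name, gate in gates.items()
    }
    chosen = sorted(scores, key=scores.get, reverse=True)[:top_k]
    exps = {name: math.exp(scores[name]) for name in chosen}
    z = sum(exps.values())
    return {name: exps[name] / z for name in chosen}


GATES = {n: gate_vector(p["positive"], p["negative"]) for n, p in EXPERT_PROMPTS.items()}
weights = route("Refactor this script and debug the code", GATES)
print(weights)  # the "code" expert receives the largest routing weight
```

In the real model the gate works on per-token hidden states inside each layer rather than on whole prompts, so this only conveys the intuition that prompts resembling an expert's positive examples pull routing toward that expert.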

💻 Usage

!pip install -qU transformers bitsandbytes accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "CultriX/CultriX-MoE-BF16"

tokenizer = AutoTokenizer.from_pretrained(model)
# Load the model in 4-bit (bitsandbytes) to reduce memory use.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

# Format the conversation with the model's chat template, then sample a reply.
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 72.60 |
| AI2 Reasoning Challenge (25-Shot) | 68.94 |
| HellaSwag (10-Shot) | 86.96 |
| MMLU (5-Shot) | 65.20 |
| TruthfulQA (0-shot) | 63.47 |
| Winogrande (5-shot) | 81.06 |
| GSM8k (5-shot) | 69.98 |
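
As a quick sanity check, the reported average appears to be the unweighted mean of the six benchmark scores. The snippet below recomputes it from the values in the results above; the dictionary name `scores` is just an illustrative choice.

```python
# Scores copied from the leaderboard results above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 68.94,
    "HellaSwag (10-Shot)": 86.96,
    "MMLU (5-Shot)": 65.20,
    "TruthfulQA (0-shot)": 63.47,
    "Winogrande (5-shot)": 81.06,
    "GSM8k (5-shot)": 69.98,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 72.60
```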