---
language:
  - en
license: apache-2.0
library_name: transformers
model-index:
  - name: neuronovo-7B-v0.2
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 73.04
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Neuronovo/neuronovo-7B-v0.2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 88.32
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Neuronovo/neuronovo-7B-v0.2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 65.15
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Neuronovo/neuronovo-7B-v0.2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 71.02
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Neuronovo/neuronovo-7B-v0.2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 80.66
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Neuronovo/neuronovo-7B-v0.2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 62.47
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Neuronovo/neuronovo-7B-v0.2
          name: Open LLM Leaderboard
---

# Neuronovo/neuronovo-9B-v0.2

Currently the 2nd-best model in the ~7B category (its parameter count is actually closer to ~9B) on the Hugging Face Open LLM Leaderboard!

More information about the making of this model is available here: 🔗Don't stop DPOptimizing!

Author: Jan Kocoń     🔗LinkedIn     🔗Google Scholar     🔗ResearchGate

The "Neuronovo/neuronovo-9B-v0.2" model represents an advanced and fine-tuned version of a large language model, initially based on "CultriX/MistralTrix-v1." Several key characteristics and features of this model:

  1. Training Dataset: The model is trained on the "Intel/orca_dpo_pairs" dataset, which is specialized for dialogue and interaction scenarios. Each record separates the system message, the user question, and a chosen and a rejected answer, indicating a focus on natural language understanding and response generation in conversational contexts (see the data-preparation sketch after this list).

  2. Tokenizer and Formatting: It uses the tokenizer from the "CultriX/MistralTrix-v1" model, configured to pad sequences from the left and to reuse the end-of-sequence token as the padding token. This points to a focus on language generation tasks, particularly in dialogue systems.

  3. Low-Rank Adaptation (LoRA) Configuration: The model incorporates a LoRA configuration with r=16, lora_alpha=16, and lora_dropout=0.05 (see the configuration sketch after this list). This indicates a fine-tuning process that adapts the model to specific tasks efficiently by modifying only a small subset of its weights.

  4. Model Specifications for Fine-Tuning: The model is fine-tuned with a custom setup built around a DPO (Direct Preference Optimization) trainer, which optimizes the model directly on the chosen/rejected answer pairs rather than through a separate reward model. This highlights an emphasis on efficient training, particularly on managing memory and compute at this model scale (the trainer sketch after this list combines points 4–6).

  5. Training Arguments and Strategies: The training process uses gradient checkpointing, gradient accumulation, and a cosine learning rate scheduler, methods typically employed in training large models to manage resource utilization effectively.

  6. Performance and Output Capabilities: Configured for causal language modeling, the model can generate text and continue dialogues, with a maximum prompt length of 1024 tokens and a maximum total sequence length of 1536 tokens. This makes it well suited to extended dialogues and complex language generation scenarios.

  7. Special Features and Efficiency: The use of techniques like LoRA, DPO training, and specific fine-tuning methods indicates that the "Neuronovo/neuronovo-9B-v0.2" model is not only powerful in terms of language generation but also optimized for efficiency, particularly in terms of computational resource management.
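
A minimal sketch of the data preparation described in points 1–2. The column names ("system", "question", "chosen", "rejected") are the standard schema of Intel/orca_dpo_pairs, but the prompt template itself is an illustration, not the author's exact script:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Tokenizer taken from the base model, padding from the left with the
# end-of-sequence token reused as the padding token, as in point 2.
tokenizer = AutoTokenizer.from_pretrained("CultriX/MistralTrix-v1")
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token

def to_dpo_format(example):
    # Fold the system message and user question into a single prompt;
    # this template is an assumption for illustration.
    prompt = f"{example['system']}\n\nQuestion: {example['question']}\n\nAnswer: "
    return {
        "prompt": prompt,
        "chosen": example["chosen"],
        "rejected": example["rejected"],
    }

dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.map(to_dpo_format, remove_columns=dataset.column_names)
```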
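
The LoRA configuration from point 3, as it would look with the peft library. The r, alpha, and dropout values come from the card; target_modules is an assumption (a common choice for Mistral-family models):

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=16,               # rank of the low-rank update matrices (from the card)
    lora_alpha=16,      # scaling factor for the LoRA updates (from the card)
    lora_dropout=0.05,  # dropout on the LoRA layers (from the card)
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)
```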
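
And a sketch of the trainer setup from points 4–6, using the trl library's DPOTrainer (older, pre-DPOConfig API). The prompt and sequence lengths come from the card; batch size, learning rate, and beta are illustrative assumptions:

```python
# Continues from the data-preparation and LoRA sketches above
# (`dataset`, `tokenizer`, and `peft_config` are defined there).
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("CultriX/MistralTrix-v1")

training_args = TrainingArguments(
    output_dir="neuronovo-dpo",
    per_device_train_batch_size=2,   # assumed
    gradient_accumulation_steps=8,   # gradient accumulation, as in point 5
    gradient_checkpointing=True,     # gradient checkpointing, as in point 5
    lr_scheduler_type="cosine",      # cosine LR scheduler, as in point 5
    learning_rate=5e-5,              # assumed
    logging_steps=10,
)

trainer = DPOTrainer(
    model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,                # DPO temperature; assumed
    max_prompt_length=1024,  # maximum prompt length, from the card
    max_length=1536,         # maximum total sequence length, from the card
)
trainer.train()
```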

In summary, "Neuronovo/neuronovo-9B-v0.2" is a highly specialized, efficient, and capable large language model, fine-tuned for complex language generation tasks in conversational AI, leveraging state-of-the-art techniques in model adaptation and efficient training methodologies.
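
For completeness, a hedged example of loading the published model for generation with transformers; the generation settings are illustrative, not an official recommendation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Neuronovo/neuronovo-9B-v0.2")
model = AutoModelForCausalLM.from_pretrained(
    "Neuronovo/neuronovo-9B-v0.2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Explain the difference between RLHF and DPO in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```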


# Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 73.44 |
| AI2 Reasoning Challenge (25-Shot) | 73.04 |
| HellaSwag (10-Shot)               | 88.32 |
| MMLU (5-Shot)                     | 65.15 |
| TruthfulQA (0-shot)               | 71.02 |
| Winogrande (5-shot)               | 80.66 |
| GSM8k (5-shot)                    | 62.47 |