
KoDolph-2x8b

Update @ 2024.04.26: Linear Merge of Llama-3-Open-Ko-8B-Instruct-preview and dolphin-2.9-llama3-8b

Model Details

KoDolph-2x8b: I had the idea one night that a Linear Merge of these two models would make sense.

Model Merge: Linear Merge

Composition

  1. Base Layers from Llama-3-Open-Ko-8B-Instruct-preview:

    • Range: Layers 0 to 20
    • Purpose: These layers provide the model's foundational language processing in Korean: they handle basic linguistic functions and intermediate-level understanding of Korean text.
  2. Advanced Layers from Dolphin-2.9-llama3-8b:

    • Range: Layers 15 to 24
    • Purpose: These layers contribute advanced, domain-specific capabilities, particularly for coding and technical tasks. Starting the integration at layer 15 strengthens the model's handling of complex technical language and coding scenarios (see the layer-mapping sketch after this list).
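
As a rough illustration of how the two ranges fit together, the sketch below maps each layer index named above to its source model(s). It treats the upper bounds as inclusive ("Layers 0 to 20", "Layers 15 to 24") and assumes the overlapping layers 15-20 are the ones blended by the weighted average described in the next section; this is an illustrative reading of the stated ranges, not a description of mergekit's exact slicing behavior.

# Illustrative only: which source model covers each layer index listed above.
korean_layers = set(range(0, 21))    # Llama-3-Open-Ko-8B-Instruct-preview, layers 0-20
dolphin_layers = set(range(15, 25))  # dolphin-2.9-llama3-8b, layers 15-24

for layer in sorted(korean_layers | dolphin_layers):
    sources = []
    if layer in korean_layers:
        sources.append("Llama-3-Open-Ko-8B-Instruct-preview")
    if layer in dolphin_layers:
        sources.append("dolphin-2.9-llama3-8b")
    # Layers 15-20 fall in both ranges and are the ones assumed to be
    # weight-averaged; the rest come from a single model.
    print(f"layer {layer:2d}: {' + '.join(sources)}")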

Purpose and Utility:

This "Linear Merge" strategically combines the strengths of both models through weighted averaging, ensuring a balanced influence in the merged output. This approach is designed to provide robust performance in applications requiring a deep understanding and generation of Korean text, along with the capability to handle specialized tasks involving technical descriptions and coding. It is ideal for creating advanced AI assistants, coding bots, or any application where high linguistic and technical precision is needed.

Configuration

models:
  - model: beomi/Llama-3-Open-Ko-8B-Instruct-preview
    parameters:
      weight: 0.5  # Equal weight to maintain balance between foundational language processing and advanced technical tasks
    layer_range: [0, 20]  # Use foundational and intermediate language processing layers in Korean
  - model: cognitivecomputations/dolphin-2.9-llama3-8b
    parameters:
      weight: 0.5  # Equal weight to complement and balance the capabilities of the Llama model
    layer_range: [15, 24]  # Utilize advanced coding and domain-specific layers

merge_method: linear  # Balanced combination of layers using a weighted average
dtype: float16  # Efficient resource usage for computational performance
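
The configuration above follows mergekit's YAML format and can be run through mergekit's command-line tooling (for example, its mergekit-yaml entry point) to produce the merged checkpoint. Once the merged model is uploaded, it loads like any other Llama-3 checkpoint; the repository id below is a placeholder for wherever the merge is published.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: substitute the actual Hub path of the merged model.
repo_id = "your-username/KoDolph-2x8b"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # matches the dtype used in the merge config
    device_map="auto",
)

prompt = "한국어로 간단히 자기소개를 해 주세요."  # "Please introduce yourself briefly in Korean."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))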

Test Result

Root Cause:

  • Bad Responses: Some of the answers were strange, so there may have been a problem during the merge process. We are re-merging and investigating, since the model does not handle instructions properly in Korean.

(Screenshot of test output, 2024-04-27, 12:25 PM)
