metadata
base_model:
  - swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
  - DeepMount00/Llama-3-8b-Ita
library_name: transformers
tags:
  - mergekit
  - merge
license: llama3
language:
  - it

Llama-3-8b-ita-slerp

This is a merge of pre-trained language models created using mergekit.

I tried to merge two of the best Italian LLMs using mergekit. The results are acceptable, but I could not improve on the best existing model.

Evaluation

For a detailed comparison of model performance, check out the Leaderboard for Italian Language Models.

Here's a breakdown of the performance metrics:

| Metric | hellaswag_it acc_norm | arc_it acc_norm | m_mmlu_it 5-shot acc | Average |
|---|---|---|---|---|
| Accuracy Normalized | 0.6879 | 0.5714 | 0.5732 | 0.6109 |
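The Average column appears to be the unweighted mean of the three task scores. The quick check below assumes that definition; it is not taken from the leaderboard's own code.

```python
# Per-task scores copied from the evaluation table.
scores = {
    "hellaswag_it acc_norm": 0.6879,
    "arc_it acc_norm": 0.5714,
    "m_mmlu_it 5-shot acc": 0.5732,
}

# Assumption: "Average" is the plain (unweighted) mean of the task scores.
average = sum(scores.values()) / len(scores)
print(f"{average:.4f}")  # prints 0.6108
```

Recomputing from the four-decimal values gives 0.6108 rather than the reported 0.6109; the difference presumably comes from averaging the unrounded scores.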

Merge Details

Merge Method

This model was merged using the SLERP merge method.
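SLERP (spherical linear interpolation) blends two weight tensors along the arc between them instead of along a straight line; for vectors of equal norm this keeps the norm constant, whereas a plain average shrinks it. Below is a minimal pure-Python sketch on flattened vectors. It follows the usual formulation (including a linear-interpolation fallback for nearly parallel vectors), but it is not mergekit's actual implementation, and the `dot_threshold` and `eps` values are assumptions. As in mergekit's convention, `t=0` returns the base model's weights and `t=1` returns the other model's.

```python
import math


def slerp(t: float, v0: list[float], v1: list[float],
          dot_threshold: float = 0.9995, eps: float = 1e-8) -> list[float]:
    """Spherically interpolate between two flattened weight vectors.

    t=0 returns v0 (base model), t=1 returns v1 (other model).
    dot_threshold and eps are illustrative values, not mergekit's settings.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors (eps guards against zero norms).
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    if abs(cos_theta) > dot_threshold:
        # Nearly (anti)parallel: sin(theta) ~ 0, so fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    # Weights of the two endpoints along the arc.
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Halfway between two orthogonal unit vectors: the result stays on the unit circle.
mid = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
print(mid)  # both components are about 0.7071, so the norm stays 1
```

A straight-line average of the same two vectors would give `[0.5, 0.5]`, whose norm is only about 0.707; avoiding that shrinkage is the usual motivation for SLERP over linear merging.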

Models Merged

The following models were included in the merge:

- swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
- DeepMount00/Llama-3-8b-Ita

Configuration

The following YAML configuration was used to produce this model:


slices:
- sources:
  - model: swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
    layer_range:
    - 0
    - 32
  - model: DeepMount00/Llama-3-8b-Ita
    layer_range:
    - 0
    - 32
merge_method: slerp
base_model: swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
parameters:
  t:
  - filter: self_attn
    value:
    - 0
    - 0.5
    - 0.3
    - 0.7
    - 1
  - filter: mlp
    value:
    - 1
    - 0.5
    - 0.7
    - 0.3
    - 0
  - value: 0.5
dtype: bfloat16
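In this configuration, `t` is not a single number: each list (for example `[0, 0.5, 0.3, 0.7, 1]`) is a set of anchor values that mergekit spreads across the 32 layers, interpolating linearly between them, and the `filter` entries choose which list applies to a given tensor. With the SLERP convention that `t=0` keeps the base model (LLaMAntino-3-ANITA-8B-Inst-DPO-ITA) and `t=1` takes DeepMount00/Llama-3-8b-Ita, the attention weights therefore come mostly from ANITA in early layers and from Llama-3-8b-Ita in late layers, while the MLP weights go the other way; every other tensor (for example the layer norms) uses the flat `0.5`. The hypothetical `resolve_t` helper below sketches that lookup. It assumes the first matching filter wins and that a layer's position is `layer_idx / (NUM_LAYERS - 1)`; it approximates mergekit's behavior and is not its code.

```python
import math

# The `t` settings from the YAML above, transcribed as Python data.
T_SETTINGS = [
    {"filter": "self_attn", "value": [0, 0.5, 0.3, 0.7, 1]},
    {"filter": "mlp", "value": [1, 0.5, 0.7, 0.3, 0]},
    {"value": 0.5},  # no filter: catches every other tensor
]
NUM_LAYERS = 32  # layer_range 0..32 in the config


def interpolate_gradient(values: list[float], frac: float) -> float:
    """Linearly interpolate a list of anchor values at position frac in [0, 1]."""
    scaled = frac * (len(values) - 1)
    i0 = math.floor(scaled)
    if i0 + 1 >= len(values):
        return float(values[-1])
    w = scaled - i0
    return values[i0] * (1 - w) + values[i0 + 1] * w


def resolve_t(tensor_name: str, layer_idx: int) -> float:
    """Hypothetical helper: the SLERP factor t for one tensor of one layer."""
    # Assumption: 0.0 for the first layer, 1.0 for the last one.
    frac = layer_idx / (NUM_LAYERS - 1)
    for rule in T_SETTINGS:
        flt = rule.get("filter")
        # First rule whose filter is a substring of the tensor name wins.
        if flt is None or flt in tensor_name:
            value = rule["value"]
            if isinstance(value, list):
                return interpolate_gradient(value, frac)
            return float(value)
    raise ValueError(f"no rule matches {tensor_name}")


for layer in (0, 8, 16, 24, 31):
    attn = resolve_t(f"model.layers.{layer}.self_attn.o_proj.weight", layer)
    mlp = resolve_t(f"model.layers.{layer}.mlp.down_proj.weight", layer)
    print(f"layer {layer:2d}: self_attn t={attn:.3f}  mlp t={mlp:.3f}")
```

Note that the anchor lists are not monotonic (0.5 is followed by 0.3, then 0.7), so the mix between the two models zigzags through the middle layers rather than shifting smoothly from one model to the other.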