---
language:
- en
license: llama3
tags:
- axolotl
base_model: meta-llama/Meta-Llama-3-8B
datasets:
- BEE-spoke-data/KI-smorgasbord_fw-small
pipeline_tag: text-generation
model-index:
- name: Llama-3-6.3b-v0.1
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 10.44
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 18.68
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 1.51
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 4.47
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.15
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 20.44
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
---


# Llama-3-6.3b-v0.1

This is a layer-pruning experiment based on the original Llama-3-8B:

- 8 of its 32 decoder layers were pruned with [PruneMe](https://github.com/pszemraj/PruneMe/tree/upgrades)/MergeKit
  - the layers to remove were selected using [BEE-spoke-data/fineweb-100k_en-med](https://hf.co/datasets/BEE-spoke-data/fineweb-100k_en-med)
- followed by brief continued pretraining at a context length of 4096
  - data: 10k rows of FineWeb (different from the pruning data) plus some curated data
- Weights & Biases logs: [here](https://wandb.ai/pszemraj/llama3-pruning)
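The "6.3b" in the name follows from the parameter arithmetic. Below is a minimal sketch, assuming the published Meta-Llama-3-8B architecture values (hidden size 4096, MLP size 14336, 32 query heads with 8 KV heads, vocabulary 128256, untied input/output embeddings, no projection biases) and assuming that pruning changes nothing but the layer count. `llama_param_count` is a hypothetical helper, not part of any library.

```python
def llama_param_count(num_layers: int, hidden: int = 4096, intermediate: int = 14336,
                      vocab: int = 128256, num_heads: int = 32, num_kv_heads: int = 8,
                      tie_embeddings: bool = False) -> int:
    """Parameter count of a Llama-style decoder (no biases, RMSNorm weights only).

    Defaults are assumed Meta-Llama-3-8B values; only num_layers changes after pruning.
    """
    head_dim = hidden // num_heads
    kv_dim = num_kv_heads * head_dim                       # grouped-query attention
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q/o plus k/v projections
    mlp = 3 * hidden * intermediate                        # gate, up, down projections
    norms = 2 * hidden                                     # input + post-attention RMSNorm
    per_layer = attention + mlp + norms
    embeddings = vocab * hidden
    lm_head = 0 if tie_embeddings else vocab * hidden
    final_norm = hidden
    return num_layers * per_layer + embeddings + lm_head + final_norm


full = llama_param_count(32)
pruned = llama_param_count(32 - 8)
print(f"{full / 1e9:.2f}B -> {pruned / 1e9:.2f}B")  # 8.03B -> 6.29B
```

Under these assumptions each decoder layer holds about 218M parameters, so dropping 8 layers removes roughly 1.74B parameters, while the embeddings and LM head (about 1.05B together) are untouched.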
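As a rough illustration of the layer-selection step: this sketch assumes PruneMe follows the angular-distance criterion it is built around. Under that criterion, each block of n consecutive layers is scored by how little it changes the hidden states on calibration text (here, the FineWeb sample linked above). The change is measured as the angular distance between the representation entering the block and the one leaving it, and the lowest-scoring block is removed. This is not PruneMe's code. `best_block_start` and `passthrough_slices` are hypothetical helpers, and the MergeKit-style config assumes a passthrough merge with half-open `layer_range` values. A real run would collect hidden states from the model itself, for example with `output_hidden_states=True` in transformers.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def angular_distance(a: Vector, b: Vector) -> float:
    """arccos(cosine similarity) / pi: 0 = same direction, 1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cos_sim = max(-1.0, min(1.0, dot / norm))  # clamp against float round-off
    return math.acos(cos_sim) / math.pi


def best_block_start(hidden_states: List[List[Vector]], n: int) -> int:
    """Start index l of the n-layer block [l, l + n) that changes the hidden states least.

    hidden_states[i][t] is the hidden state entering layer i for token t, so the list
    has num_layers + 1 entries (the last is the output of the final layer). Each
    candidate block is scored by the mean angular distance, over tokens, between
    hidden_states[l] and hidden_states[l + n].
    """
    num_layers = len(hidden_states) - 1
    scores = []
    for l in range(num_layers - n + 1):
        pairs = zip(hidden_states[l], hidden_states[l + n])
        dists = [angular_distance(a, b) for a, b in pairs]
        scores.append(sum(dists) / len(dists))
    return min(range(len(scores)), key=lambda i: scores[i])


def passthrough_slices(model: str, start: int, n: int, num_layers: int) -> Dict:
    """Hypothetical helper: MergeKit-style passthrough config keeping
    layers [0, start) and [start + n, num_layers) (half-open ranges assumed)."""
    return {
        "slices": [
            {"sources": [{"model": model, "layer_range": [0, start]}]},
            {"sources": [{"model": model, "layer_range": [start + n, num_layers]}]},
        ],
        "merge_method": "passthrough",
        "dtype": "bfloat16",
    }


# Toy data: 6 layers, one token, 2-D hidden states.
toy = [
    [[1.0, 0.0]],   # entering layer 0
    [[0.0, 1.0]],
    [[1.0, 1.0]],   # entering layer 2
    [[1.0, 1.01]],
    [[1.0, 1.0]],   # entering layer 4: same as entering layer 2
    [[-1.0, 0.0]],
    [[0.0, -1.0]],  # final output
]
start = best_block_start(toy, n=2)
print(start)  # 2
print(passthrough_slices("meta-llama/Meta-Llama-3-8B", start, n=2, num_layers=6))
```

With the real model, `hidden_states` would have 33 entries (the embedding output plus one per layer) and `n` would be 8.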

## Quick eval


Evaluated with EleutherAI's lm-evaluation-harness using the following settings:

`hf (pretrained=pszemraj/Llama-3-6.3b-v0.1,trust_remote_code=True,dtype=bfloat16), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1`

|    Tasks     |Version|Filter|n-shot|  Metric  |Value |   |Stderr|
|--------------|------:|------|-----:|----------|-----:|---|-----:|
|arc_easy      |      1|none  |     0|acc       |0.7109|±  |0.0093|
|              |       |none  |     0|acc_norm  |0.6843|±  |0.0095|
|boolq         |      2|none  |     0|acc       |0.7920|±  |0.0071|
|lambada_openai|      1|none  |     0|perplexity|4.5411|±  |0.1073|
|              |       |none  |     0|acc       |0.6734|±  |0.0065|
|openbookqa    |      1|none  |     0|acc       |0.3000|±  |0.0205|
|              |       |none  |     0|acc_norm  |0.4140|±  |0.0220|
|piqa          |      1|none  |     0|acc       |0.7443|±  |0.0102|
|              |       |none  |     0|acc_norm  |0.7530|±  |0.0101|
|winogrande    |      1|none  |     0|acc       |0.7127|±  |0.0127|
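For the `acc` rows, the Stderr column appears to be the standard error of a mean of per-example 0/1 scores, sqrt(p(1 - p) / (n - 1)). The sketch below recomputes it under that assumption and turns each value into a rough normal-approximation 95% interval. The eval-set sizes in `N_EXAMPLES` are assumptions (the standard splits commonly used for these tasks). They are not stated in this card. The perplexity row is excluded because its stderr is computed differently.

```python
import math

# Assumed eval-set sizes (not stated in this card).
N_EXAMPLES = {
    "arc_easy": 2376,
    "boolq": 3270,
    "lambada_openai": 5153,
    "openbookqa": 500,
    "piqa": 1838,
    "winogrande": 1267,
}

# (task, metric, value, reported stderr) rows copied from the table above.
ROWS = [
    ("arc_easy", "acc", 0.7109, 0.0093),
    ("boolq", "acc", 0.7920, 0.0071),
    ("lambada_openai", "acc", 0.6734, 0.0065),
    ("openbookqa", "acc", 0.3000, 0.0205),
    ("piqa", "acc", 0.7443, 0.0102),
    ("winogrande", "acc", 0.7127, 0.0127),
]


def accuracy_stderr(p: float, n: int) -> float:
    """Standard error of a mean of n 0/1 outcomes, using the sample (n - 1) variance."""
    return math.sqrt(p * (1.0 - p) / (n - 1))


for task, metric, value, reported in ROWS:
    se = accuracy_stderr(value, N_EXAMPLES[task])
    low, high = value - 1.96 * se, value + 1.96 * se  # normal-approximation 95% interval
    print(f"{task:15s} {metric} stderr={se:.4f} (reported {reported:.4f}) "
          f"95% CI [{low:.3f}, {high:.3f}]")
```

These intervals are a quick way to judge whether a gap against another checkpoint is larger than sampling noise.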


## Details

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: pszemraj/llama-3-prune_8
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

strict: false
seed: 80085

# dataset
datasets:
    - path: BEE-spoke-data/KI-smorgasbord_fw-small
      type: completion # format from earlier
      field: text # Optional[str] default: text, field to use for completion data
val_set_size: 0.015

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: false
train_on_inputs: false
group_by_length: false

# WANDB
wandb_project: llama3-pruning
wandb_entity: pszemraj
wandb_watch: gradients
wandb_name: Llama-3-6.3b-v0.1
hub_model_id: pszemraj/Llama-3-6.3b-v0.1
hub_strategy: every_save

gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_torch_fused # paged_adamw_32bit
weight_decay: 0.05
lr_scheduler: cosine
learning_rate: 4e-5
warmup_ratio: 0.1

load_in_8bit: false
load_in_4bit: false
bfloat16: true
tf32: true

flash_attention: true
torch_compile: true # requires >= torch 2.0, may sometimes cause problems
torch_compile_backend: inductor # Optional[str]
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

# hyperparams for freq of evals, saving, etc
evals_per_epoch: 5
saves_per_epoch: 3
save_safetensors: true
save_total_limit: 1
output_dir: ./output-axolotl/output-model-6.3b
logging_steps: 8

deepspeed:

special_tokens:
  pad_token: <|end_of_text|>

```

</details><br>
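This sketch shows a few quantities implied by the config above. The first is tokens per optimizer step: `sequence_len * micro_batch_size * gradient_accumulation_steps * num_gpus`. This is an upper bound, since packed sequences may not be completely full. The second is the shape of the `cosine` schedule with `warmup_ratio: 0.1`. The number of GPUs and the total step count are not in the config, so both are assumptions here. The schedule mirrors the shape of transformers' `get_cosine_schedule_with_warmup` and was not read from axolotl's code.

```python
import math

# Values copied from the axolotl config above.
SEQUENCE_LEN = 4096
MICRO_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 16
LEARNING_RATE = 4e-5
WARMUP_RATIO = 0.1


def tokens_per_optimizer_step(num_gpus: int = 1) -> int:
    """Upper bound: assumes every packed sequence is filled to SEQUENCE_LEN."""
    return SEQUENCE_LEN * MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS * num_gpus


def cosine_lr(step: int, total_steps: int) -> float:
    """Linear warmup to LEARNING_RATE, then half-cosine decay to 0.

    Rounding the warmup length up with math.ceil is an assumption.
    """
    warmup_steps = math.ceil(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(tokens_per_optimizer_step())  # 65536
total = 1600  # hypothetical round run length, close to what the training table implies
for s in (0, 80, 160, 880, 1600):
    print(s, f"{cosine_lr(s, total):.2e}")
```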

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0006 | 1    | 7.8100          |
| 2.2782        | 0.2002 | 320  | 2.3728          |
| 2.2699        | 0.4004 | 640  | 2.3265          |
| 2.3761        | 0.6006 | 960  | 2.2849          |
| 2.2448        | 0.8008 | 1280 | 2.2702          |
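This table supports two readings, sketched here under stated assumptions. First, since `num_epochs: 1`, dividing a logged step by its epoch fraction estimates the total number of optimizer steps in the run. Second, if the losses are mean per-token cross-entropy in nats (the PyTorch default), exp(loss) gives perplexity. Both helpers are illustrative and are not axolotl output. The epoch values are rounded to four decimals, so the step estimate is approximate.

```python
import math

# (step, epoch) pairs and the last validation loss, copied from the table above.
LOGGED_STEPS = [(320, 0.2002), (640, 0.4004), (960, 0.6006), (1280, 0.8008)]
LAST_VAL_LOSS = 2.2702


def implied_steps_per_epoch(logged) -> float:
    """Average of step / epoch over the logged rows."""
    return sum(step / epoch for step, epoch in logged) / len(logged)


def perplexity(loss: float) -> float:
    """exp(mean per-token cross-entropy), assuming the loss is in nats."""
    return math.exp(loss)


print(f"~{implied_steps_per_epoch(LOGGED_STEPS):.0f} optimizer steps per epoch")  # ~1598 optimizer steps per epoch
print(f"validation perplexity ~ {perplexity(LAST_VAL_LOSS):.2f}")  # validation perplexity ~ 9.68
```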

---
## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_pszemraj__Llama-3-6.3b-v0.1).

|      Metric       |Value|
|-------------------|----:|
|Avg.               |10.28|
|IFEval (0-Shot)    |10.44|
|BBH (3-Shot)       |18.68|
|MATH Lvl 5 (4-Shot)| 1.51|
|GPQA (0-shot)      | 4.47|
|MuSR (0-shot)      | 6.15|
|MMLU-PRO (5-shot)  |20.44|
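The Avg. row is consistent with an unweighted mean of the six benchmark scores, as this short check shows:

```python
# Leaderboard scores copied from the table above.
SCORES = {
    "IFEval (0-Shot)": 10.44,
    "BBH (3-Shot)": 18.68,
    "MATH Lvl 5 (4-Shot)": 1.51,
    "GPQA (0-shot)": 4.47,
    "MuSR (0-shot)": 6.15,
    "MMLU-PRO (5-shot)": 20.44,
}

average = sum(SCORES.values()) / len(SCORES)
print(f"Avg. = {average:.2f}")  # Avg. = 10.28
```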