
Uploaded model

  • Finetuned from model : alnrg2arg/blockchainlabs_7B_merged_test2_4

This is an SFT (supervised fine-tuning) version of the model from blockchainlab test 2.4: alnrg2arg/blockchainlabs_7B_merged_test2_4.

The project aims to build a small LLM for on-device use.

The overall pipeline for this iteration is:

1. Merge models to make a base model (7B).
2. Prune the model to reduce the parameter count (50% sparsity).
3. Recover from the pruning with DPO.
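
To make "50% sparsity" concrete, here is a minimal sketch of unstructured magnitude pruning on a toy weight list. The card does not say which pruning method the project uses, so the method, the function names `magnitude_prune` and `sparsity_of`, and the sample weights are all illustrative assumptions.

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of a flat list of weights.

    Illustrative only: the model card does not name the pruning method.
    """
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def sparsity_of(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Hypothetical toy weights, not taken from the model
weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.4, -0.2, 0.6]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned, sparsity_of(pruned))
```

At 50% sparsity half of the weights are zero, which is why a recovery phase is needed to regain the lost accuracy.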

This model is not pruned; it is intended as a baseline for comparison with the pruned model.

The DPO recovery consists of two stages: SFT, then DPO. This model is the intermediate (SFT) stage, so it can also be compared with the DPO version of the model.
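
For orientation, the DPO stage that follows SFT optimizes a preference loss over (chosen, rejected) response pairs. The sketch below computes the standard DPO loss for a single pair from sequence log-probabilities. The function name `dpo_loss`, the value `beta = 0.1`, and the sample numbers are assumptions for illustration, not settings from this project.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    loss = -log(sigmoid(beta * margin)), where margin is how much more the
    policy prefers the chosen response than the reference model does.
    beta = 0.1 is an assumed value, not taken from the model card.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))


# Policy identical to the reference: margin is 0, loss is log(2)
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Policy raises the chosen response and lowers the rejected one: loss drops
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
print(baseline, improved)
```

In the SFT stage, which produced this model, the reference model for that later comparison is fixed; the loss above only applies to the subsequent DPO stage.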

These are the code and parameters I chose for this model (SFT).

from transformers import TrainingArguments
from trl import SFTTrainer
from datasets import load_dataset
from unsloth import FastLanguageModel, FastMistralModel


max_seq_length = 2048 # Supports automatic RoPE Scaling, so choose any number

# Load model
model, tokenizer = FastMistralModel.from_pretrained(
    model_name = "alnrg2arg/blockchainlabs_7B_merged_test2_4",
    max_seq_length = max_seq_length,
    dtype = None, # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
    load_in_4bit = True, # Use 4bit quantization to reduce memory usage. Can be False
    #device_map = "balanced"
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

model = FastMistralModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Dropout = 0 is currently optimized
    bias = "none",    # Bias = "none" is currently optimized
    use_gradient_checkpointing = True,
    random_state = 3407,
    max_seq_length = max_seq_length,
)

The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing
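
As a rough check on what the LoRA settings above imply, the sketch below counts the trainable adapter parameters for `r = 16` on the seven target modules. It assumes the base model has the standard Mistral-7B layer shapes (hidden size 4096, MLP size 14336, key/value projection width 1024, 32 layers); the card does not state these dimensions, so treat them as an assumption.

```python
# Assumed standard Mistral-7B dimensions (not stated in the model card)
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024  # 8 key/value heads x 128 head dim
N_LAYERS = 32

# LoRA settings taken from the get_peft_model call above
r = 16
lora_alpha = 16

# (in_features, out_features) for each targeted linear layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """A LoRA adapter adds A (rank x in) and B (out x rank) matrices."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, r) for i, o in shapes.values())
total = per_layer * N_LAYERS
scaling = lora_alpha / r  # multiplier applied to the adapter update

print(f"trainable LoRA params: {total:,}")
print(f"LoRA scaling factor: {scaling}")
```

Under these assumptions the adapters add about 42 million trainable parameters, well under 1% of the 7.24B-parameter model, and `lora_alpha = r` gives a scaling factor of 1.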

Benchmark scores

| Task / group | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 25 | acc | 0.7116 | ± 0.0132 |
| arc_challenge | 1 | none | 25 | acc_norm | 0.7346 | ± 0.0129 |
| hellaswag | 1 | none | 10 | acc | 0.7222 | ± 0.0045 |
| hellaswag | 1 | none | 10 | acc_norm | 0.8865 | ± 0.0032 |
| truthfulqa_mc2 | 2 | none | 0 | acc | 0.7043 | ± 0.015 |
| mmlu | N/A | none | 0 | acc | 0.6367 | ± 0.1258 |
| - humanities | N/A | none | 5 | acc | 0.5968 | ± 0.1122 |
| - other | N/A | none | 5 | acc | 0.7049 | ± 0.1123 |
| - social_sciences | N/A | none | 5 | acc | 0.7374 | ± 0.0774 |
| - stem | N/A | none | 5 | acc | 0.5309 | ± 0.1373 |
| winogrande | 1 | none | 5 | acc | 0.8477 | ± 0.0101 |
| gsm8k | 2 | get-answer | 5 | exact_match | 0.7468 | ± 0.012 |

Average 75.94
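
The average can be reproduced from the scores above. The card does not say which metric is used per task; the sketch below assumes acc_norm for arc_challenge and hellaswag, acc for truthfulqa_mc2, mmlu, and winogrande, and exact_match for gsm8k, which is the selection that matches the reported figure.

```python
# One score per benchmark, copied from the results above.
# The per-task metric choice is an assumption (see the note above).
scores = {
    "arc_challenge (acc_norm)": 0.7346,
    "hellaswag (acc_norm)": 0.8865,
    "truthfulqa_mc2 (acc)": 0.7043,
    "mmlu (acc)": 0.6367,
    "winogrande (acc)": 0.8477,
    "gsm8k (exact_match)": 0.7468,
}

# Mean of the six scores, expressed as a percentage
average = 100 * sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")
```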

Model size: 7.24B params (Safetensors, tensor type BF16)