
Model Card for mamba-2.8b-slimpj-OpenOrca_1ep

This is a finetune of mamba-2.8b-slimpj for instruction following using the OpenOrca dataset.

Model Details

Model Description

This is a finetune of the mamba reference model mamba-2.8b-slimpj from the paper https://arxiv.org/abs/2312.00752

It was fine-tuned for instruction following on the OpenOrca dataset for 1 epoch.

Uses

This model is intended for evaluating the results of fine-tuning Mamba models.

Usage

Prompt structure

The prompt structure used in fine-tuning follows an Alpaca-style format:

"### Human:\n%question%\n\n### AI response:\n%response%"
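A minimal sketch of filling this template in Python. The function names are hypothetical, and the truncation step is an assumption: because the training parameters below list "add_eos_token": false, the model may not emit an end-of-sequence token, so the sketch cuts generated text at the next "### Human:" marker.

```python
def build_prompt(question: str, response: str = "") -> str:
    """Fill the fine-tuning template.

    At inference time leave `response` empty so the prompt ends right after
    the "### AI response:" line and the model continues from there.
    """
    return f"### Human:\n{question}\n\n### AI response:\n{response}"


def extract_response(generated: str) -> str:
    """Keep only the first answer from text generated after the prompt.

    Assumption: since no EOS token was appended during training, generation
    may run on into a new "### Human:" turn, so that marker and everything
    after it are dropped.
    """
    return generated.split("### Human:", 1)[0].strip()


if __name__ == "__main__":
    prompt = build_prompt("What is the capital of France?")
    print(repr(prompt))
    # Hypothetical model output, used only to illustrate the truncation step.
    fake_output = "Paris is the capital of France.\n\n### Human:\nAnd of Spain?"
    print(extract_response(fake_output))
```

The same template with a filled-in `response` reproduces the training-time layout of a single example.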

Training Details

Training Data

https://huggingface.co/datasets/Open-Orca/OpenOrca

Training Procedure

Trained using text-generation-webui with code from the mamba_ssm pull request.

Training Hyperparameters

  • Training regime: Trained in bfloat16 with the following parameters:
{
  "trained_model_name": "mamba-2.8b-slimpj-OpenOrc_1ep",
  "save_steps": 500000.0,
  "micro_batch_size": 4,
  "batch_size": 128,
  "epochs": 1.0,
  "learning_rate": "3e-4",
  "lr_scheduler_type": "linear",
  "cutoff_len": 256,
  "dataset": "OpenOrca",
  "eval_dataset": "None",
  "format": "openorca-format",
  "warmup_steps": 100.0,
  "optimizer": "paged_adamw_8bit",
  "hard_cut_string": "\\n\\n\\n",
  "add_eos_token": false,
  "min_chars": 0.0
}
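Two of these settings interact: batch_size is the effective batch, reached by accumulating gradients over micro-batches of micro_batch_size, and lr_scheduler_type "linear" warms up and then decays the learning rate linearly. The sketch below illustrates both under stated assumptions: it treats warmup_steps as optimizer steps (text-generation-webui may rescale this value internally), and the total step count is hypothetical because the card does not report it.

```python
LEARNING_RATE = 3e-4
WARMUP_STEPS = 100
MICRO_BATCH_SIZE = 4
BATCH_SIZE = 128

# Assumption: an effective batch of 128 is reached by accumulating
# gradients over micro-batches of 4, i.e. 32 micro-batches per optimizer step.
GRAD_ACCUM_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE


def linear_lr(step: int, total_steps: int,
              peak_lr: float = LEARNING_RATE,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total = 10_000  # hypothetical; the number of optimizer steps is not reported
    print("gradient accumulation steps:", GRAD_ACCUM_STEPS)
    for s in (0, 50, 100, 5_050, 10_000):
        print(s, linear_lr(s, total))
```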

Reported train_loss was 0.6762700151924311
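If train_loss is the mean per-token cross-entropy in nats (an assumption about how the training code reports it), exponentiating it gives the corresponding average training perplexity. This number is not comparable with the lambada_openai perplexity below, which is measured on different data with a different objective.

```python
import math

TRAIN_LOSS = 0.6762700151924311  # reported train_loss

# Assumption: the loss is mean per-token cross-entropy (natural log),
# so exp(loss) is the average per-token perplexity over training.
train_perplexity = math.exp(TRAIN_LOSS)
print(f"{train_perplexity:.2f}")
```

This evaluates to roughly 1.97.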

Results

lm-evaluation-harness results for the final model

mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)

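| Tasks          | Version | Filter | n-shot | Metric     | Value   | Stderr   |
|----------------|---------|--------|--------|------------|---------|----------|
| arc_challenge  | 1       | none   | 0      | acc        | 0.2594  | ± 0.0128 |
|                |         | none   | 0      | acc_norm   | 0.2935  | ± 0.0133 |
| arc_easy       | 1       | none   | 0      | acc        | 0.4390  | ± 0.0102 |
|                |         | none   | 0      | acc_norm   | 0.4032  | ± 0.0101 |
| boolq          | 2       | none   | 0      | acc        | 0.5801  | ± 0.0086 |
| lambada_openai | 1       | none   | 0      | perplexity | 27.8582 | ± 1.1183 |
|                |         | none   | 0      | acc        | 0.3683  | ± 0.0067 |
| openbookqa     | 1       | none   | 0      | acc        | 0.2500  | ± 0.0194 |
|                |         | none   | 0      | acc_norm   | 0.3700  | ± 0.0216 |
| piqa           | 1       | none   | 0      | acc        | 0.6817  | ± 0.0109 |
|                |         | none   | 0      | acc_norm   | 0.6839  | ± 0.0108 |
| winogrande     | 1       | none   | 0      | acc        | 0.5770  | ± 0.0139 |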

lm-evaluation-harness results after half an epoch

mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca_1ep-checkpoints/checkpoint-500000), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)

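| Tasks          | Version | Filter | n-shot | Metric     | Value   | Stderr   |
|----------------|---------|--------|--------|------------|---------|----------|
| arc_challenge  | 1       | none   | 0      | acc        | 0.2602  | ± 0.0128 |
|                |         | none   | 0      | acc_norm   | 0.2833  | ± 0.0132 |
| arc_easy       | 1       | none   | 0      | acc        | 0.4533  | ± 0.0102 |
|                |         | none   | 0      | acc_norm   | 0.4125  | ± 0.0101 |
| boolq          | 2       | none   | 0      | acc        | 0.4095  | ± 0.0086 |
| lambada_openai | 1       | none   | 0      | perplexity | 30.4832 | ± 1.2403 |
|                |         | none   | 0      | acc        | 0.3551  | ± 0.0067 |
| openbookqa     | 1       | none   | 0      | acc        | 0.2420  | ± 0.0192 |
|                |         | none   | 0      | acc_norm   | 0.3640  | ± 0.0215 |
| piqa           | 1       | none   | 0      | acc        | 0.6812  | ± 0.0109 |
|                |         | none   | 0      | acc_norm   | 0.6730  | ± 0.0109 |
| winogrande     | 1       | none   | 0      | acc        | 0.5588  | ± 0.0140 |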

Reference lm-evaluation-harness results for the base model mamba-2.8b-slimpj without fine-tuning

mamba_ssm (pretrained=mamba-2.8b-slimpj), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)

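| Tasks          | Version | Filter | n-shot | Metric     | Value   | Stderr   |
|----------------|---------|--------|--------|------------|---------|----------|
| arc_challenge  | 1       | none   | 0      | acc        | 0.3882  | ± 0.0142 |
|                |         | none   | 0      | acc_norm   | 0.4155  | ± 0.0144 |
| arc_easy       | 1       | none   | 0      | acc        | 0.7264  | ± 0.0091 |
|                |         | none   | 0      | acc_norm   | 0.6814  | ± 0.0096 |
| boolq          | 2       | none   | 0      | acc        | 0.7107  | ± 0.0079 |
| lambada_openai | 1       | none   | 0      | perplexity | 5.8770  | ± 0.1881 |
|                |         | none   | 0      | acc        | 0.6427  | ± 0.0067 |
| openbookqa     | 1       | none   | 0      | acc        | 0.2860  | ± 0.0202 |
|                |         | none   | 0      | acc_norm   | 0.3980  | ± 0.0219 |
| piqa           | 1       | none   | 0      | acc        | 0.7709  | ± 0.0098 |
|                |         | none   | 0      | acc_norm   | 0.7813  | ± 0.0096 |
| winogrande     | 1       | none   | 0      | acc        | 0.6614  | ± 0.0133 |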

Summary

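Compared with the base model, the fine-tuned model's measured perplexity and accuracy got worse on every task (for example, lambada_openai perplexity rose from 5.8770 to 27.8582 and its accuracy fell from 0.6427 to 0.3683); this is a known possible side effect of fine-tuning. Perplexity and most accuracy metrics improved during the second half of training (arc_easy is the main exception), which suggests that the initial degradation was likely caused by forcing a prompt structure onto the base model, which had been trained only on unstructured text.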
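These comparisons can be checked directly against the three results tables above. The sketch below copies the reported values (base model, half epoch, final) into a dictionary and flags which metrics did not improve; the names are illustrative. It ignores the standard errors, so a change as small as arc_challenge acc (0.2602 to 0.2594, against a stderr of about 0.0128) should not be read as a real difference.

```python
# (task, metric): (base model, half epoch, final) values copied from the tables
RESULTS = {
    ("arc_challenge", "acc"): (0.3882, 0.2602, 0.2594),
    ("arc_challenge", "acc_norm"): (0.4155, 0.2833, 0.2935),
    ("arc_easy", "acc"): (0.7264, 0.4533, 0.4390),
    ("arc_easy", "acc_norm"): (0.6814, 0.4125, 0.4032),
    ("boolq", "acc"): (0.7107, 0.4095, 0.5801),
    ("lambada_openai", "perplexity"): (5.8770, 30.4832, 27.8582),
    ("lambada_openai", "acc"): (0.6427, 0.3551, 0.3683),
    ("openbookqa", "acc"): (0.2860, 0.2420, 0.2500),
    ("openbookqa", "acc_norm"): (0.3980, 0.3640, 0.3700),
    ("piqa", "acc"): (0.7709, 0.6812, 0.6817),
    ("piqa", "acc_norm"): (0.7813, 0.6730, 0.6839),
    ("winogrande", "acc"): (0.6614, 0.5588, 0.5770),
}


def improved(metric: str, before: float, after: float) -> bool:
    """Lower perplexity is better; for accuracy metrics higher is better."""
    return after < before if metric == "perplexity" else after > before


# Final model vs. the base model
worse_than_base = [k for k, (base, _, final) in RESULTS.items()
                   if not improved(k[1], base, final)]
# Final model vs. the half-epoch checkpoint
worse_in_second_half = [k for k, (_, half, final) in RESULTS.items()
                        if not improved(k[1], half, final)]

print(f"{len(worse_than_base)}/{len(RESULTS)} metrics worse than the base model")
print("Not improved in the second half of training:", worse_in_second_half)
```

All 12 metrics come out worse than the base model, and only arc_challenge acc and the two arc_easy metrics fail to improve between the half-epoch checkpoint and the final model.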

The answer quality as perceived by users has yet to be evaluated.

Environmental Impact

  • Hardware Type: RTX 3090
  • Hours used: 118
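Energy use is not reported here. As a rough upper bound for the GPU alone, the sketch below assumes the RTX 3090 drew its 350 W reference board power for all 118 hours; this is an assumption, since actual power draw was not measured, and the rest of the system is excluded.

```python
GPU_BOARD_POWER_W = 350  # assumption: RTX 3090 reference board power, not a measured value
HOURS_USED = 118         # from the card


def energy_kwh(power_w: float, hours: float) -> float:
    """Energy in kWh if the GPU ran at the given power for the whole time."""
    return power_w * hours / 1000.0


print(f"{energy_kwh(GPU_BOARD_POWER_W, HOURS_USED):.1f} kWh")  # 41.3 kWh
```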