Model Card for mamba-2.8b-slimpj-OpenOrca_1ep
This is a finetune of mamba-2.8b-slimpj for instruction following using the OpenOrca dataset.
Model Details
Model Description
This is a finetune of the mamba reference model mamba-2.8b-slimpj from the paper https://arxiv.org/abs/2312.00752
It has been fine-tuned for instruction following using the OpenOrca dataset and training for 1 epoch.
- Model type: Mamba State Space Model (mamba_ssm)
- Finetuned from model: https://huggingface.co/state-spaces/mamba-2.8b-slimpj
Uses
This model is intended to evaluate fine-tuning results on mamba models.
Training Details
Training Data
https://huggingface.co/datasets/Open-Orca/OpenOrca
Training Procedure
Trained using text-generation-webui with code from the mamba_ssm pull request.
Training Hyperparameters
- Training regime: Trained in bfloat16 with the following parameters:
{
"trained_model_name": "mamba-2.8b-slimpj-OpenOrc_1ep",
"save_steps": 500000.0,
"micro_batch_size": 4,
"batch_size": 128,
"epochs": 1.0,
"learning_rate": "3e-4",
"lr_scheduler_type": "linear",
"cutoff_len": 256,
"dataset": "OpenOrca",
"eval_dataset": "None",
"format": "openorca-format",
"warmup_steps": 100.0,
"optimizer": "paged_adamw_8bit",
"hard_cut_string": "\\n\\n\\n",
"add_eos_token": false,
"min_chars": 0.0
}
Reported train_loss was 0.6762700151924311
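As a rough reading of the parameters above, the following sketch derives the number of gradient accumulation steps and an upper bound on the tokens processed per optimizer step. It assumes that the trainer accumulates gradients over batch_size / micro_batch_size micro-batches; this was not verified against the code used for this run, and the variable names are illustrative only.

```python
# Values copied from the hyperparameter dump above.
batch_size = 128        # effective batch size per optimizer step
micro_batch_size = 4    # sequences per forward/backward pass
cutoff_len = 256        # maximum tokens per training sequence

# Assumption: gradients are accumulated over micro-batches until the
# effective batch size is reached.
grad_accum_steps = batch_size // micro_batch_size

# Upper bound only; sequences shorter than cutoff_len contribute fewer tokens.
max_tokens_per_step = batch_size * cutoff_len

print(f"gradient accumulation steps: {grad_accum_steps}")
print(f"max tokens per optimizer step: {max_tokens_per_step}")
```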
Results
lm-evaluation-harness results for final model
mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
arc_challenge | 1 | none | 0 | acc | 0.2594 | ± | 0.0128 |
arc_challenge | 1 | none | 0 | acc_norm | 0.2935 | ± | 0.0133 |
arc_easy | 1 | none | 0 | acc | 0.4390 | ± | 0.0102 |
arc_easy | 1 | none | 0 | acc_norm | 0.4032 | ± | 0.0101 |
boolq | 2 | none | 0 | acc | 0.5801 | ± | 0.0086 |
lambada_openai | 1 | none | 0 | perplexity | 27.8582 | ± | 1.1183 |
lambada_openai | 1 | none | 0 | acc | 0.3683 | ± | 0.0067 |
openbookqa | 1 | none | 0 | acc | 0.2500 | ± | 0.0194 |
openbookqa | 1 | none | 0 | acc_norm | 0.3700 | ± | 0.0216 |
piqa | 1 | none | 0 | acc | 0.6817 | ± | 0.0109 |
piqa | 1 | none | 0 | acc_norm | 0.6839 | ± | 0.0108 |
winogrande | 1 | none | 0 | acc | 0.5770 | ± | 0.0139 |
lm-evaluation-harness results after half an epoch
mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca_1ep-checkpoints/checkpoint-500000), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
arc_challenge | 1 | none | 0 | acc | 0.2602 | ± | 0.0128 |
arc_challenge | 1 | none | 0 | acc_norm | 0.2833 | ± | 0.0132 |
arc_easy | 1 | none | 0 | acc | 0.4533 | ± | 0.0102 |
arc_easy | 1 | none | 0 | acc_norm | 0.4125 | ± | 0.0101 |
boolq | 2 | none | 0 | acc | 0.4095 | ± | 0.0086 |
lambada_openai | 1 | none | 0 | perplexity | 30.4832 | ± | 1.2403 |
lambada_openai | 1 | none | 0 | acc | 0.3551 | ± | 0.0067 |
openbookqa | 1 | none | 0 | acc | 0.2420 | ± | 0.0192 |
openbookqa | 1 | none | 0 | acc_norm | 0.3640 | ± | 0.0215 |
piqa | 1 | none | 0 | acc | 0.6812 | ± | 0.0109 |
piqa | 1 | none | 0 | acc_norm | 0.6730 | ± | 0.0109 |
winogrande | 1 | none | 0 | acc | 0.5588 | ± | 0.0140 |
Reference lm-evaluation-harness results for the base model mamba-2.8b-slimpj without fine-tuning
mamba_ssm (pretrained=mamba-2.8b-slimpj), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
arc_challenge | 1 | none | 0 | acc | 0.3882 | ± | 0.0142 |
arc_challenge | 1 | none | 0 | acc_norm | 0.4155 | ± | 0.0144 |
arc_easy | 1 | none | 0 | acc | 0.7264 | ± | 0.0091 |
arc_easy | 1 | none | 0 | acc_norm | 0.6814 | ± | 0.0096 |
boolq | 2 | none | 0 | acc | 0.7107 | ± | 0.0079 |
lambada_openai | 1 | none | 0 | perplexity | 5.8770 | ± | 0.1881 |
lambada_openai | 1 | none | 0 | acc | 0.6427 | ± | 0.0067 |
openbookqa | 1 | none | 0 | acc | 0.2860 | ± | 0.0202 |
openbookqa | 1 | none | 0 | acc_norm | 0.3980 | ± | 0.0219 |
piqa | 1 | none | 0 | acc | 0.7709 | ± | 0.0098 |
piqa | 1 | none | 0 | acc_norm | 0.7813 | ± | 0.0096 |
winogrande | 1 | none | 0 | acc | 0.6614 | ± | 0.0133 |
Summary
The model's measured perplexity and accuracy got worse compared to the base model, which is a known possible effect of fine-tuning. Perplexity and most accuracy scores improved in the second half of training (for example, lambada_openai perplexity fell from 30.48 to 27.86 and boolq accuracy rose from 0.4095 to 0.5801), so it's likely that the initial worsening was caused by forcing a prompt structure onto the base model, which was trained only on unstructured text.
The answer quality as perceived by users is yet to be evaluated.
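To make the comparison in this summary easier to check, the sketch below computes the change in a few metrics between the base model, the half-epoch checkpoint, and the final model, using the values and standard errors from the tables above. The z-score is only a rough guide: it assumes the two measurements are independent, which is not strictly true because all runs are scored on the same test sets. The helper name compare and the selection of metrics are illustrative.

```python
import math

# (value, stderr) pairs copied from the results tables above.
results = {
    "lambada_openai perplexity": {
        "base": (5.8770, 0.1881), "half": (30.4832, 1.2403), "final": (27.8582, 1.1183)},
    "lambada_openai acc": {
        "base": (0.6427, 0.0067), "half": (0.3551, 0.0067), "final": (0.3683, 0.0067)},
    "boolq acc": {
        "base": (0.7107, 0.0079), "half": (0.4095, 0.0086), "final": (0.5801, 0.0086)},
    "piqa acc": {
        "base": (0.7709, 0.0098), "half": (0.6812, 0.0109), "final": (0.6817, 0.0109)},
    "winogrande acc": {
        "base": (0.6614, 0.0133), "half": (0.5588, 0.0140), "final": (0.5770, 0.0139)},
}

def compare(a, b):
    """Return (b - a, rough z-score) for two (value, stderr) pairs."""
    diff = b[0] - a[0]
    # Assumption: the two measurements are treated as independent,
    # so their standard errors combine in quadrature.
    z = diff / math.hypot(a[1], b[1])
    return diff, z

for metric, runs in results.items():
    d_half, z_half = compare(runs["half"], runs["final"])
    d_base, z_base = compare(runs["base"], runs["final"])
    print(f"{metric:26s} half->final {d_half:+.4f} (z={z_half:+.1f})  "
          f"base->final {d_base:+.4f} (z={z_base:+.1f})")
```

Under this rough reading, the boolq gain between the half-epoch checkpoint and the final model is far larger than its standard error, while the piqa acc change over the same interval is well within it.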
Environmental Impact
- Hardware Type: RTX 3090
- Hours used: 118