---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{{ card_data }}
---

# Model Card for mamba-2.8b-slimpj-OpenOrca_1ep

<!-- Provide a quick summary of what the model is/does. -->

This is a fine-tune of mamba-2.8b-slimpj for instruction following, trained on the OpenOrca dataset.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->
This is a fine-tune of the Mamba reference model mamba-2.8b-slimpj from the paper *Mamba: Linear-Time Sequence Modeling with Selective State Spaces* (https://arxiv.org/abs/2312.00752).

It has been fine-tuned for instruction following on the OpenOrca dataset for 1 epoch.

- **Model type:** Mamba State Space Model (mamba_ssm)
- **Finetuned from model:** https://huggingface.co/state-spaces/mamba-2.8b-slimpj


## Uses

This model is intended for evaluating fine-tuning results on Mamba models.

## Usage

### Prompt structure

The prompt structure used in fine-tuning is an Alpaca-style template:

```
"### Human:\n%question%\n\n### AI response:\n%response%"
```

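A minimal sketch of how a prompt would be assembled from this template (the helper name `format_prompt` is illustrative, not part of the training code):

```python
def format_prompt(question: str, response: str = "") -> str:
    """Fill the Alpaca-style template used during fine-tuning.

    For inference, leave `response` empty so the model completes
    the "### AI response:" section.
    """
    return f"### Human:\n{question}\n\n### AI response:\n{response}"


prompt = format_prompt("What is a state space model?")
print(prompt)
```

The resulting string is what would be fed to the model as the generation prefix.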
## Training Details

### Training Data

https://huggingface.co/datasets/Open-Orca/OpenOrca

### Training Procedure

Trained using text-generation-webui with code from the mamba_ssm pull request.


#### Training Hyperparameters

- **Training regime:** Trained in bfloat16 with the following parameters:

```json
{
  "trained_model_name": "mamba-2.8b-slimpj-OpenOrc_1ep",
  "save_steps": 500000.0,
  "micro_batch_size": 4,
  "batch_size": 128,
  "epochs": 1.0,
  "learning_rate": "3e-4",
  "lr_scheduler_type": "linear",
  "cutoff_len": 256,
  "dataset": "OpenOrca",
  "eval_dataset": "None",
  "format": "openorca-format",
  "warmup_steps": 100.0,
  "optimizer": "paged_adamw_8bit",
  "hard_cut_string": "\\n\\n\\n",
  "add_eos_token": false,
  "min_chars": 0.0
}
```
The reported final train_loss was 0.6762700151924311.
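With `batch_size` 128 and `micro_batch_size` 4, the effective setup accumulates 32 micro-batches per optimizer step. A minimal sketch of that relationship (the helper name is illustrative, not from the training code):

```python
def accumulation_steps(batch_size: int, micro_batch_size: int) -> int:
    """Number of micro-batches accumulated before each optimizer step."""
    assert batch_size % micro_batch_size == 0, "batch_size must be divisible"
    return batch_size // micro_batch_size


print(accumulation_steps(128, 4))  # 32 micro-batches per optimizer step
```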

### Results

#### lm-evaluation-harness results for final model

mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
|    Tasks     |Version|Filter|n-shot|  Metric  | Value |   |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|arc_challenge |      1|none  |     0|acc       | 0.2594|±  |0.0128|
|              |       |none  |     0|acc_norm  | 0.2935|±  |0.0133|
|arc_easy      |      1|none  |     0|acc       | 0.4390|±  |0.0102|
|              |       |none  |     0|acc_norm  | 0.4032|±  |0.0101|
|boolq         |      2|none  |     0|acc       | 0.5801|±  |0.0086|
|lambada_openai|      1|none  |     0|perplexity|27.8582|±  |1.1183|
|              |       |none  |     0|acc       | 0.3683|±  |0.0067|
|openbookqa    |      1|none  |     0|acc       | 0.2500|±  |0.0194|
|              |       |none  |     0|acc_norm  | 0.3700|±  |0.0216|
|piqa          |      1|none  |     0|acc       | 0.6817|±  |0.0109|
|              |       |none  |     0|acc_norm  | 0.6839|±  |0.0108|
|winogrande    |      1|none  |     0|acc       | 0.5770|±  |0.0139|

#### lm-evaluation-harness results after half epoch

mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca_1ep-checkpoints/checkpoint-500000), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
|    Tasks     |Version|Filter|n-shot|  Metric  | Value |   |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|arc_challenge |      1|none  |     0|acc       | 0.2602|±  |0.0128|
|              |       |none  |     0|acc_norm  | 0.2833|±  |0.0132|
|arc_easy      |      1|none  |     0|acc       | 0.4533|±  |0.0102|
|              |       |none  |     0|acc_norm  | 0.4125|±  |0.0101|
|boolq         |      2|none  |     0|acc       | 0.4095|±  |0.0086|
|lambada_openai|      1|none  |     0|perplexity|30.4832|±  |1.2403|
|              |       |none  |     0|acc       | 0.3551|±  |0.0067|
|openbookqa    |      1|none  |     0|acc       | 0.2420|±  |0.0192|
|              |       |none  |     0|acc_norm  | 0.3640|±  |0.0215|
|piqa          |      1|none  |     0|acc       | 0.6812|±  |0.0109|
|              |       |none  |     0|acc_norm  | 0.6730|±  |0.0109|
|winogrande    |      1|none  |     0|acc       | 0.5588|±  |0.0140|

#### Reference lm-evaluation-harness results for the base model mamba-2.8b-slimpj without fine-tuning

mamba_ssm (pretrained=mamba-2.8b-slimpj), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
|    Tasks     |Version|Filter|n-shot|  Metric  |Value |   |Stderr|
|--------------|------:|------|-----:|----------|-----:|---|-----:|
|arc_challenge |      1|none  |     0|acc       |0.3882|±  |0.0142|
|              |       |none  |     0|acc_norm  |0.4155|±  |0.0144|
|arc_easy      |      1|none  |     0|acc       |0.7264|±  |0.0091|
|              |       |none  |     0|acc_norm  |0.6814|±  |0.0096|
|boolq         |      2|none  |     0|acc       |0.7107|±  |0.0079|
|lambada_openai|      1|none  |     0|perplexity|5.8770|±  |0.1881|
|              |       |none  |     0|acc       |0.6427|±  |0.0067|
|openbookqa    |      1|none  |     0|acc       |0.2860|±  |0.0202|
|              |       |none  |     0|acc_norm  |0.3980|±  |0.0219|
|piqa          |      1|none  |     0|acc       |0.7709|±  |0.0098|
|              |       |none  |     0|acc_norm  |0.7813|±  |0.0096|
|winogrande    |      1|none  |     0|acc       |0.6614|±  |0.0133|
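As a sanity check, the reported standard errors for the accuracy metrics are consistent with the usual binomial formula sqrt(p·(1−p)/n). For example, for the base model's arc_challenge score (assuming the standard 1,172-example test split; that size is not stated in the tables above):

```python
import math


def binomial_stderr(p: float, n: int) -> float:
    """Standard error of an accuracy estimate p measured over n examples."""
    return math.sqrt(p * (1.0 - p) / n)


# Base-model arc_challenge: acc 0.3882 over an assumed 1172 test examples.
print(round(binomial_stderr(0.3882, 1172), 4))  # ~0.0142, matching the table
```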



#### Summary

The measured perplexity and accuracy got worse relative to the base model, but this is a known possible side effect of fine-tuning. Both metrics improved between the half-epoch checkpoint and the final model, so the initial regression was likely caused by forcing a prompt structure onto the base model, which was trained only on unstructured text.

The answer quality as perceived by users has yet to be evaluated.

## Environmental Impact

- **Hardware Type:** RTX 3090
- **Hours used:** 118