---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-V2
- stingning/ultrachat
language:
- fr
- en
metrics:
- accuracy
- perplexity
---

# Mambaoutai 1.6B

Mambaoutai is the result of all the experiments and training runs described in the [following blog post](https://www.lighton.ai/fr/blog/blog-4/passing-the-torch-training-a-mamba-model-for-smooth-handover-54), where all details about the model series are shared. Mambaoutai is a series of small Mamba checkpoints released for the community to explore, trained on French, English and code. We ran two different decay phases with the WSD scheduler, and release model checkpoints pretrained both with and without instruction data.

## Usage

You need to install `transformers` from the `main` branch until `transformers` 4.39.0 is released:

```bash
pip install git+https://github.com/huggingface/transformers@main
```

We also recommend installing both `causal-conv1d` and `mamba-ssm`:

```bash
pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
```

If either of these two packages is not installed, the "eager" implementation will be used (not recommended). Otherwise, the more optimised CUDA kernels will be used.
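
If you want to check which path will be taken, the snippet below is a minimal sketch that assumes a successful import of both packages means the optimised kernels are usable:

```python
# Minimal sanity check: assumes that importing both packages successfully
# means the optimised CUDA kernels will be used; otherwise transformers
# falls back to the slower "eager" implementation.
try:
    import causal_conv1d  # noqa: F401
    import mamba_ssm  # noqa: F401
    print("causal-conv1d and mamba-ssm found: optimised CUDA kernels will be used.")
except ImportError:
    print("Missing kernels: the slower 'eager' implementation will be used.")
```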

### Generation

Use this snippet of code to generate text from the model:

```python
from transformers import MambaForCausalLM, AutoTokenizer

# Set to True if you are loading a checkpoint trained with instruction data
model_has_instruct_data = True

if model_has_instruct_data:
    # use chat tokens
    prompt = "<start_user>Tell me something about Paris.<end_message><start_assistant>"
else:
    # prompt the non-instruction-tuned model gently
    prompt = "This is a text about Paris. Paris is"

tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai")
model = MambaForCausalLM.from_pretrained("lightonai/mambaoutai")
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]

out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```
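
If you are working with the instruction checkpoints, a small helper can keep the chat format consistent. This is a sketch based on the chat tokens shown above; the `build_chat_prompt` name is ours, not part of the library:

```python
def build_chat_prompt(user_message: str) -> str:
    # Wrap a single user turn in the chat tokens expected by the instruct checkpoints
    return f"<start_user>{user_message}<end_message><start_assistant>"

prompt = build_chat_prompt("Tell me something about Paris.")
```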

### Training checkpoints

You can find some of the training checkpoints in the repository branches, with each branch corresponding to the model at a given point during training.
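
To see which checkpoint branches are available, you can list the repository refs with `huggingface_hub` (a short sketch; `huggingface_hub` is installed alongside `transformers`):

```python
from huggingface_hub import list_repo_refs

# Each branch of the repository holds a checkpoint from a given training step
refs = list_repo_refs("lightonai/mambaoutai")
print([branch.name for branch in refs.branches])
```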

You can do inference with these training checkpoints by adding the `revision` parameter to the `from_pretrained` method. 
For example, to load the model checkpoint after 30000 steps of pretraining, you can use the following code:

```python
from transformers import MambaForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai", revision="pre-30000")
model = MambaForCausalLM.from_pretrained("lightonai/mambaoutai", revision="pre-30000")
input_ids = tokenizer("What is a mamba?", return_tensors="pt")["input_ids"]

out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```

### Model hyperparameters

More details about the model hyperparameters are given in the table below:

  | Parameter             | Value    |
  |-----------------------|----------|
  | d_model               | 2688     |
  | n_layer               | 28       |
  | vocab_size            | 65024    |
  | context_len           | 4096     |
  | rms_norm              | true     |
  | residual_in_fp32      | true     |
  | fused_add_norm        | true     |
  | conv_kernel           | 4        |
  | d_inner               | 5376     |
  | state_size            | 16       |
  | dtype                 | bfloat16 |
  | tie_word_embeddings   | false    |
  | non embeddings params | 1.27B    |
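
As a rough cross-check of the parameter count, the sketch below estimates the non-embedding parameters from the values in the table. It assumes the standard Mamba block parameterization with `dt_rank = ceil(d_model / 16)`, which is an assumption on our part rather than something stated in this card:

```python
import math

d_model, n_layer, d_inner, d_state, d_conv, vocab = 2688, 28, 5376, 16, 4, 65024
dt_rank = math.ceil(d_model / 16)  # assumed default Mamba time-step rank

per_layer = (
    d_model * 2 * d_inner                 # in_proj
    + d_inner * d_conv + d_inner          # depthwise conv1d (weight + bias)
    + d_inner * (dt_rank + 2 * d_state)   # x_proj
    + dt_rank * d_inner + d_inner         # dt_proj (weight + bias)
    + d_inner * d_state + d_inner         # A_log and D
    + d_inner * d_model                   # out_proj
    + d_model                             # RMSNorm weight
)

non_embedding = n_layer * per_layer + d_model  # layers + final norm
embeddings = 2 * vocab * d_model               # untied embedding + LM head

print(f"non-embedding params: {non_embedding / 1e9:.2f}B")  # ~1.27B, matching the table
print(f"total params: {(non_embedding + embeddings) / 1e9:.2f}B")  # ~1.62B
```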