ohallstrom committed • Commit 7aacc36
1 Parent(s): 2133956
First draft of readme
README.md CHANGED
---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-V2
- stingning/ultrachat
language:
- fr
- en
---

# Mambaoutai 1.6B

Mambaoutai is the result of all the experiments and training runs described in the following blog post # ADD LINK, where all the details about the model series are shared. Mambaoutai is a series of small Mamba checkpoints released for the community to explore, trained on French, English and code. We ran two different decay phases with the WSD scheduler, and we release model checkpoints pretrained both with and without instruction data.

## Usage

You need to install `transformers` from `main` until `transformers=4.39.0` is released.
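
The exact commands are in a part of the diff not shown here; one common way to install `transformers` from the `main` branch (given as an illustration, not necessarily the authors' exact instruction) is:

```
pip install git+https://github.com/huggingface/transformers
```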
|
@@ -22,7 +29,7 @@ pip install mamba-ssm

If either of these two is not installed, the "eager" implementation will be used. Otherwise, the more optimised `cuda` kernels will be used.
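
The two dependencies referred to here are presumably `mamba-ssm` and `causal-conv1d` (the exact install instructions sit in the elided lines above). A quick way to check which implementation will be used:

```
import importlib.util

# If both modules are importable, transformers can use the fused CUDA kernels;
# otherwise it falls back to the slower "eager" implementation.
for module in ("mamba_ssm", "causal_conv1d"):
    status = "installed" if importlib.util.find_spec(module) else "missing"
    print(f"{module}: {status}")
```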
|
### Generation

Use this snippet of code to generate text from the model:

```
from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
import torch

# Set this flag according to the checkpoint you load: True for checkpoints
# pretrained with instruction data, False for the plain pretrained ones.
model_has_instruct_data = True

if model_has_instruct_data:
    # use chat tokens
    prompt = "<start_user>Tell me something about Paris.<end_message><start_assistant>"
else:
    # prompt the non-instruction-tuned model gently
    prompt = "This is a text about Paris. Paris is"

tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai")
model = MambaForCausalLM.from_pretrained("lightonai/mambaoutai")
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]

out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```

### Training checkpoints

You can find some of the training checkpoints in the branches of this repo. Each branch corresponds to the model at some point in time during training.

@@ -54,4 +68,24 @@ input_ids = tokenizer("What is a mamba?", return_tensors="pt")["input_ids"]

```
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```
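
The beginning of that snippet falls in the elided lines above; a minimal sketch of loading a checkpoint from a specific training branch (the branch name below is a placeholder, check the repository's branch list for the ones that actually exist) is:

```
from transformers import MambaForCausalLM, AutoTokenizer

# Placeholder branch name: replace with one of the training-checkpoint branches.
checkpoint_branch = "checkpoint-branch-name"

tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai", revision=checkpoint_branch)
model = MambaForCausalLM.from_pretrained("lightonai/mambaoutai", revision=checkpoint_branch)
input_ids = tokenizer("What is a mamba?", return_tensors="pt")["input_ids"]

out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```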
|
### Model hyperparameters

More details about the model hyperparameters are given in the table below:

| Parameter             | Value    |
|-----------------------|----------|
| d_model               | 2688     |
| n_layer               | 28       |
| vocab_size            | 65024    |
| context_len           | 4096     |
| rms_norm              | true     |
| residual_in_fp32      | true     |
| fused_add_norm        | true     |
| conv_kernel           | 4        |
| d_inner               | 5376     |
| state_size            | 16       |
| dtype                 | bfloat16 |
| tie_word_embeddings   | false    |
| non-embedding params  | 1.27B    |
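
The names above follow the original `mamba-ssm` configuration. As a rough mapping onto the `MambaConfig` class used in the snippets above (assuming `d_model` corresponds to `hidden_size`, `n_layer` to `num_hidden_layers`, and `d_inner` to `expand * hidden_size`; parameters with no direct equivalent are omitted), a config with matching sizes could be built like this:

```
from transformers import MambaConfig

# Approximate translation of the table into transformers' argument names.
config = MambaConfig(
    vocab_size=65024,
    hidden_size=2688,       # d_model
    num_hidden_layers=28,   # n_layer
    state_size=16,
    conv_kernel=4,
    expand=2,               # d_inner = 2 * 2688 = 5376
    residual_in_fp32=True,
    tie_word_embeddings=False,
)
print(config)
```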