---
license: apache-2.0
---
<style>
table {
    border-collapse: collapse;
    width: 100%;
    margin-bottom: 20px;
}
th, td {
    border: 1px solid #ddd;
    padding: 8px;
    text-align: center;
}
.best {
    font-weight: bold;
    text-decoration: underline;
}
</style>

<div style="text-align: center; margin: 20px auto; padding: 20px; border: 3px solid #ddd; border-radius: 10px;">
  <h2 style="margin-bottom: 4px; margin-top: 0px;">OuteAI</h2>
  <a href="https://www.outeai.com/" target="_blank" style="margin-right: 10px;">๐ŸŒŽ OuteAI.com</a> 
  <a href="https://discord.gg/vyBM87kAmf" target="_blank" style="margin-right: 10px;">๐Ÿค Join our Discord</a>
  <a href="https://x.com/OuteAI" target="_blank">๐• @OuteAI</a>
</div>

## Introduction
We're excited to introduce our latest model, Lite Oute 2 Mamba2Attn 250M. <br>
This is our third-generation model, featuring the new Mamba2 architecture with attention layers. <br>
If you're interested in more technical details covering the training process, architecture, and performance: <a href="https://outeai.com/blog/lite-oute-2-mamba2attn" target="_blank">read the full blog post here</a>. <br>
This is a base pre-trained model, not an instruction-tuned model for direct interaction. It is designed as a starting point for further fine-tuning on specific tasks or downstream datasets (see the sketch below). <br>
It serves as a foundation for developers and researchers to customize and optimize for their particular applications through additional training on task-specific data.
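
As an illustration, here is a minimal fine-tuning sketch using the Hugging Face `Trainer`. The dataset file (`train.txt`), sequence length, output directory, and hyperparameters are placeholders for illustration, not a recommended recipe; adapt them to your task:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "OuteAI/Lite-Oute-2-Mamba2Attn-250M-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Some causal-LM tokenizers ship without a pad token; reuse EOS if so.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder dataset: one text sample per line in train.txt.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="lite-oute-2-finetuned",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=1e-4,  # placeholder, tune for your task
    ),
    train_dataset=tokenized,
    # mlm=False gives a causal LM objective: labels are a copy of
    # input_ids, and the shift happens inside the model's loss.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```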

## Model Variants
- [Lite-Oute-2-Mamba2Attn-250M-Instruct](https://huggingface.co/OuteAI/Lite-Oute-2-Mamba2Attn-250M-Instruct)
- [Lite-Oute-2-Mamba2Attn-250M-Base](https://huggingface.co/OuteAI/Lite-Oute-2-Mamba2Attn-250M-Base)

## Training Details
The model was pre-trained on 30 billion tokens using a balanced mixture of datasets:
- **50% dclm-baseline-1.0**
- **50% fineweb-edu**

Base model training was conducted on single NVIDIA RTX 4090 and NVIDIA H100 GPUs, with the following key parameters (a sketch of a plausible learning-rate schedule follows the list):
- **Max learning rate:** 4e-4
- **Min learning rate:** 1e-4
- **Block size:** 4096
- **Batch size:** ~100k tokens per batch
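
The exact schedule shape is not stated here; a common setup consistent with a max/min learning-rate pair is linear warmup followed by cosine decay. A minimal sketch under that assumption (the warmup length is a placeholder):

```python
import math

MAX_LR = 4e-4
MIN_LR = 1e-4

def lr_at(step: int, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay to MIN_LR."""
    if step < warmup_steps:
        return MAX_LR * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

# ~30B tokens at ~100k tokens per batch is roughly 300k steps.
total_steps = 300_000
for step in (0, 1_000, 150_000, total_steps - 1):
    print(step, f"{lr_at(step, warmup_steps=1_000, total_steps=total_steps):.2e}")
```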

## Benchmark Results
<table>
<tr>
    <th>Benchmark</th>
    <th>Lite-Oute-2-Mamba2Attn-250M-Base</th>
</tr>
<tr>
    <td>ARC-C (0-shot)</td>
    <td>26.88</td>
</tr>
<tr>
    <td>ARC-E (0-shot)</td>
    <td>53.54</td>
</tr>
<tr>
    <td>HellaSWAG (0-shot)</td>
    <td>38.00</td>
</tr>
<tr>
    <td>MMLU (0-shot)</td>
    <td>24.87</td>
</tr>
<tr>
    <td>OpenBookQA (0-shot)</td>
    <td>30.20</td>
</tr>
<tr>
    <td>PIQA (0-shot)</td>
    <td>66.27</td>
</tr>
<tr>
    <td>Winogrande (0-shot)</td>
    <td>52.01</td>
</tr>

<tr>
    <td>ARC-C (5-shot)</td>
    <td>27.22</td>
</tr>
<tr>
    <td>ARC-E (5-shot)</td>
    <td>55.51</td>
</tr>
<tr>
    <td>HellaSWAG (5-shot)</td>
    <td>38.17</td>
</tr>
<tr>
    <td>MMLU (5-shot)</td>
    <td>25.59</td>
</tr>
<tr>
    <td>OpenBookQA (5-shot)</td>
    <td>30.40</td>
</tr>
<tr>
    <td>PIQA (5-shot)</td>
    <td>66.59</td>
</tr>
<tr>
    <td>Winogrande (5-shot)</td>
    <td>52.49</td>
</tr>
</table>
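
Benchmarks like these are commonly run with EleutherAI's lm-evaluation-harness. The card does not state the exact evaluation setup, so the following is a sketch of how one might reproduce comparable numbers, not the exact command used:

```python
# pip install lm-eval
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=OuteAI/Lite-Oute-2-Mamba2Attn-250M-Base,"
        "trust_remote_code=True"
    ),
    tasks=["arc_challenge", "arc_easy", "hellaswag", "mmlu",
           "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,  # set to 5 for the 5-shot rows
)
for task, metrics in results["results"].items():
    print(task, metrics)
```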

## Let's look at a real-world example

### Input:
```
<s>Scientists have made a breakthrough in renewable energy by developing a new type of
```
### Output:
```
solar cell that can convert sunlight directly into electricity.
The researchers found that the material, called a carbon nanotube (CNT) solar cell, is able to capture and store light energy from the sun's rays and converts it into electricity using a process known as photoelectrochemistry.
This process involves passing electrons through a layer of semiconducting materials like silicon or germanium, which are then connected together via cables. As the electrons pass through these layers, they cause them to become excited and move towards anode, where they are oxidized and released their stored energy.
In this process, the CNT solar cells generate a tiny amount of power, but when the current flows through the circuit, it produces more than enough electricity to run electronic devices such as computers, lights, and other electronic devices.
```

## Usage with Hugging Face Transformers
Model weights were converted from the original Mamba2 implementation to be Hugging Face compatible. <br>
Because Hugging Face Transformers does not yet officially support Mamba2 with attention layers, custom modeling files are included with this repository. <br>
The Mamba2-with-attention implementation in these files comes from Pull Request #32027 in the Hugging Face Transformers repository: [https://github.com/huggingface/transformers/pull/32027](https://github.com/huggingface/transformers/pull/32027)

To speed up inference, we recommend installing mamba-ssm and flash attention 2.

mamba-ssm:
```bash
pip install "causal-conv1d>=1.4.0"
pip install mamba-ssm
```

flash attention 2:
```bash
pip install flash-attn --no-build-isolation
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = AutoModelForCausalLM.from_pretrained(
    "OuteAI/Lite-Oute-2-Mamba2Attn-250M-Base",
    # Allow the custom modeling files bundled with this repository
    trust_remote_code=True,
    # If you have installed flash attention 2, uncomment:
    # attn_implementation="flash_attention_2",
    # torch_dtype=torch.bfloat16,
)
model.to(device)

tokenizer = AutoTokenizer.from_pretrained("OuteAI/Lite-Oute-2-Mamba2Attn-250M-Base")

def generate_response(message: str, temperature: float = 0.2, repetition_penalty: float = 1.12) -> str:
    # Tokenize the prompt and move it to the model's device
    input_ids = tokenizer.encode(message, return_tensors="pt").to(device)
    # Sample a continuation from the base model
    output = model.generate(
        input_ids,
        max_length=256,
        temperature=temperature,
        repetition_penalty=repetition_penalty,
        do_sample=True,
    )
    # Decode the generated tokens, dropping special tokens
    return tokenizer.decode(output[0], skip_special_tokens=True)

message = "Scientists have made a breakthrough in renewable energy by developing a new type of"
response = generate_response(message)
print(response)
```

## Disclaimer
By using this model, you acknowledge that you understand and assume the risks associated with its use. 
You are solely responsible for ensuring compliance with all applicable laws and regulations. 
We disclaim any liability for problems arising from the use of this open-source model, including but not limited to direct, indirect, incidental, consequential, or punitive damages. We make no warranties, express or implied, regarding the model's performance, accuracy, or fitness for a particular purpose. 
Your use of this model is at your own risk, and you agree to hold harmless and indemnify us, our affiliates, and our contributors from any claims, damages, or expenses arising from your use of the model.