---
license: apache-2.0
---
<style>
table {
border-collapse: collapse;
width: 100%;
margin-bottom: 20px;
}
th, td {
border: 1px solid #ddd;
padding: 8px;
text-align: center;
}
.best {
font-weight: bold;
text-decoration: underline;
}
</style>
<div style="text-align: center; margin: 20px auto; padding: 20px; border: 3px solid #ddd; border-radius: 10px;">
<h2 style="margin-bottom: 4px; margin-top: 0px;">OuteAI</h2>
<a href="https://www.outeai.com/" target="_blank" style="margin-right: 10px;">๐ OuteAI.com</a>
<a href="https://discord.gg/vyBM87kAmf" target="_blank" style="margin-right: 10px;">๐ค Join our Discord</a>
<a href="https://x.com/OuteAI" target="_blank">๐ @OuteAI</a>
</div>
## Introduction
We're excited to introduce our latest model, Lite Oute 2 Mamba2Attn 250M. <br>
This is our third-generation model, featuring the new Mamba2 architecture with attention layers. <br>
For more technical detail on the training process, architecture, and performance, <a href="https://outeai.com/blog/lite-oute-2-mamba2attn" target="_blank">read the full blog post</a>. <br>
This is a base pre-trained model, not an instruction-tuned model for direct interaction. It is designed as a starting point for further fine-tuning on specific tasks or downstream datasets, <br>
and serves as a foundation for developers and researchers to customize and optimize for their particular applications through additional training on task-specific data.
## Model Variants
- [Lite-Oute-2-Mamba2Attn-250M-Instruct](https://huggingface.co/OuteAI/Lite-Oute-2-Mamba2Attn-250M-Instruct)
- [Lite-Oute-2-Mamba2Attn-250M-Base](https://huggingface.co/OuteAI/Lite-Oute-2-Mamba2Attn-250M-Base)
## Training Details
The model was pre-trained on 30 billion tokens using a balanced mixture of datasets:
- **50% dclm-baseline-1.0**
- **50% fineweb-edu**
Base model training was conducted on single NVIDIA RTX 4090 and NVIDIA H100 GPUs, with the following key parameters:
- **Max learning rate:** 4e-4
- **Min learning rate:** 1e-4
- **Block size:** 4096
- **Token batches:** ~100k tokens
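The card lists only the learning-rate bounds, not the schedule shape, so the following is an illustrative assumption rather than the actual training code. A common choice is linear warmup followed by cosine decay between the listed max (4e-4) and min (1e-4) rates; `lr_at_step`, `warmup_steps`, and `total_steps` are hypothetical names introduced here:

```python
import math

# Reported bounds from the training details above.
MAX_LR = 4e-4
MIN_LR = 1e-4

def lr_at_step(step: int, warmup_steps: int, total_steps: int) -> float:
    """Hypothetical warmup + cosine-decay schedule; the actual schedule
    type used for training is not stated in the model card."""
    if step < warmup_steps:
        # Linear warmup from ~0 up to MAX_LR
        return MAX_LR * (step + 1) / warmup_steps
    # Cosine decay from MAX_LR down to MIN_LR
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
```

With ~100k tokens per batch, 30B training tokens corresponds to roughly 300k optimizer steps, which would set `total_steps` in a schedule like this.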
## Benchmark Results
<table>
<tr>
<th>Benchmark</th>
<th>Lite-Oute-2-Mamba2Attn-250M-Base</th>
</tr>
<tr>
<td>ARC-C (0-shot)</td>
<td>26.88</td>
</tr>
<tr>
<td>ARC-E (0-shot)</td>
<td>53.54</td>
</tr>
<tr>
<td>HellaSWAG (0-shot)</td>
<td>38.00</td>
</tr>
<tr>
<td>MMLU (0-shot)</td>
<td>24.87</td>
</tr>
<tr>
<td>OpenBookQA (0-shot)</td>
<td>30.20</td>
</tr>
<tr>
<td>PIQA (0-shot)</td>
<td>66.27</td>
</tr>
<tr>
<td>Winogrande (0-shot)</td>
<td>52.01</td>
</tr>
<tr>
<td>ARC-C (5-shot)</td>
<td>27.22</td>
</tr>
<tr>
<td>ARC-E (5-shot)</td>
<td>55.51</td>
</tr>
<tr>
<td>HellaSWAG (5-shot)</td>
<td>38.17</td>
</tr>
<tr>
<td>MMLU (5-shot)</td>
<td>25.59</td>
</tr>
<tr>
<td>OpenBookQA (5-shot)</td>
<td>30.40</td>
</tr>
<tr>
<td>PIQA (5-shot)</td>
<td>66.59</td>
</tr>
<tr>
<td>Winogrande (5-shot)</td>
<td>52.49</td>
</tr>
</table>
## Let's look at a real-world example
### Input:
```
<s>Scientists have made a breakthrough in renewable energy by developing a new type of
```
### Output:
```
solar cell that can convert sunlight directly into electricity.
The researchers found that the material, called a carbon nanotube (CNT) solar cell, is able to capture and store light energy from the sun's rays and converts it into electricity using a process known as photoelectrochemistry.
This process involves passing electrons through a layer of semiconducting materials like silicon or germanium, which are then connected together via cables. As the electrons pass through these layers, they cause them to become excited and move towards anode, where they are oxidized and released their stored energy.
In this process, the CNT solar cells generate a tiny amount of power, but when the current flows through the circuit, it produces more than enough electricity to run electronic devices such as computers, lights, and other electronic devices.
```
## Usage with Hugging Face Transformers
Model weights were converted from the original Mamba2 implementation to be Hugging Face compatible. <br>
Because Hugging Face Transformers does not officially support Mamba2 attention layers, custom modeling files are included. <br>
The Mamba2-with-attention implementation in the modeling files comes from Pull Request #32027 in the Hugging Face Transformers repository: [https://github.com/huggingface/transformers/pull/32027](https://github.com/huggingface/transformers/pull/32027)
To speed up inference, we recommend installing mamba-ssm and FlashAttention 2.
mamba-ssm:
```bash
pip install "causal-conv1d>=1.4.0"
pip install mamba-ssm
```
FlashAttention 2:
```bash
pip install flash-attn --no-build-isolation
```
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = AutoModelForCausalLM.from_pretrained(
    "OuteAI/Lite-Oute-2-Mamba2Attn-250M-Base",
    # Allow the custom modeling files bundled with the checkpoint
    trust_remote_code=True,
    # If you have installed FlashAttention 2:
    # attn_implementation="flash_attention_2",
    # torch_dtype=torch.bfloat16,
)
model.to(device)
tokenizer = AutoTokenizer.from_pretrained("OuteAI/Lite-Oute-2-Mamba2Attn-250M-Base")

def generate_response(message: str, temperature: float = 0.2, repetition_penalty: float = 1.12) -> str:
    # Encode the prompt into input token IDs
    input_ids = tokenizer.encode(message, return_tensors="pt").to(device)
    # Sample a continuation from the model
    output = model.generate(
        input_ids,
        max_length=256,
        temperature=temperature,
        repetition_penalty=repetition_penalty,
        do_sample=True,
    )
    # Decode the generated tokens back into text
    return tokenizer.decode(output[0], skip_special_tokens=True)

message = "Scientists have made a breakthrough in renewable energy by developing a new type of"
response = generate_response(message)
print(response)
```
## Disclaimer
By using this model, you acknowledge that you understand and assume the risks associated with its use.
You are solely responsible for ensuring compliance with all applicable laws and regulations.
We disclaim any liability for problems arising from the use of this open-source model, including but not limited to direct, indirect, incidental, consequential, or punitive damages. We make no warranties, express or implied, regarding the model's performance, accuracy, or fitness for a particular purpose.
Your use of this model is at your own risk, and you agree to hold harmless and indemnify us, our affiliates, and our contributors from any claims, damages, or expenses arising from your use of the model.