---
license: apache-2.0
---
<style>
table {
    border-collapse: collapse;
    width: 100%;
    margin-bottom: 20px;
}
th, td {
    border: 1px solid #ddd;
    padding: 8px;
    text-align: center;
}
.best {
    font-weight: bold;
    text-decoration: underline;
}
</style>

<div style="text-align: center; margin: 20px auto; padding: 20px; border: 3px solid #ddd; border-radius: 10px;">
  <h2 style="margin-bottom: 4px; margin-top: 0px;">OuteAI</h2>
  <a href="https://www.outeai.com/" target="_blank" style="margin-right: 10px;">๐ŸŒŽ OuteAI.com</a> 
  <a href="https://discord.gg/vyBM87kAmf" target="_blank" style="margin-right: 10px;">๐Ÿค Join our Discord</a>
  <a href="https://x.com/OuteAI" target="_blank">๐• @OuteAI</a>
</div>

## Introduction
We're excited to introduce our latest model, Lite Oute 2 Mamba2Attn 250M. <br>
This is our third-generation model, featuring the new Mamba2 architecture with attention layers. <br>
If you're interested in more technical details covering the training process, architecture, and performance: <a href="https://outeai.com/blog/lite-oute-2-mamba2attn" target="_blank">read the full blog post here</a>. <br>
This is a base pre-trained model, not an instruction-tuned model for direct interaction. It is designed as a starting point for further fine-tuning on specific tasks or downstream datasets (see the sketch below). <br>
It serves as a foundation for developers and researchers to customize and optimize for their particular applications through additional training on task-specific data.
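
As an illustration, here is a minimal fine-tuning sketch using the Hugging Face `Trainer`. The dataset file (`train.txt`), sequence length, output directory, and hyperparameters are placeholders for illustration, not a recommended recipe; adapt them to your task:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "OuteAI/Lite-Oute-2-Mamba2Attn-250M-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Some causal-LM tokenizers ship without a pad token; reuse EOS if so.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder dataset: one text sample per line in train.txt.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="lite-oute-2-finetuned",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=1e-4,  # placeholder, tune for your task
    ),
    train_dataset=tokenized,
    # mlm=False gives a causal LM objective: labels are a copy of
    # input_ids, and the shift happens inside the model's loss.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```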

## Model Variants
- [Lite-Oute-2-Mamba2Attn-250M-Instruct](https://huggingface.co/OuteAI/Lite-Oute-2-Mamba2Attn-250M-Instruct)
- [Lite-Oute-2-Mamba2Attn-250M-Base](https://huggingface.co/OuteAI/Lite-Oute-2-Mamba2Attn-250M-Base)

## Training Details
The model was pre-trained on 30 billion tokens using a balanced mixture of datasets:
- **50% dclm-baseline-1.0**
- **50% fineweb-edu**

Base model training was conducted on single NVIDIA RTX 4090 and NVIDIA H100 GPUs, with the following key parameters (a sketch of a plausible learning-rate schedule follows the list):
- **Max learning rate:** 4e-4
- **Min learning rate:** 1e-4
- **Block size:** 4096
- **Batch size:** ~100k tokens per batch
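
The exact schedule shape is not stated here; a common setup consistent with a max/min learning-rate pair is linear warmup followed by cosine decay. A minimal sketch under that assumption (the warmup length is a placeholder):

```python
import math

MAX_LR = 4e-4
MIN_LR = 1e-4

def lr_at(step: int, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay to MIN_LR."""
    if step < warmup_steps:
        return MAX_LR * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

# ~30B tokens at ~100k tokens per batch is roughly 300k steps.
total_steps = 300_000
for step in (0, 1_000, 150_000, total_steps - 1):
    print(step, f"{lr_at(step, warmup_steps=1_000, total_steps=total_steps):.2e}")
```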

## Benchmark Results
<table>
<tr>
    <th>Benchmark</th>
    <th>Lite-Oute-2-Mamba2Attn-250M-Base</th>
</tr>
<tr>
    <td>ARC-C (0-shot)</td>
    <td>26.88</td>
</tr>
<tr>
    <td>ARC-E (0-shot)</td>
    <td>53.54</td>
</tr>
<tr>
    <td>HellaSWAG (0-shot)</td>
    <td>38.00</td>
</tr>
<tr>
    <td>MMLU (0-shot)</td>
    <td>24.87</td>
</tr>
<tr>
    <td>OpenBookQA (0-shot)</td>
    <td>30.20</td>
</tr>
<tr>
    <td>PIQA (0-shot)</td>
    <td>66.27</td>
</tr>
<tr>
    <td>Winogrande (0-shot)</td>
    <td>52.01</td>
</tr>

<tr>
    <td>ARC-C (5-shot)</td>
    <td>27.22</td>
</tr>
<tr>
    <td>ARC-E (5-shot)</td>
    <td>55.51</td>
</tr>
<tr>
    <td>HellaSWAG (5-shot)</td>
    <td>38.17</td>
</tr>
<tr>
    <td>MMLU (5-shot)</td>
    <td>25.59</td>
</tr>
<tr>
    <td>OpenBookQA (5-shot)</td>
    <td>30.40</td>
</tr>
<tr>
    <td>PIQA (5-shot)</td>
    <td>66.59</td>
</tr>
<tr>
    <td>Winogrande (5-shot)</td>
    <td>52.49</td>
</tr>
</table>
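
Benchmarks like these are commonly run with EleutherAI's lm-evaluation-harness. The card does not state the exact evaluation setup, so the following is a sketch of how one might reproduce comparable numbers, not the exact command used:

```python
# pip install lm-eval
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=OuteAI/Lite-Oute-2-Mamba2Attn-250M-Base,"
        "trust_remote_code=True"
    ),
    tasks=["arc_challenge", "arc_easy", "hellaswag", "mmlu",
           "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,  # set to 5 for the 5-shot rows
)
for task, metrics in results["results"].items():
    print(task, metrics)
```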

## Let's look at a real-world example

### Input:
```
<s>Scientists have made a breakthrough in renewable energy by developing a new type of
```
### Output:
```
solar cell that can convert sunlight directly into electricity.
The researchers found that the material, called a carbon nanotube (CNT) solar cell, is able to capture and store light energy from the sun's rays and converts it into electricity using a process known as photoelectrochemistry.
This process involves passing electrons through a layer of semiconducting materials like silicon or germanium, which are then connected together via cables. As the electrons pass through these layers, they cause them to become excited and move towards anode, where they are oxidized and released their stored energy.
In this process, the CNT solar cells generate a tiny amount of power, but when the current flows through the circuit, it produces more than enough electricity to run electronic devices such as computers, lights, and other electronic devices.
```

## Usage with Hugging Face Transformers
Model weights were converted from the original Mamba2 implementation to be Hugging Face compatible. <br>
Because Hugging Face Transformers does not yet officially support Mamba2 with attention layers, custom modeling files are included with this repository. <br>
The Mamba2-with-attention implementation in these files comes from Pull Request #32027 in the Hugging Face Transformers repository: [https://github.com/huggingface/transformers/pull/32027](https://github.com/huggingface/transformers/pull/32027)

To speed up inference, we recommend installing mamba-ssm and flash attention 2.

mamba-ssm:
```bash
pip install "causal-conv1d>=1.4.0"
pip install mamba-ssm
```

flash attention 2:
```bash
pip install flash-attn --no-build-isolation
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = AutoModelForCausalLM.from_pretrained(
    "OuteAI/Lite-Oute-2-Mamba2Attn-250M-Base",
    # Allow the custom modeling files bundled with this repository
    trust_remote_code=True,
    # If you have installed flash attention 2, uncomment:
    # attn_implementation="flash_attention_2",
    # torch_dtype=torch.bfloat16,
)
model.to(device)

tokenizer = AutoTokenizer.from_pretrained("OuteAI/Lite-Oute-2-Mamba2Attn-250M-Base")

def generate_response(message: str, temperature: float = 0.2, repetition_penalty: float = 1.12) -> str:
    # Tokenize the prompt and move it to the model's device
    input_ids = tokenizer.encode(message, return_tensors="pt").to(device)
    # Sample a continuation from the base model
    output = model.generate(
        input_ids,
        max_length=256,
        temperature=temperature,
        repetition_penalty=repetition_penalty,
        do_sample=True,
    )
    # Decode the generated tokens, dropping special tokens
    return tokenizer.decode(output[0], skip_special_tokens=True)

message = "Scientists have made a breakthrough in renewable energy by developing a new type of"
response = generate_response(message)
print(response)
```

## Disclaimer
By using this model, you acknowledge that you understand and assume the risks associated with its use. 
You are solely responsible for ensuring compliance with all applicable laws and regulations. 
We disclaim any liability for problems arising from the use of this open-source model, including but not limited to direct, indirect, incidental, consequential, or punitive damages. We make no warranties, express or implied, regarding the model's performance, accuracy, or fitness for a particular purpose. 
Your use of this model is at your own risk, and you agree to hold harmless and indemnify us, our affiliates, and our contributors from any claims, damages, or expenses arising from your use of the model.