# Model Card for Aura 16x17B SparseMoE

## Model Summary

Aura 16x17B SparseMoE is a research release of a 17-billion-parameter Brazilian model.
- Developed by: Orion Research
- Model: Aura 16x17B SparseMoE
- Model size: 17 billion parameters
- Context length: 8,192 tokens (8k)
This model uses the `orion` architecture because of our proprietary Q* training algorithm implementation, but you can replace the architecture with `MistralForCausalLM`/`mistral` or even `LlamaForCausalLM`/`llama` and the model will still perform very well, because Q* is used only during training.
## Use
```python
# pip install transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "orion-research/Aura-16x17B-SparseMoE"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Format the message with the Aura 16x17B SparseMoE chat template
messages = [{"role": "user", "content": "Olá! Como vai você hoje?"}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.4,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
```
## Model Details
Input: text only.
Output: text only.
Model Architecture: This is an auto-regressive language model that uses an optimized transformer architecture; the base model is Llama 3 8B. After pretraining, the model was aligned to human preferences for helpfulness and safety with supervised fine-tuning (SFT) and preference training (ORPO).
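As background for the "SparseMoE" part of the name, here is a generic sketch (not Aura's actual routing code) of how a sparse mixture-of-experts layer routes each token: a gating network scores all experts, only the top-2 are run, and their outputs are mixed with renormalized weights:

```python
import math

# Illustrative top-2 expert routing, as used in sparse MoE layers.
def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top2(gate_logits):
    """Return the two highest-scoring expert indices and their
    renormalized mixture weights (weights sum to 1)."""
    probs = softmax(gate_logits)
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    total = probs[top2[0]] + probs[top2[1]]
    return top2, [probs[top2[0]] / total, probs[top2[1]] / total]
```

Because only 2 of the 16 experts run per token, inference cost grows far more slowly than total parameter count.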
Languages covered: The model is optimized to perform well in English and Brazilian Portuguese.
Context length: The model supports a context length of 8,192 tokens (8k).
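Prompts longer than the context window must be truncated before generation; a minimal sketch of left-truncation over plain token-id lists (a hypothetical helper, not part of the model's API), leaving headroom for the tokens you plan to generate:

```python
# Minimal sketch: keep a prompt inside the model's 8,192-token context
# window, reserving room for the tokens we intend to generate.
MAX_CONTEXT = 8192

def truncate_to_context(token_ids, reserve_for_output=512):
    """Drop the oldest tokens so prompt + output fits in MAX_CONTEXT."""
    budget = MAX_CONTEXT - reserve_for_output
    return token_ids[-budget:] if len(token_ids) > budget else token_ids
```

Truncating from the left keeps the most recent conversation turns, which is usually what chat-style generation needs.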
Limitations: This model is not fully trained; this is a checkpoint taken at the point where training began to converge. Feel free to run continued training on your own preferred tasks.
## Training Details

Training factors: We used custom training libraries and the Orion Research Mini Cluster for pretraining.

Carbon footprint: Pretraining used a cumulative 4,800 GPU hours of computation on RTX 4090 24GB and RTX 3090 24GB hardware (TDP of 500 W). Total emissions were zero tCO2eq, 100% of which were offset by Orion's sustainability program.
| | Time (GPU hours) | Power Consumption (W) | Carbon Emitted (tCO2eq) |
|---|---|---|---|
| Aura Llama 3 8B base model | 900 | 500 | 0 |
| Aura Q* Scaling to 17B | 900 | 500 | 0 |
| Aura Final 16x17B SparseMoE | 3,000 | 500 | 0 |
| Total | 4,800 | 1,500 | 0 |
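As a rough sanity check of the figures above (assuming GPUs running continuously at the stated 500 W TDP, which overestimates real draw):

```python
# Back-of-the-envelope energy estimate from the reported compute:
# 4,800 cumulative GPU hours at a 500 W TDP.
def energy_kwh(gpu_hours, tdp_watts):
    """Energy in kilowatt-hours for GPUs drawing tdp_watts for gpu_hours."""
    return gpu_hours * tdp_watts / 1000.0

total_kwh = energy_kwh(4800, 500)  # 2,400 kWh, reported as fully solar-offset
```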
## We Are Green

We emit zero CO2: Orion Research has its own solar power plant in Conceição do Coité, Bahia, Brazil.
## Paper

...coming soon...
## Training Code

...coming soon, maybe...
## Model Card Contact
For additional questions about details in this model card, contact kayky@orion.moe.