# Model Card for Aura 16x17B SparseMoE

## Model Summary

Aura 16x17B SparseMoE is a research release of a 17-billion-parameter Brazilian model.
- Developed by: Orion Research
- Model: Aura 16x17B SparseMoE
- Model size: 17 billion parameters
- Context length: 8,192 tokens (8k)
This model uses the `orion` architecture because of our proprietary Q* training algorithm implementation, but you can replace the architecture with `MistralForCausalLM`/`mistral` or even `LlamaForCausalLM`/`llama` and the model will still perform very well, because Q* is used only during training.
## Use
```python
# pip install transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "orion-research/Aura-16x17B-SparseMoE"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Format the message with the Aura 16x17B SparseMoE chat template
messages = [{"role": "user", "content": "Olá! Como vai você hoje?"}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.4,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
```
## Model Details
Input: text only.
Output: text only.
Model Architecture: This is an auto-regressive language model that uses an optimized transformer architecture; the base model is Llama 3 8B. After pretraining, the model was aligned to human preferences for helpfulness and safety with supervised fine-tuning (SFT) and preference training (ORPO).
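As background for the "SparseMoE" part of the name, here is a generic sketch (not Aura's actual routing code) of how a sparse mixture-of-experts layer routes each token: a gating network scores all experts, only the top-2 are run, and their outputs are mixed with renormalized weights:

```python
import math

# Illustrative top-2 expert routing, as used in sparse MoE layers.
def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top2(gate_logits):
    """Return the two highest-scoring expert indices and their
    renormalized mixture weights (weights sum to 1)."""
    probs = softmax(gate_logits)
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    total = probs[top2[0]] + probs[top2[1]]
    return top2, [probs[top2[0]] / total, probs[top2[1]] / total]
```

Because only 2 of the 16 experts run per token, inference cost grows far more slowly than total parameter count.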
Languages covered: The model is optimized to perform well in English and Brazilian Portuguese.
Context length: The model supports a context length of 8,192 tokens (8k).
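Prompts longer than the context window must be truncated before generation; a minimal sketch of left-truncation over plain token-id lists (a hypothetical helper, not part of the model's API), leaving headroom for the tokens you plan to generate:

```python
# Minimal sketch: keep a prompt inside the model's 8,192-token context
# window, reserving room for the tokens we intend to generate.
MAX_CONTEXT = 8192

def truncate_to_context(token_ids, reserve_for_output=512):
    """Drop the oldest tokens so prompt + output fits in MAX_CONTEXT."""
    budget = MAX_CONTEXT - reserve_for_output
    return token_ids[-budget:] if len(token_ids) > budget else token_ids
```

Truncating from the left keeps the most recent conversation turns, which is usually what chat-style generation needs.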
Limitations: This model is not fully trained; this is a checkpoint taken at the point where training began to converge. Feel free to run continued training on your own preferred tasks.
## Training Details

Training factors: We used custom training libraries and the Orion Research Mini Cluster for pretraining.

Carbon footprint: Pretraining used a cumulative 4,800 GPU hours of computation on RTX 4090 24GB and RTX 3090 24GB hardware (TDP of 500 W). Total emissions were zero tCO2eq, 100% of which were offset by Orion's sustainability program.
| | Time (GPU hours) | Power Consumption (W) | Carbon Emitted (tCO2eq) |
|---|---|---|---|
| Aura Llama 3 8B base model | 900 | 500 | 0 |
| Aura Q* Scaling to 17B | 900 | 500 | 0 |
| Aura Final 16x17B SparseMoE | 3,000 | 500 | 0 |
| Total | 4,800 | 1,500 | 0 |
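As a rough sanity check of the figures above (assuming GPUs running continuously at the stated 500 W TDP, which overestimates real draw):

```python
# Back-of-the-envelope energy estimate from the reported compute:
# 4,800 cumulative GPU hours at a 500 W TDP.
def energy_kwh(gpu_hours, tdp_watts):
    """Energy in kilowatt-hours for GPUs drawing tdp_watts for gpu_hours."""
    return gpu_hours * tdp_watts / 1000.0

total_kwh = energy_kwh(4800, 500)  # 2,400 kWh, reported as fully solar-offset
```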
## We Are Green

We emit zero CO2: Orion Research has its own solar power plant in Conceição do Coité, Bahia, Brazil.
## Paper

...coming soon...
## Training Code

...coming soon, maybe...
## Model Card Contact
For additional questions about details in this model card, contact kayky@orion.moe.