Nova-1 Standard (Phase 2 SFT)

Nova-1 is a 1.27B-parameter decoder-only language model from Smilyai Labs. Trained from scratch, it uses a custom architecture built for efficiency and for native use with HuggingFace Transformers.

🧠 Architecture Highlights

  • Mixture-of-Depths (MoD): Dynamically routes only the most important tokens through full compute and lets the rest skip it, cutting cost per token with the aim of preserving quality.
  • Grouped-Query Attention (GQA): 16 query heads share 8 KV heads, for faster inference and a lower VRAM footprint.
  • SwiGLU FFN: Gated activation functions for better training stability and downstream performance.
  • Rotary Position Embeddings (RoPE): Supports YaRN context scaling out of the box.
  • Custom Tokenizer: GPT-2 BPE base extended with domain-specific special tokens for code, math, and ChatML.
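
As a rough illustration of the Mixture-of-Depths idea in the list above, the sketch below sends only the top-scoring fraction of tokens through a block and passes the rest through unchanged. The function name `mod_route`, the `capacity` ratio, and the toy `block` are hypothetical and chosen for illustration; Nova-1's actual router is not described in this card and may differ.

```python
from typing import Callable, List


def mod_route(
    tokens: List[float],
    scores: List[float],
    block: Callable[[float], float],
    capacity: float = 0.5,
) -> List[float]:
    """Apply `block` only to the top-`capacity` fraction of tokens by score.

    Hypothetical sketch: plain floats stand in for hidden states, and
    unselected tokens are returned unchanged (the skip path).
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    # Number of tokens that receive full compute (at least one).
    k = max(1, int(len(tokens) * capacity))
    # Indices of the k highest router scores.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:k])
    return [block(t) if i in selected else t for i, t in enumerate(tokens)]


hidden = [1.0, 2.0, 3.0, 4.0]
router_scores = [0.9, 0.1, 0.8, 0.2]
out = mod_route(hidden, router_scores, block=lambda x: x * 10, capacity=0.5)
print(out)  # [10.0, 2.0, 30.0, 4.0]
```

With `capacity=0.5`, only the two highest-scoring tokens are transformed, so the block does about half the work on this toy sequence.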

Model Details

Property             Value
Parameters           1.27B
Hidden dim           2048
Layers               24 (12 Full + 12 MoD)
Attention heads      16 (GQA, 8 KV)
Context length       2048 tokens (YaRN stretchable)
Pretraining tokens   ~4.00B
Training             Phase 2 (Supervised Fine-Tuning)
Dtype                bfloat16
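
The attention figures in the table imply a few derived sizes, worked out below. This assumes the conventional layout where the per-head width equals the hidden dim divided by the number of query heads; the card does not state the head dimension, so treat the results as an estimate.

```python
hidden_dim = 2048    # "Hidden dim" from the table
num_q_heads = 16     # query heads
num_kv_heads = 8     # key/value heads (GQA)

# Assumption: head_dim = hidden_dim / num_q_heads (not stated in the card).
head_dim = hidden_dim // num_q_heads
# Number of query heads that share each key/value head.
q_per_kv = num_q_heads // num_kv_heads
# Output width of each K (or V) projection under GQA.
kv_proj_dim = num_kv_heads * head_dim

print(head_dim, q_per_kv, kv_proj_dim)  # 128 2 1024
```

Under that assumption, the K and V projections are half as wide as they would be with 16 full KV heads.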

🚀 Usage

Because the model plugs into the standard HuggingFace Transformers APIs (loaded with trust_remote_code=True, since the architecture is custom), you can use pipeline or AutoModelForCausalLM without writing a custom generation loop. The repo's generation_config.json supplies the sampler defaults.

Method 1: HuggingFace Pipeline (Easiest)

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation", 
    model="Smilyai-labs/Nova-1-Standard", 
    torch_dtype=torch.bfloat16, 
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are Nova, a helpful, honest AI assistant."},
    {"role": "user", "content": "Write a Python function to check if a number is prime."}
]

# The pipeline automatically applies ChatML and uses the correct sampler defaults!
response = pipe(messages, max_new_tokens=256)
print(response[0]['generated_text'][-1]['content'])

Method 2: Standard AutoModel

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Smilyai-labs/Nova-1-Standard"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Nova, a helpful, honest AI assistant."},
    {"role": "user", "content": "Explain recursion like I'm five."}
]

# Apply ChatML template
inputs = tokenizer.apply_chat_template(
    messages, 
    add_generation_prompt=True, 
    return_tensors="pt"
).to(model.device)

# Generate (uses repo generation_config defaults)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))

⚠️ Note on Inference: This model's architecture intentionally disables HuggingFace's KV cache (use_cache=False) to ensure maximum context retention. The prepare_inputs_for_generation method passes the full context to the model on every step, so no extra handling is needed on your side. Do not pass use_cache=True manually: the model will emit a warning and force it back to False.
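
To get a feel for what running without a KV cache means in practice, the sketch below counts how many token positions pass through the model during generation, with and without caching. It is a simple counting model under stated assumptions: `positions_processed` is a hypothetical helper, and it ignores MoD token skipping and any implementation detail specific to Nova-1.

```python
def positions_processed(prompt_len: int, new_tokens: int, use_cache: bool) -> int:
    """Count token positions fed through the model while generating.

    Hypothetical helper for illustration only.
    """
    if new_tokens <= 0:
        return 0  # no forward pass is needed
    if use_cache:
        # The prompt is processed once, then one new position per later step.
        return prompt_len + (new_tokens - 1)
    # Without a cache, step i re-processes the prompt plus i generated tokens.
    return sum(prompt_len + i for i in range(new_tokens))


print(positions_processed(32, 256, use_cache=False))  # 40832
print(positions_processed(32, 256, use_cache=True))   # 287
```

The uncached count grows quadratically with the number of generated tokens, so long generations are where the difference is most noticeable.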

🏷️ Special Tokens

Nova-1 natively understands domain markers and ChatML structure.

  • <|im_start|>, <|im_end|>: Chat format markers
  • <|code_start|>, <|code_end|>: Code boundaries
  • <|math_start|>, <|math_end|>: Math content
  • <|domain_code|>, <|domain_math|>, <|domain_general|>: Domain context indicators (used in pretraining, though Phase 2 SFT primarily relies on pure ChatML)
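
For reference, here is a minimal sketch of how a conversation is usually laid out with the chat markers listed above. It assumes Nova-1 follows the common ChatML layout (role on the first line, content after a newline, each turn closed by <|im_end|>); the helper `render_chatml` is hypothetical, and the authoritative template is the one shipped with the tokenizer.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages in the common ChatML layout.

    Assumption: this mirrors the widely used ChatML format; it is not
    confirmed to match Nova-1's tokenizer template byte for byte.
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "You are Nova, a helpful, honest AI assistant."},
    {"role": "user", "content": "Hi!"},
])
print(prompt)
```

In real use, prefer tokenizer.apply_chat_template, which applies the template stored in the repository.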

📚 Training Data

Phase 1 (Pretraining): Trained on ~4B tokens of high-quality filtered web text, code, and math.

  • General text: FineWeb, C4, Wikipedia
  • Code: The Stack v2, CodeSearchNet, Magicoder
  • Math: Open-Web-Math, MetaMathQA

Phase 2 (Instruction Tuning): Supervised Fine-Tuning on ~200k high-quality multi-turn conversations and identity reinforcement data.

  • Chat: OpenHermes 2.5, UltraChat 200k, Tulu Mix
  • Code: Evol-Instruct, CodeFeedback
  • Math: MetaMathQA, GSM8K
  • Identity: Custom synthetic dataset to establish Nova persona and resist jailbreaks.

License

Apache 2.0

Citation

@software{nova1,
  author = {Smilyai Labs},
  title = {Nova-1: Mixture-of-Depths Language Model},
  year = {2024},
  url = {https://huggingface.co/Smilyai-labs/Nova-1-Standard}
}

Built with 💙 by Smilyai Labs
