Nova-1 Standard (Phase 2 SFT)
Nova-1 is a 1.27B-parameter decoder-only language model from Smilyai Labs. Trained from scratch, it uses a custom architecture designed for efficient inference and compatibility with HuggingFace Transformers.
Architecture Highlights
- Mixture-of-Depths (MoD): Dynamically routes only the most important tokens through full compute, skipping the rest for efficiency without sacrificing quality.
- Grouped-Query Attention (GQA): 16 query heads, 8 KV heads for faster inference and lower VRAM footprint.
- SwiGLU FFN: Gated activation functions for better training stability and downstream performance.
- Rotary Position Embeddings (RoPE): Native support for YaRN context scaling out of the box.
- Custom Tokenizer: GPT-2 BPE base extended with domain-specific special tokens for code, math, and ChatML.
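To make the Mixture-of-Depths idea concrete, here is a minimal sketch in plain Python. It is not Nova-1's implementation: the function name `mod_layer`, the scalar "tokens", and the fixed `capacity` are all assumptions chosen for illustration. It only shows the general pattern of running the highest-scoring tokens through a block and letting the rest pass through unchanged.

```python
from typing import Callable, List


def mod_layer(
    tokens: List[float],
    router_scores: List[float],
    capacity: int,
    block: Callable[[float], float],
) -> List[float]:
    """Apply `block` only to the `capacity` highest-scoring tokens (hypothetical helper)."""
    # Rank positions by router score, highest first, and keep the top `capacity`.
    ranked = sorted(range(len(tokens)), key=lambda i: router_scores[i], reverse=True)
    selected = set(ranked[:capacity])
    # Selected tokens get the full computation; the others skip the block
    # and keep their original value and position.
    return [block(x) if i in selected else x for i, x in enumerate(tokens)]


hidden = [1.0, 2.0, 3.0, 4.0]
scores = [0.1, 0.9, 0.3, 0.8]
print(mod_layer(hidden, scores, capacity=2, block=lambda x: x * 10))
# [1.0, 20.0, 3.0, 40.0]
```

In a real MoD layer the tokens are vectors, the scores come from a learned router, and the block output is typically added back on a residual path; those details are omitted here.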
Model Details
| Property | Value |
|---|---|
| Parameters | 1.27B |
| Hidden dim | 2048 |
| Layers | 24 (12 Full + 12 MoD) |
| Attention heads | 16 (GQA, 8 KV) |
| Context length | 2048 tokens (YaRN stretchable) |
| Pretraining Tokens | ~4.00B |
| Training Phase | 2 (Supervised Fine-Tuning) |
| Dtype | bfloat16 |
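
The attention numbers in the table imply a few derived sizes. The arithmetic below is a sketch that assumes the per-head width equals hidden dim divided by the number of query heads, and that query heads are grouped contiguously onto KV heads; both are common conventions, not confirmed details of Nova-1's config.

```python
HIDDEN_DIM = 2048
NUM_QUERY_HEADS = 16
NUM_KV_HEADS = 8

# Assumption: per-head width is hidden_dim / num_query_heads.
head_dim = HIDDEN_DIM // NUM_QUERY_HEADS          # 128
queries_per_kv = NUM_QUERY_HEADS // NUM_KV_HEADS  # 2 query heads share each KV head
kv_proj_width = NUM_KV_HEADS * head_dim           # 1024, half the width of the query projection


def kv_head_for(query_head: int) -> int:
    """KV head index used by a query head, assuming contiguous grouping."""
    return query_head // queries_per_kv


print(head_dim, queries_per_kv, kv_proj_width)
print([kv_head_for(q) for q in range(NUM_QUERY_HEADS)])
```

Under these assumptions the key and value projections are half as wide as they would be with one KV head per query head, which is where the lower memory footprint comes from.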
Usage
Because this model loads through the standard HuggingFace `pipeline` and `AutoModelForCausalLM` APIs (with `trust_remote_code=True`), you do not need a custom generation loop. The repo's `generation_config.json` supplies the sampler defaults.
Method 1: HuggingFace Pipeline (Easiest)
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Smilyai-labs/Nova-1-Standard",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are Nova, a helpful, honest AI assistant."},
    {"role": "user", "content": "Write a Python function to check if a number is prime."},
]

# The pipeline automatically applies ChatML and uses the correct sampler defaults!
response = pipe(messages, max_new_tokens=256)
print(response[0]["generated_text"][-1]["content"])
Method 2: Standard AutoModel
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Smilyai-labs/Nova-1-Standard"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Nova, a helpful, honest AI assistant."},
    {"role": "user", "content": "Explain recursion like I'm five."},
]

# Apply ChatML template
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Generate (uses repo generation_config defaults)
outputs = model.generate(inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
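
Both methods pass `max_new_tokens=256`, and the model's context length is 2048 tokens. The helper below is a hypothetical sketch (the name `clamp_new_tokens` is not part of the repo or of Transformers) for keeping prompt plus generation inside that window. It assumes the plain 2048-token limit with no YaRN scaling applied.

```python
def clamp_new_tokens(prompt_len: int, requested: int, context_len: int = 2048) -> int:
    """Largest generation length that keeps prompt + output within the context window."""
    remaining = context_len - prompt_len
    if remaining <= 0:
        # The prompt alone already fills (or exceeds) the window.
        raise ValueError(f"prompt of {prompt_len} tokens leaves no room in a {context_len}-token context")
    return min(requested, remaining)


print(clamp_new_tokens(prompt_len=100, requested=256))   # 256
print(clamp_new_tokens(prompt_len=1900, requested=256))  # 148
```

With Method 2, `prompt_len` would be `inputs.shape[1]`, and the result could be passed as `max_new_tokens`.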
Note on Inference: This model's architecture intentionally disables HuggingFace's KV cache (`use_cache=False`) to ensure maximum context retention. The `prepare_inputs_for_generation` method automatically passes the full context window on each step. Do not pass `use_cache=True` manually: the model will emit a warning and force it back to `False`.
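
A practical consequence of running without a KV cache is that every decoding step re-processes the whole sequence. The sketch below is a rough back-of-the-envelope count, not a measurement of Nova-1. The function `tokens_processed` is hypothetical and simply counts how many token positions are fed through the model across a generation.

```python
def tokens_processed(prompt_len: int, new_tokens: int, use_cache: bool) -> int:
    """Total token positions run through the model across all decoding steps."""
    if new_tokens <= 0:
        return 0
    if use_cache:
        # The prompt is processed once, then one new token per remaining step.
        return prompt_len + (new_tokens - 1)
    # Without a cache, step i re-runs the prompt plus the i tokens generated so far.
    return sum(prompt_len + i for i in range(new_tokens))


print(tokens_processed(100, 256, use_cache=False))  # 58240
print(tokens_processed(100, 256, use_cache=True))   # 355
```

Expect generation time to grow faster than linearly with `max_new_tokens`, so keeping prompts and outputs short helps latency.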
Special Tokens
Nova-1 natively understands domain markers and ChatML structure.
- `<|im_start|>`, `<|im_end|>`: Chat format markers
- `<|code_start|>`, `<|code_end|>`: Code boundaries
- `<|math_start|>`, `<|math_end|>`: Math content
- `<|domain_code|>`, `<|domain_math|>`, `<|domain_general|>`: Domain context indicators (used in pretraining, though Phase 2 SFT primarily relies on pure ChatML)
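
For reference, this is roughly what a ChatML-formatted prompt looks like when built by hand. The sketch follows the conventional ChatML layout; the exact template shipped with Nova-1's tokenizer may differ in whitespace or defaults, so treat `to_chatml` as a hypothetical illustration and prefer `tokenizer.apply_chat_template` in real code.

```python
from typing import Dict, List


def to_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render chat messages in the conventional ChatML layout (illustrative only)."""
    # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Nova, a helpful, honest AI assistant."},
    {"role": "user", "content": "Who are you?"},
]
print(to_chatml(messages))
```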
Training Data
Phase 1 (Pretraining): Trained on ~4B tokens of high-quality filtered web text, code, and math.
- General text: FineWeb, C4, Wikipedia
- Code: The Stack v2, CodeSearchNet, Magicoder
- Math: Open-Web-Math, MetaMathQA
Phase 2 (Instruction Tuning): Supervised Fine-Tuning on ~200k high-quality multi-turn conversations and identity reinforcement data.
- Chat: OpenHermes 2.5, UltraChat 200k, Tulu Mix
- Code: Evol-Instruct, CodeFeedback
- Math: MetaMathQA, GSM8K
- Identity: Custom synthetic dataset to establish Nova persona and resist jailbreaks.
License
Apache 2.0
Citation
@software{nova1,
author = {Smilyai Labs},
title = {Nova-1: Mixture-of-Depths Language Model},
year = {2024},
url = {https://huggingface.co/Smilyai-labs/Nova-1-Standard}
}
Built by Smilyai Labs