Instructions to use Smilyai-labs/Nova-1-Standard-1.3B-Preview with libraries and local apps.

Libraries

How to use Smilyai-labs/Nova-1-Standard-1.3B-Preview with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Smilyai-labs/Nova-1-Standard-1.3B-Preview", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

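# Load model and tokenizer directly
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Smilyai-labs/Nova-1-Standard-1.3B-Preview", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Smilyai-labs/Nova-1-Standard-1.3B-Preview", trust_remote_code=True, dtype="auto")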

Local Apps

vLLM

How to use Smilyai-labs/Nova-1-Standard-1.3B-Preview with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server (the repo ships custom model code, hence --trust-remote-code):
vllm serve "Smilyai-labs/Nova-1-Standard-1.3B-Preview" --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Smilyai-labs/Nova-1-Standard-1.3B-Preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
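
The same request can be sent from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_request` and `send_chat_request` are illustrative, and the sketch assumes the server started above is listening on localhost:8000 and returns an OpenAI-style response body.

```python
import json
import urllib.request

MODEL_ID = "Smilyai-labs/Nova-1-Standard-1.3B-Preview"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant text.

    Assumes an OpenAI-style response body: choices[0].message.content.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (needs the server to be running):
#   req = build_chat_request("What is the capital of France?")
#   print(send_chat_request(req))
```

For a server on a different port, pass `base_url` explicitly, for example `base_url="http://localhost:30000"`.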

SGLang

How to use Smilyai-labs/Nova-1-Standard-1.3B-Preview with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server (the repo ships custom model code, hence --trust-remote-code):
python3 -m sglang.launch_server \
    --model-path "Smilyai-labs/Nova-1-Standard-1.3B-Preview" \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Smilyai-labs/Nova-1-Standard-1.3B-Preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Smilyai-labs/Nova-1-Standard-1.3B-Preview" \
        --trust-remote-code \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Smilyai-labs/Nova-1-Standard-1.3B-Preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Smilyai-labs/Nova-1-Standard-1.3B-Preview with Docker Model Runner:
```
docker model run hf.co/Smilyai-labs/Nova-1-Standard-1.3B-Preview
```

Nova-1 Standard (Phase 2 SFT)

Nova-1 is a 1.27B-parameter decoder-only language model from Smilyai Labs. Trained from scratch, it uses a custom architecture designed for efficient compute and loads through the standard HuggingFace Transformers APIs.

🧠 Architecture Highlights

Mixture-of-Depths (MoD) — Dynamically routes only the most important tokens through full compute, skipping the rest for efficiency without sacrificing quality.
Grouped-Query Attention (GQA) — 16 query heads, 8 KV heads for faster inference and lower VRAM footprint.
SwiGLU FFN — Gated activation functions for better training stability and downstream performance.
Rotary Position Embeddings (RoPE) — Native support for YaRN context scaling out of the box.
Custom Tokenizer — GPT-2 BPE base extended with domain-specific special tokens for code, math, and ChatML.
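
To illustrate the Mixture-of-Depths idea from the list above, here is a minimal pure-Python sketch of top-k token routing for a single block. It is not Nova-1's implementation: the router scores, the `capacity` fraction, and the function names `mod_route` and `apply_mod_block` are all assumptions made for illustration.

```python
def mod_route(router_scores, capacity=0.5):
    """Return the sorted positions of the tokens chosen for full compute.

    router_scores: one float per token (hypothetical router output).
    capacity: fraction of tokens that get full compute (assumed value).
    """
    k = max(1, int(len(router_scores) * capacity))
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return sorted(ranked[:k])


def apply_mod_block(hidden, router_scores, block_fn, capacity=0.5):
    """Apply block_fn only to routed tokens; the rest pass through unchanged."""
    selected = set(mod_route(router_scores, capacity))
    return [
        h + block_fn(h) if i in selected else h  # residual update vs. skip
        for i, h in enumerate(hidden)
    ]


# Toy example: scalar "hidden states" and a block that doubles its input.
hidden = [1.0, 2.0, 3.0, 4.0]
scores = [0.1, 0.9, 0.3, 0.8]
routed = mod_route(scores)
updated = apply_mod_block(hidden, scores, lambda h: 2.0 * h)
print(routed)   # [1, 3]
print(updated)  # [1.0, 6.0, 3.0, 12.0]
```

Only the two highest-scoring tokens pay for the block; the other two are carried forward untouched, which is where the compute saving comes from.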

Model Details

Property	Value
Parameters	1.27B
Hidden dim	2048
Layers	24 (12 Full + 12 MoD)
Attention heads	16 (GQA, 8 KV)
Context length	2048 tokens (YaRN stretchable)
Pretraining Tokens	~4.00B
Training Phase	2 (Supervised Fine-Tuning)
Dtype	bfloat16
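
The attention figures in the table imply the shapes worked out below. This is a small arithmetic sketch, assuming that the per-head dimension equals the hidden dim divided by the number of query heads and that consecutive query heads share a KV head; the table states neither explicitly.

```python
HIDDEN_DIM = 2048   # from the table
NUM_Q_HEADS = 16    # from the table
NUM_KV_HEADS = 8    # from the table

# Assumption: head_dim = hidden_dim / num_query_heads (the usual convention).
head_dim = HIDDEN_DIM // NUM_Q_HEADS
group_size = NUM_Q_HEADS // NUM_KV_HEADS   # query heads sharing one KV head
kv_proj_width = NUM_KV_HEADS * head_dim    # output width of the K (or V) projection


def kv_head_for(query_head):
    """Map a query head index to the KV head it reads from (assumed grouping)."""
    return query_head // group_size


print(head_dim)       # 128
print(group_size)     # 2
print(kv_proj_width)  # 1024
print([kv_head_for(q) for q in range(NUM_Q_HEADS)])
# [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
```

Under these assumptions the K and V projections are half the width of the Q projection (1024 vs. 2048).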

🚀 Usage

Because the model plugs into the standard HuggingFace APIs (with trust_remote_code=True to load its custom architecture), you can use pipeline or AutoModelForCausalLM without writing a custom generation loop. The repository's generation_config.json supplies the sampler defaults.

Method 1: HuggingFace Pipeline (Easiest)

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation", 
    model="Smilyai-labs/Nova-1-Standard", 
    torch_dtype=torch.bfloat16, 
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are Nova, a helpful, honest AI assistant."},
    {"role": "user", "content": "Write a Python function to check if a number is prime."}
]

# The pipeline automatically applies ChatML and uses the correct sampler defaults!
response = pipe(messages, max_new_tokens=256)
print(response[0]['generated_text'][-1]['content'])
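
For chat-style input, the pipeline returns a list with one dict per input, and the `generated_text` field holds the whole conversation with the new assistant message appended. The sketch below uses a hand-written mock response (not real model output) and a hypothetical helper, `last_reply`, to show why the indexing above works.

```python
def last_reply(response):
    """Extract the newest message's text from a chat pipeline response."""
    conversation = response[0]["generated_text"]  # list of {"role", "content"} dicts
    return conversation[-1]["content"]


# Hand-written mock with the same shape as a chat pipeline result.
mock_response = [
    {
        "generated_text": [
            {"role": "system", "content": "You are Nova, a helpful, honest AI assistant."},
            {"role": "user", "content": "Say hi."},
            {"role": "assistant", "content": "Hi!"},
        ]
    }
]

print(last_reply(mock_response))  # Hi!
```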

Method 2: Standard AutoModel

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Smilyai-labs/Nova-1-Standard"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Nova, a helpful, honest AI assistant."},
    {"role": "user", "content": "Explain recursion like I'm five."}
]

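# Apply the ChatML template; return_dict=True yields input_ids and attention_mask
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Generate (uses repo generation_config defaults)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))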

⚠️ Note on Inference: This model's architecture intentionally disables HuggingFace's KV cache (use_cache=False) to ensure maximum context retention. The prepare_inputs_for_generation method automatically passes the full context window on each step, so every new token recomputes the whole sequence and long generations run slower than on a cached model. Do not pass use_cache=True manually: the model logs a warning and forces it back to False.
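
To get a feel for the cost of running without a KV cache, the sketch below counts how many token positions pass through the model while generating. It is a back-of-the-envelope estimate, not a benchmark: it assumes each step reprocesses the full sequence so far, and the prompt length of 100 is an arbitrary example.

```python
def positions_without_cache(prompt_len, new_tokens):
    """Each step re-runs the whole sequence: prompt_len, prompt_len + 1, ..."""
    return sum(prompt_len + step for step in range(new_tokens))


def positions_with_cache(prompt_len, new_tokens):
    """For comparison: the prompt is processed once, then one position per step."""
    return prompt_len + max(new_tokens - 1, 0)


no_cache = positions_without_cache(prompt_len=100, new_tokens=256)
cached = positions_with_cache(prompt_len=100, new_tokens=256)
print(no_cache)  # 58240
print(cached)    # 355
```

Because the uncached total grows roughly quadratically with output length, keeping max_new_tokens modest is the simplest way to bound generation time.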

🏷️ Special Tokens

Nova-1 natively understands domain markers and ChatML structure.

<|im_start|>, <|im_end|> — Chat format markers
<|code_start|>, <|code_end|> — Code boundaries
<|math_start|>, <|math_end|> — Math content
<|domain_code|>, <|domain_math|>, <|domain_general|> — Domain context indicators (used in pretraining, though Phase 2 SFT primarily relies on pure ChatML)
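
For reference, the conventional ChatML layout built from the chat markers looks like the sketch below. The helper `to_chatml` is hypothetical; the authoritative template is the one stored with the tokenizer and applied by `apply_chat_template`, and it may differ in whitespace or in how it handles a missing system message.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render messages in the conventional ChatML layout."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are Nova, a helpful, honest AI assistant."},
    {"role": "user", "content": "Who are you?"},
])
print(prompt)
```

In normal use there is no need to build this string by hand, since both the pipeline and `apply_chat_template` render it from the message list.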

📚 Training Data

Phase 1 (Pretraining): Trained on ~4B tokens of high-quality filtered web text, code, and math.

General text: FineWeb, C4, Wikipedia
Code: The Stack v2, CodeSearchNet, Magicoder
Math: Open-Web-Math, MetaMathQA

Phase 2 (Instruction Tuning): Supervised Fine-Tuning on ~200k high-quality multi-turn conversations and identity reinforcement data.

Chat: OpenHermes 2.5, UltraChat 200k, Tulu Mix
Code: Evol-Instruct, CodeFeedback
Math: MetaMathQA, GSM8K
Identity: Custom synthetic dataset to establish the Nova persona and resist jailbreaks.

License

Apache 2.0

Citation

@software{nova1,
  author = {Smilyai Labs},
  title = {Nova-1: Mixture-of-Depths Language Model},
  year = {2024},
  url = {https://huggingface.co/Smilyai-labs/Nova-1-Standard}
}

Built with 💙 by Smilyai Labs
