Instructions for using Nebulixlabs/Nutral-v1-Tiny with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Nebulixlabs/Nutral-v1-Tiny with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Nebulixlabs/Nutral-v1-Tiny")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
# The card describes a Llama-based causal language model, so the causal-LM auto class is used here.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Nebulixlabs/Nutral-v1-Tiny")
model = AutoModelForCausalLM.from_pretrained("Nebulixlabs/Nutral-v1-Tiny")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat messages into model inputs (requires the tokenizer to define a chat template)
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use Nebulixlabs/Nutral-v1-Tiny with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Nebulixlabs/Nutral-v1-Tiny"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Nebulixlabs/Nutral-v1-Tiny",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
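
The same request can be issued from Python instead of curl. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on `localhost:8000` and returns the usual OpenAI-style chat completion body. The helper names `build_chat_request` and `chat` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl call above
MODEL_ID = "Nebulixlabs/Nutral-v1-Tiny"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-compatible response: choices[0].message.content.
    """
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` sends the same payload as the curl example. For the SGLang server, the same sketch should work with the base URL changed to port 30000.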

Use Docker

```
docker model run hf.co/Nebulixlabs/Nutral-v1-Tiny
```

SGLang

How to use Nebulixlabs/Nutral-v1-Tiny with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Nebulixlabs/Nutral-v1-Tiny" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Nebulixlabs/Nutral-v1-Tiny",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Nebulixlabs/Nutral-v1-Tiny" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Nebulixlabs/Nutral-v1-Tiny",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use Nebulixlabs/Nutral-v1-Tiny with Docker Model Runner:
```
docker model run hf.co/Nebulixlabs/Nutral-v1-Tiny
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Nutral v1 TinyML

Nutral-v1-Tiny is an ultra-lightweight causal language model, trained from scratch for TinyML applications, edge computing, and resource-constrained environments. Developed by Nebulixlabs, it scales the Llama architecture down to about 1.32 million parameters, which makes it suitable for proof-of-concept deployments on microcontrollers, mobile devices, and Raspberry Pi boards.

📊 Model Details

Model Name: Nutral v1 Tiny
Developer: Nebulixlabs
Model Type: Causal Language Model
Architecture: Llama (Custom Micro Configuration)
- hidden_size: 128
- intermediate_size: 348
- num_hidden_layers: 4
- num_attention_heads: 4
- num_key_value_heads: 4
- vocab_size: 2048
Parameters: ~1.32 Million
Context Length: 256 Tokens
Formats Provided: Hugging Face PyTorch (.safetensors/.bin) & GGUF
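
The stated parameter count can be checked against the configuration by hand. The sketch below assumes the standard Llama layout (bias-free linear layers, a gated MLP with three projections, two RMSNorm weights per layer) and, as a further assumption not stated in the card, an output head that is not tied to the input embedding. Under those assumptions the total matches the ~1.32 million figure.

```python
# Configuration values copied from the Model Details list above.
hidden_size = 128
intermediate_size = 348
num_hidden_layers = 4
num_attention_heads = 4
num_key_value_heads = 4
vocab_size = 2048

# Assumption: standard Llama head size (hidden_size split evenly across heads).
head_dim = hidden_size // num_attention_heads
kv_dim = num_key_value_heads * head_dim

embed = vocab_size * hidden_size                      # token embedding table
attn = 2 * hidden_size * hidden_size                  # q_proj and o_proj
attn += 2 * hidden_size * kv_dim                      # k_proj and v_proj
mlp = 3 * hidden_size * intermediate_size             # gate, up, and down projections
norms = 2 * hidden_size                               # two RMSNorm weights per layer
per_layer = attn + mlp + norms

final_norm = hidden_size
lm_head = vocab_size * hidden_size                    # assumption: untied output head

total = embed + num_hidden_layers * per_layer + final_norm + lm_head
print(f"{total:,}")  # 1,322,112
```

If the output head were tied to the embedding, the same arithmetic would give about 1.06 million parameters, so the untied assumption is the one consistent with the card.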

🎯 Intended Uses & Capabilities

Because Nutral-v1-Tiny has only about 1.32 million parameters and a restricted 2048-token vocabulary, its capabilities are limited to the basics.

Primary Use Cases:

Edge Device Testing: A dummy/baseline LLM to test deployment pipelines (e.g., llama.cpp) on hardware with extremely low RAM.
Basic Text Generation: Next-word prediction for simple English sentences.
Syntax Recognition: Demonstrating basic grammatical structures learned from educational data.
Educational Purposes: A fast-training baseline to study Llama architecture behavior at a tiny scale.
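
To give a rough sense of what "extremely low RAM" means here, the sketch below estimates the memory needed for the weights and for a full-context key/value cache. It is a back-of-the-envelope calculation, not a measurement: it assumes 4 bytes per weight for F32 and 2 bytes for FP16, a head size of hidden_size / num_attention_heads, and an FP16 cache. Runtime overhead and GGUF quantization are not modeled.

```python
# Figures taken from the model card.
n_params = 1.32e6          # "~1.32 Million" parameters
num_layers = 4
num_attention_heads = 4
num_kv_heads = 4
hidden_size = 128
context_len = 256

# Assumption: standard Llama head size.
head_dim = hidden_size // num_attention_heads


def mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / 2**20


weights_f32 = n_params * 4   # 4 bytes per parameter
weights_f16 = n_params * 2   # 2 bytes per parameter

# K and V tensors for every layer and every position, stored in FP16 (2 bytes each).
kv_cache_f16 = 2 * num_layers * context_len * num_kv_heads * head_dim * 2

print(f"weights (F32):  {mib(weights_f32):.2f} MiB")
print(f"weights (FP16): {mib(weights_f16):.2f} MiB")
print(f"KV cache (FP16, full context): {mib(kv_cache_f16):.2f} MiB")
```

Under these assumptions the weights take roughly 5 MiB in F32, and the full-context cache adds about 0.5 MiB.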

Out-of-Scope Uses:

Conversational AI or Chatbots.
Logical reasoning, math, or coding tasks.
Factual QA (the model is highly prone to hallucinations due to its size).

🏋️ Training Details

The model was trained from scratch with a fast-extraction pipeline, using the dataset, hardware, and hyperparameters listed below.

Dataset: HuggingFaceFW/fineweb-edu (sample-10BT subset)
Tokens Trained: 30 Million tokens
Hardware: 2x NVIDIA T4 GPUs
Optimizer: AdamW (optim="adamw_torch")
Precision: FP16
Hyperparameters:
- Learning Rate: 6e-4
- Weight Decay: 0.01
- Batch Size: 16 (with Gradient Accumulation steps: 2)
- Max Steps: 3700
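
The token budget follows from the hyperparameters. As a rough check, the sketch below multiplies them out, assuming that every training sequence is filled to the 256-token context length and that the batch size of 16 is the global batch per micro-step (not per GPU). Neither assumption is stated in the card; under them the result agrees with the stated 30 million tokens.

```python
# Values from the card.
context_length = 256      # tokens per sequence
batch_size = 16           # assumption: global batch per micro-step
grad_accum_steps = 2
max_steps = 3700

sequences_per_step = batch_size * grad_accum_steps     # sequences per optimizer step
tokens_per_step = sequences_per_step * context_length  # tokens per optimizer step
total_tokens = tokens_per_step * max_steps

print(f"{tokens_per_step:,} tokens per optimizer step")  # 8,192
print(f"{total_tokens:,} tokens in total")               # 30,310,400
```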

🚀 How to Get Started

You can load the model using the standard transformers library or run the optimized .gguf file using llama.cpp.

1. Using Hugging Face Transformers

```python
from transformers import AutoTokenizer, LlamaForCausalLM

model_id = "Nebulixlabs/Nutral-v1-Tiny"

# Load Tokenizer and Model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)

# Generate Text
prompt = "The solar system consists of"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=30, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
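
Because the context length is only 256 tokens, the prompt plus the requested new tokens should stay within that limit. One possible way to enforce this is sketched below; `clip_prompt_ids` is a hypothetical helper written for this illustration, not part of transformers, and it works on a plain list of token ids.

```python
CONTEXT_LENGTH = 256  # context length stated in the model card


def clip_prompt_ids(input_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Keep the most recent prompt tokens so prompt + generation fits the context.

    input_ids is a plain list of token ids; older tokens are dropped first.
    """
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for a prompt")
    return input_ids[-budget:]


# Example with dummy ids: a 300-token prompt and 30 new tokens leaves room for 226.
clipped = clip_prompt_ids(list(range(300)), max_new_tokens=30)
print(len(clipped))  # 226
```

In practice the ids would come from `tokenizer(prompt)["input_ids"]`. Note that clipping from the left also drops any beginning-of-sequence token the tokenizer added, which may or may not matter for this model.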

Safetensors: model size 1.32M params, tensor type F32
