Nutral v1 TinyML
Nutral-v1-Tiny is an ultra-lightweight, custom-trained causal language model designed for TinyML applications, edge computing, and resource-constrained environments. Developed by Nebulixlabs, it scales the Llama architecture down to roughly 1.32 million parameters, which makes it suitable for proof-of-concept deployments on microcontrollers, mobile devices, and the Raspberry Pi.
📊 Model Details
- Model Name: Nutral v1 Tiny
- Developer: Nebulixlabs
- Model Type: Causal Language Model
- Architecture: Llama (Custom Micro Configuration)
  - hidden_size: 128
  - intermediate_size: 348
  - num_hidden_layers: 4
  - num_attention_heads: 4
  - num_key_value_heads: 4
  - vocab_size: 2048
- Parameters: ~1.32 Million
- Context Length: 256 Tokens
- Formats Provided: Hugging Face PyTorch (.safetensors/.bin) and GGUF
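The parameter count can be cross-checked against the configuration values listed above. The sketch below is a back-of-the-envelope estimate, not the author's own calculation: it assumes a standard Llama layout with no bias terms, RMSNorm weights only, and an output head that is not tied to the input embeddings. Under those assumptions the total lands at the ~1.32 Million figure quoted in the model details.

```python
# Rough parameter count for the configuration listed in the model details.
# Assumptions (not confirmed by the model card): standard Llama layout,
# no bias terms, RMSNorm weights only, untied output head.
hidden_size = 128
intermediate_size = 348
num_hidden_layers = 4
num_attention_heads = 4
num_key_value_heads = 4
vocab_size = 2048

head_dim = hidden_size // num_attention_heads
kv_dim = num_key_value_heads * head_dim

# q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
# gate_proj, up_proj, and down_proj.
mlp = 3 * hidden_size * intermediate_size
# Two RMSNorm weight vectors per layer.
norms = 2 * hidden_size
per_layer = attention + mlp + norms

embeddings = vocab_size * hidden_size
lm_head = vocab_size * hidden_size  # assumed untied from the embeddings
final_norm = hidden_size

total_params = embeddings + num_hidden_layers * per_layer + final_norm + lm_head
fp16_megabytes = total_params * 2 / 1e6  # 2 bytes per FP16 weight

print(f"{total_params:,} parameters, about {fp16_megabytes:.2f} MB in FP16")
# 1,322,112 parameters, about 2.64 MB in FP16
```

If the output head were tied to the embeddings instead, the same arithmetic would give about 1.06 million parameters, so the quoted figure is consistent with an untied head.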
🎯 Intended Uses & Capabilities
Because Nutral-v1-Tiny has only about 1.3M parameters and a restricted 2048-token vocabulary, its capabilities are limited to the basics.
Primary Use Cases:
- Edge Device Testing: A dummy/baseline LLM to test deployment pipelines (e.g., llama.cpp) on hardware with extremely low RAM.
- Basic Text Generation: Next-word prediction for simple English sentences.
- Syntax Recognition: Demonstrating basic grammatical structures learned from educational data.
- Educational Purposes: A fast-training baseline to study Llama architecture behavior at a tiny scale.
Out-of-Scope Uses:
- Conversational AI or Chatbots.
- Logical reasoning, math, or coding tasks.
- Factual QA (the model is highly prone to hallucinations due to its size).
🏋️ Training Details
The model was trained from scratch using a fast-extraction pipeline and optimized hardware.
- Dataset: HuggingFaceFW/fineweb-edu (using the sample-10BT subset)
- Tokens Trained: 30 Million tokens
- Hardware: 2x NVIDIA T4 GPUs
- Optimizer: AdamW (optim="adamw_torch")
- Precision: FP16
- Hyperparameters:
  - Learning Rate: 6e-4
  - Weight Decay: 0.01
  - Batch Size: 16 (with Gradient Accumulation steps: 2)
  - Max Steps: 3700
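The 30 Million token figure can be related to the hyperparameters listed above with simple arithmetic. This is a sketch under two assumptions the model card does not state: that the batch size of 16 is the global batch (not per GPU), and that every training sequence is packed to the full 256-token context length.

```python
# Token budget implied by the listed training hyperparameters.
# Assumptions (not stated in the model card): batch size 16 is the global
# batch, and every sequence is packed to the full 256-token context.
batch_size = 16
gradient_accumulation_steps = 2
max_steps = 3700
context_length = 256

sequences_per_step = batch_size * gradient_accumulation_steps
tokens_per_step = sequences_per_step * context_length
total_tokens = tokens_per_step * max_steps

print(f"{tokens_per_step:,} tokens per optimizer step")  # 8,192 tokens per optimizer step
print(f"{total_tokens:,} tokens over {max_steps} steps")  # 30,310,400 tokens over 3700 steps
```

If the batch size were instead 16 per GPU across both T4s, the same calculation would give roughly twice as many tokens, so the stated 30 Million is consistent with a global batch of 16.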
🚀 How to Get Started
You can load the model using the standard transformers library or run the optimized .gguf file using llama.cpp.
1. Using Hugging Face Transformers
import torch
from transformers import AutoTokenizer, LlamaForCausalLM
model_id = "Nebulixlabs/Nutral-v1-Tiny"
# Load Tokenizer and Model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)
# Generate Text
prompt = "The solar system consists of"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
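Since the context length is 256 tokens, the prompt and the generated tokens together should stay within that window. The helper below is a hypothetical illustration (the name clamp_new_tokens is not part of transformers or of this repository) of how a requested generation length could be capped before calling generate.

```python
CONTEXT_LENGTH = 256  # context length stated in the model details


def clamp_new_tokens(prompt_tokens: int, requested_new_tokens: int,
                     context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can be generated without exceeding the context window."""
    if prompt_tokens >= context_length:
        raise ValueError("prompt already fills the context window; shorten or truncate it")
    return min(requested_new_tokens, context_length - prompt_tokens)


print(clamp_new_tokens(prompt_tokens=6, requested_new_tokens=30))    # 30
print(clamp_new_tokens(prompt_tokens=240, requested_new_tokens=30))  # 16
```

In the Transformers example, the prompt length would be inputs["input_ids"].shape[-1], and the returned value would be passed as max_new_tokens.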