Instructions for using SupraLabs/Supra-Mini-v6-1M with libraries and local apps.

Libraries

How to use SupraLabs/Supra-Mini-v6-1M with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="SupraLabs/Supra-Mini-v6-1M")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SupraLabs/Supra-Mini-v6-1M")
model = AutoModelForCausalLM.from_pretrained("SupraLabs/Supra-Mini-v6-1M")

Local Apps

vLLM

How to use SupraLabs/Supra-Mini-v6-1M with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "SupraLabs/Supra-Mini-v6-1M"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SupraLabs/Supra-Mini-v6-1M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
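
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. The helper names `build_completion_request` and `complete` are hypothetical, and the sketch assumes the server started above is listening on localhost:8000 and answers with the OpenAI-style `choices[0].text` field.

```python
import json
import urllib.request

MODEL_ID = "SupraLabs/Supra-Mini-v6-1M"

def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build the POST request equivalent to the curl call (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: the response follows the OpenAI completions shape.
    return body["choices"][0]["text"]

# Uncomment once the server is running:
# print(complete("Once upon a time,"))
```

Passing `base_url="http://localhost:30000"` should target a server that listens on port 30000 instead.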

Use Docker

docker model run hf.co/SupraLabs/Supra-Mini-v6-1M

SGLang

How to use SupraLabs/Supra-Mini-v6-1M with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "SupraLabs/Supra-Mini-v6-1M" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SupraLabs/Supra-Mini-v6-1M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "SupraLabs/Supra-Mini-v6-1M" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SupraLabs/Supra-Mini-v6-1M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use SupraLabs/Supra-Mini-v6-1M with Docker Model Runner:
```
docker model run hf.co/SupraLabs/Supra-Mini-v6-1M
```

🦅 Supra Mini v6 1M

Supra Mini v6 1M is a very small language model and the sixth version in our Supra Mini series.

Model Config

Parameters: 1,410,688 (1M)
Architecture: Llama
Vocab size (custom BPE tokenizer): 4096
Hidden Size: 128
Intermediate Size: 256
Hidden Layers: 6
Attention Heads: 4
Key Value Heads: 2
Max Position Embeddings: 1024
Learning rate: 6e-4
Weight Decay: 0.1
Trained in bfloat16
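
The parameter count can be reproduced from the config values. The sketch below is an illustration, not the authors' code. It assumes a standard Llama layout: no bias terms, a gated MLP with gate/up/down projections, two RMSNorm weights per layer plus a final norm, and input embeddings tied to the output head. The function name `llama_param_count` is hypothetical.

```python
def llama_param_count(vocab_size, hidden_size, intermediate_size,
                      num_layers, num_heads, num_kv_heads,
                      tie_embeddings=True):
    """Count parameters of a bias-free Llama-style decoder (assumed layout)."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim              # grouped-query attention
    attention = (2 * hidden_size * hidden_size    # q_proj and o_proj
                 + 2 * hidden_size * kv_dim)      # k_proj and v_proj
    mlp = 3 * hidden_size * intermediate_size     # gate, up, down projections
    norms = 2 * hidden_size                       # two RMSNorm weights per layer
    per_layer = attention + mlp + norms

    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return embeddings + num_layers * per_layer + final_norm + lm_head

total = llama_param_count(
    vocab_size=4096, hidden_size=128, intermediate_size=256,
    num_layers=6, num_heads=4, num_kv_heads=2,
)
print(f"{total:,}")  # prints 1,410,688
```

Under these assumptions the total matches the reported 1,410,688 parameters, of which 524,288 belong to the embedding table.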

Final Loss

This model reached a final cross-entropy loss of 3.79 on the training set.
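
To put the loss in context, it can be converted to a per-token perplexity and compared with a uniform guess over the vocabulary. This is a small sketch that assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention but is not stated above.

```python
import math

final_loss = 3.79   # reported training loss, assumed to be in nats per token
vocab_size = 4096   # from the model config

perplexity = math.exp(final_loss)     # roughly 44.3
uniform_loss = math.log(vocab_size)   # loss of a uniform guess, roughly 8.32

print(f"Train perplexity: {perplexity:.1f}")
print(f"Uniform-guess loss: {uniform_loss:.2f}")
```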

Benchmarks

All benchmarks were executed using lm_eval.

| Task | Score | Random baseline |
|---|---|---|
| ARC-Easy ↑ | 0.3026 | 0.25 (25%) |
| WikiText (byte perplexity) ↓ | 3.0043 | - |
| BLiMP ↑ | 0.6186 | 0.5 (50%) |

For further benchmarks, see benchmarks.md in this repo's files list.
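
As a quick reading aid, the margin over chance for the two accuracy tasks can be computed directly from the reported numbers. This is only arithmetic on the values listed above, not a new evaluation.

```python
# Scores and chance levels copied from the benchmark results above.
results = {
    "ARC-Easy": (0.3026, 0.25),
    "BLiMP": (0.6186, 0.50),
}

margins = {}
for task, (score, chance) in results.items():
    margins[task] = score - chance
    print(f"{task}: {score:.4f} vs. chance {chance:.2f} (+{margins[task]:.4f})")
```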

Usage

To use the model, run the following code:

from transformers import pipeline
import torch

print("Loading Supra Mini v6 1M model from Hugging Face...")
# device_map="auto" requires the accelerate package.
# float16 is used only when a CUDA GPU is available.
pipe = pipeline(
    "text-generation",
    model="SupraLabs/Supra-Mini-v6-1M",
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)

def generate_text(prompt, max_new_tokens=150):
    """Sample a continuation; the returned string includes the prompt."""
    result = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        top_k=25,
        top_p=0.9,
        repetition_penalty=1.2,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id,
    )
    return result[0]["generated_text"]

test_prompt = "The importance of education is"
print(f"\nPrompt: {test_prompt}")
print("-" * 30)
print("\nOutput:\n" + generate_text(test_prompt))

Use cases

Educational research
Deployment, testing, or fine-tuning in edge environments
Or, more simply, for fun

Limitations

Cannot reason, chat, or code
Incoherent more often than not
Output is mostly not factual

Training guide

We trained Supra Mini v6 1M for 1 epoch on a single NVIDIA RTX 5060 Ti 16GB, which took about 3 hours.
The full training code can be found in this repo as train_tokenizer.py (trains the custom BPE tokenizer with a vocab size of 4096) and train_model.py (trains the model).
The model was trained on the first 5 billion tokens of a mixture of 70% FineWeb-Edu (sample-10BT) and 30% Cosmopedia-v2.
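
The figures above imply a rough token budget and throughput. The sketch below is back-of-the-envelope arithmetic only. It assumes the 70/30 split applies to the 5 billion training tokens and treats the training time as exactly 3 hours, although the stated time is approximate.

```python
TOTAL_TOKENS = 5_000_000_000   # tokens seen in the single epoch
PARAMS = 1_410_688             # from the model config
TRAIN_HOURS = 3                # approximate, so the throughput is an estimate

# Integer arithmetic avoids floating-point rounding in the split.
fineweb_tokens = TOTAL_TOKENS * 70 // 100
cosmopedia_tokens = TOTAL_TOKENS - fineweb_tokens

tokens_per_second = TOTAL_TOKENS / (TRAIN_HOURS * 3600)
tokens_per_param = TOTAL_TOKENS / PARAMS

print(f"FineWeb-Edu tokens:   {fineweb_tokens:,}")
print(f"Cosmopedia-v2 tokens: {cosmopedia_tokens:,}")
print(f"Throughput:           ~{tokens_per_second:,.0f} tokens/s")
print(f"Tokens per parameter: ~{tokens_per_param:,.0f}")
```

Under these assumptions the run processes roughly 460,000 tokens per second and sees about 3,500 tokens per parameter.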

Safetensors

Model size: 1.41M params
Tensor type: F32