Supra Mini v6 1M
Supra Mini v6 1M is a very small language model and the sixth version of our Supra Mini series.
Model Config
- Parameters: 1,410,688 (about 1.4M)
- Architecture: Llama
- Vocab size with custom BPE tokenizer: 4096
- Hidden Size: 128
- Intermediate Size: 256
- Hidden Layers: 6
- Attention Heads: 4
- Key Value Heads: 2
- Max Position Embeddings: 1024
- Learning rate: 6e-4
- Weight Decay: 0.1
- Trained in bfloat16
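The parameter count can be checked by hand from the config values above. The sketch below assumes a standard Llama layout that the card does not spell out: no bias terms, one RMSNorm weight vector per norm, and tied input/output embeddings. Under those assumptions the arithmetic reproduces the listed total.

```python
# Rough parameter count for a Llama-style decoder, using the config values above.
# Assumptions (not stated in the card): no biases, RMSNorm weights only,
# and the output head shares weights with the input embedding.
vocab_size = 4096
hidden = 128
intermediate = 256
layers = 6
heads = 4
kv_heads = 2

head_dim = hidden // heads        # 32
kv_dim = kv_heads * head_dim      # 64, smaller than hidden (grouped-query attention)

embedding = vocab_size * hidden
attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q and o, then k and v
mlp = 3 * hidden * intermediate                        # gate, up, down projections
norms = 2 * hidden                                     # two norms per layer
per_layer = attention + mlp + norms

total = embedding + layers * per_layer + hidden        # plus the final norm
print(f"{total:,}")  # 1,410,688
```

Note that the embedding table alone accounts for 524,288 of these parameters, which is why the small vocabulary matters at this scale.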
Final Loss
This model reached a final cross-entropy loss of 3.79 on the training set.
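To put the loss in more familiar terms, it can be converted to a per-token perplexity. This is a minimal sketch that assumes the loss is measured in nats (natural log), which is the PyTorch default but is not stated in the card.

```python
import math

train_loss = 3.79  # final training cross-entropy from the card, assumed to be in nats

# Perplexity is the exponential of the average per-token cross-entropy.
token_perplexity = math.exp(train_loss)
print(f"{token_perplexity:.1f}")  # about 44.3
```

This is a per-token figure on the training set, so it is not directly comparable to the byte-level Wikitext perplexity reported in the benchmarks.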
Benchmarks
All benchmarks were executed using lm_eval.
| Task | Value | Random level |
|---|---|---|
| ARC-Easy ↑ | 0.3026 | 0.25 (25%) |
| Wikitext (byte PPL) ↓ | 3.0043 | - |
| BLiMP ↑ | 0.6186 | 0.5 (50%) |
For further benchmarks, see benchmarks.md in this repo's files list.
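One way to read the accuracy scores is to rescale them so that 0 means chance and 1 means perfect. The helper below is an illustrative sketch, not part of lm_eval; the function name `above_chance` is hypothetical.

```python
# (accuracy, random-guess level) pairs taken from the table above.
results = {
    "arc_easy": (0.3026, 0.25),
    "blimp": (0.6186, 0.50),
}

def above_chance(accuracy: float, chance: float) -> float:
    """Rescale accuracy so that chance maps to 0.0 and a perfect score to 1.0."""
    return (accuracy - chance) / (1.0 - chance)

normalized = {task: above_chance(acc, chance) for task, (acc, chance) in results.items()}
for task, score in normalized.items():
    print(f"{task}: {score:.3f}")
```

On this scale ARC-Easy sits at roughly 0.07 and BLiMP at roughly 0.24, which fits the limitations listed below: the model has picked up some grammar but very little knowledge.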
Usage
To use the model, run this code:
from transformers import pipeline
import torch

print("Loading Supra Mini v6 1M model from Hugging Face...")

# device_map="auto" requires the accelerate package.
# Use float16 on a CUDA GPU and float32 on CPU.
pipe = pipeline(
    "text-generation",
    model="SupraLabs/Supra-Mini-v6-1M",
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)


def generate_text(prompt, max_new_tokens=150):
    """Sample a continuation of `prompt` and return the prompt plus the generated text."""
    result = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        top_k=25,
        top_p=0.9,
        repetition_penalty=1.2,  # discourages the loops that tiny models fall into
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id,
    )
    return result[0]["generated_text"]


test_prompt = "The importance of education is"
print(f"\nPrompt: {test_prompt}")
print("-" * 30)
print("\nOutput:\n" + generate_text(test_prompt))
Use cases
- Educational research
- Deployment, testing, or fine-tuning in edge environments
- Or, more simply, for fun
Limitations
- Cannot reason, chat, or code
- Output is incoherent more often than not
- Output is mostly not factual
Training guide
We trained Supra Mini v6 1M on a single NVIDIA RTX 5060 Ti 16GB in about 3 hours, for 1 epoch.
The full training code can be found in this repo as train_tokenizer.py (trains the custom BPE tokenizer with a vocab size of 4096, matching the model config) and train_model.py (trains the model).
The model was trained on the first 5 billion tokens of a mix of 70% FineWeb-Edu (Sample-10BT) and 30% Cosmopedia-v2.
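The figures above imply a rough training budget, sketched below. It assumes the 70/30 split is measured in tokens (the card does not say whether it is by tokens or by documents) and treats the 3-hour figure as exact, so the throughput is only a ballpark.

```python
total_tokens = 5_000_000_000
parameters = 1_410_688
train_hours = 3  # the card says roughly 3 hours, so this is approximate

# Assumed: the 70/30 mix is by token count.
fineweb_edu_tokens = total_tokens * 70 // 100
cosmopedia_tokens = total_tokens - fineweb_edu_tokens

tokens_per_second = total_tokens / (train_hours * 3600)
tokens_per_parameter = total_tokens / parameters

print(f"FineWeb-Edu:   {fineweb_edu_tokens:,} tokens")
print(f"Cosmopedia-v2: {cosmopedia_tokens:,} tokens")
print(f"Throughput:    {tokens_per_second:,.0f} tokens/s")
print(f"Tokens/param:  {tokens_per_parameter:,.0f}")
```

At about 3,500 tokens per parameter, the model sees far more data per weight than larger models typically do, which is feasible here only because the model is so small.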