🦅 Supra Mini v6 1M

Supra Mini v6 1M is a very small model and the sixth version in our Supra Mini series.

Model Config

  • Parameters: 1,410,688 (1M)
  • Architecture: Llama
  • Vocab size (custom BPE tokenizer): 4096
  • Hidden Size: 128
  • Intermediate Size: 256
  • Hidden Layers: 6
  • Attention Heads: 4
  • Key Value Heads: 2
  • Max Position Embeddings: 1024
  • Learning rate: 6e-4
  • Weight Decay: 0.1
  • Trained in bfloat16
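
The parameter count can be reproduced from the config values above. The sketch below assumes a standard Llama layout: no bias terms, two RMSNorm weight vectors per layer plus a final norm, and input/output embeddings that share weights. The weight tying is an assumption not stated in this card, but it is the setting under which the numbers add up to 1,410,688. The function name `llama_param_count` is illustrative only.

```python
def llama_param_count(vocab_size, hidden, intermediate, layers,
                      heads, kv_heads, tie_embeddings=True):
    """Count weights in a Llama-style decoder (no biases, RMSNorm)."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim           # grouped-query attention shrinks K and V

    attn = (hidden * hidden                # q_proj
            + 2 * hidden * kv_dim          # k_proj and v_proj
            + hidden * hidden)             # o_proj
    mlp = 3 * hidden * intermediate        # gate_proj, up_proj, down_proj
    norms = 2 * hidden                     # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms

    embed = vocab_size * hidden
    lm_head = 0 if tie_embeddings else embed   # assumption: tied by default
    final_norm = hidden
    return embed + layers * per_layer + final_norm + lm_head


total = llama_param_count(vocab_size=4096, hidden=128, intermediate=256,
                          layers=6, heads=4, kv_heads=2)
untied = llama_param_count(vocab_size=4096, hidden=128, intermediate=256,
                           layers=6, heads=4, kv_heads=2, tie_embeddings=False)
print(f"tied embeddings:   {total:,}")
print(f"untied embeddings: {untied:,}")
```

With tied embeddings, the embedding table accounts for 524,288 of the weights, more than a third of the model.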

Final Loss

This model reached a final cross-entropy loss of 3.79 on the training set.
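
To put the loss in context, it can be converted to a perplexity and compared with a uniform guess over the vocabulary. This small sketch assumes the reported loss is the mean per-token cross-entropy in nats (natural logarithm), which is the PyTorch default but is not stated explicitly here.

```python
import math

final_loss = 3.79      # reported training loss, assumed to be in nats per token
vocab_size = 4096      # from the model config

perplexity = math.exp(final_loss)       # effective number of equally likely next tokens
uniform_loss = math.log(vocab_size)     # loss of a model that guesses uniformly

print(f"perplexity:            {perplexity:.1f}")
print(f"uniform-guess loss:    {uniform_loss:.2f}")
print(f"improvement over it:   {uniform_loss - final_loss:.2f} nats/token")
```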

Benchmarks

All benchmarks were executed using lm_eval.

Task                     Value    Random baseline
ARC-Easy ↑               0.3026   0.25 (25%)
Wikitext (byte PPL) ↓    3.0043   -
BLiMP ↑                  0.6186   0.5 (50%)

For further benchmarks, see benchmarks.md in this repo's files list.
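
As a reading aid for the scores above, the sketch below computes each accuracy's margin over its random baseline and converts the Wikitext byte perplexity to bits per byte, using bits per byte = log2(byte perplexity). The dictionary layout and variable names are illustrative, not part of lm_eval.

```python
import math

# (score, random baseline) pairs copied from the benchmark table
accuracy_results = {
    "ARC-Easy": (0.3026, 0.25),
    "BLiMP": (0.6186, 0.50),
}

margins = {}
for task, (score, chance) in accuracy_results.items():
    margins[task] = score - chance      # absolute gain over random guessing
    print(f"{task}: {score:.4f} ({margins[task]:+.4f} over chance)")

byte_perplexity = 3.0043
bits_per_byte = math.log2(byte_perplexity)   # lower is better
print(f"Wikitext: {bits_per_byte:.3f} bits per byte")
```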

Usage

To use our model, just run this code:

from transformers import pipeline
import torch

print("Loading Supra Mini v6 1M model from Hugging Face...")
pipe = pipeline(
    "text-generation", 
    model="SupraLabs/Supra-Mini-v6-1M",
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
)

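def generate_text(prompt, max_new_tokens=150):
    # max_new_tokens caps only the generated tokens, not the prompt length.
    result = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        top_k=25,
        top_p=0.9,
        repetition_penalty=1.2,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id
    )
    # The pipeline returns a list of dicts; generated_text includes the prompt.
    return result[0]['generated_text']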

test_prompt = "The importance of education is"
print(f"\nPrompt: {test_prompt}")
print("-" * 30)
print("\nOutput:\n" + generate_text(test_prompt))

Use cases

  1. Educational research
  2. Deployment, testing, or fine-tuning in edge environments
  3. Or, more simply, for fun

Limitations

  1. Cannot reason, chat, or code
  2. Incoherent more often than not
  3. Output is mostly not factual

Training guide

We trained Supra Mini v6 1M on a single NVIDIA RTX 5060 Ti 16GB in ~3 hours for 1 epoch.
The full training code is in this repo: train_tokenizer.py trains the custom BPE tokenizer (vocab size 4096, matching the model config above) and train_model.py trains the model.
The model was trained on the first 5 billion tokens of a mix of 70% FineWeb-Edu (Sample-10BT) and 30% Cosmopedia-v2.
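
The figures above imply a rough token budget per source and an average training throughput. The sketch below is back-of-the-envelope arithmetic only. It assumes the 70/30 split applies to token counts and that the ~3 hours covers all 5 billion tokens; neither is confirmed by the training scripts.

```python
total_tokens = 5_000_000_000
mix = {"FineWeb-Edu (Sample-10BT)": 0.70, "Cosmopedia-v2": 0.30}

# Token budget per source, assuming the split is measured in tokens
tokens_per_source = {name: round(total_tokens * share) for name, share in mix.items()}
for name, count in tokens_per_source.items():
    print(f"{name}: {count:,} tokens")

train_seconds = 3 * 3600                  # "~3 hours", so this is approximate
throughput = total_tokens / train_seconds
tokens_per_param = total_tokens / 1_410_688

print(f"average throughput: {throughput:,.0f} tokens/s")
print(f"tokens per parameter: {tokens_per_param:,.0f}")
```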
