MiniBananaMind-v3-9M

MiniBananaMind-v3-9M is a small causal language model trained from scratch on FineWeb-Edu and FineMath.

The model has about 8.9M parameters and uses a custom 8k-token byte-level BPE tokenizer with digit-aware tokenization.

This is a base language model, not an instruction-tuned chat assistant.

Model Details

Field               Value
Parameters          8,884,992
Architecture        Custom Llama-style decoder
Layers              9
Hidden size         256
Intermediate size   768
Attention heads     8
KV heads            2
Vocabulary size     8,192
Context length      1,024
Embeddings          Tied input/output embeddings
Weight format       safetensors
Training precision  BF16
Checkpoint used     latest mixed checkpoint
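
The parameter count can be reproduced from the other values in the table. The sketch below assumes a standard Llama-style layout that the card does not spell out: bias-free linear layers, a gated MLP with three projections, grouped-query attention, and RMSNorm with a single weight vector per norm. The variable names are illustrative only.

```python
# Configuration values copied from the Model Details table.
hidden_size = 256
intermediate_size = 768
num_layers = 9
num_heads = 8
num_kv_heads = 2
vocab_size = 8192

head_dim = hidden_size // num_heads   # 32
kv_dim = num_kv_heads * head_dim      # 64, shared across query heads (GQA)

# Assumption: no biases anywhere; each norm holds one weight vector.
attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim  # q, o, k, v
mlp = 3 * hidden_size * intermediate_size                             # gate, up, down
norms = 2 * hidden_size                                               # two norms per layer
per_layer = attention + mlp + norms

embeddings = vocab_size * hidden_size  # counted once because embeddings are tied
final_norm = hidden_size

total = num_layers * per_layer + embeddings + final_norm
print(f"{total:,}")  # 8,884,992
```

Under these assumptions the total matches the reported 8,884,992, and the tied embedding matrix accounts for roughly 2.1M of those parameters.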

Tokenizer

MiniBananaMind-v3-9M uses a custom digit-aware byte-level BPE tokenizer with an 8,192-token vocabulary.

Each digit is its own token, so a multi-digit number is always encoded as a sequence of single-digit tokens and never merged into one large number token.

Digit IDs:

Token  ID
1      9
2      10
3      11
4      12
5      13
6      14
7      15
8      16
9      17
0      18

Examples:

18  -> [9, 16]
227 -> [10, 10, 15]
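
The digit mapping above can be reproduced by hand. The following is a minimal sketch that covers only strings made of ASCII digits; it is not the tokenizer itself, and `DIGIT_IDS` and `encode_digits` are hypothetical names introduced for illustration.

```python
# Digit-to-ID mapping copied from the Digit IDs table.
DIGIT_IDS = {
    "1": 9, "2": 10, "3": 11, "4": 12, "5": 13,
    "6": 14, "7": 15, "8": 16, "9": 17, "0": 18,
}


def encode_digits(number: str) -> list[int]:
    """Return one token ID per digit; reject anything that is not an ASCII digit."""
    for ch in number:
        if ch not in DIGIT_IDS:
            raise ValueError(f"not an ASCII digit: {ch!r}")
    return [DIGIT_IDS[ch] for ch in number]


print(encode_digits("18"))   # [9, 16]
print(encode_digits("227"))  # [10, 10, 15]
```

For real text, use the tokenizer shipped with the model; this sketch only shows why a number of n digits always costs n tokens.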

Training Data

MiniBananaMind-v3-9M was trained on:

  • HuggingFaceFW/fineweb-edu
  • HuggingFaceTB/finemath

The training mix used both general educational web text and math-heavy text.

Dataset                                                          Tokens
FineWeb-Edu sample-10BT retokenized with digit tokenizer         12,047,375,481
FineMath finemath-4plus retokenized with digit tokenizer         1,500,000,000

Training setup:

Field                        Value
Sequence length              1,024
FineMath sampling ratio      30%
FineWeb sampling ratio       70%
Batch size                   72
Gradient accumulation        16
Tokens per optimizer step    1,179,648
Training steps               11,471
Approx training tokens seen  13,531,742,208
Learning rate                5e-4
Minimum learning rate        5e-5
Warmup steps                 500
Weight decay                 0.1
Hardware                     NVIDIA RTX 5070 Ti
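
The token counts in the training setup follow from the batch settings, and the sampling ratios imply how often each dataset was revisited. The sketch below checks that arithmetic. It assumes the 30%/70% ratios apply to tokens (the card does not say whether they apply to tokens or to sequences), so the pass counts are estimates only.

```python
# Values copied from the training setup and dataset tables.
batch_size = 72
grad_accum = 16
seq_len = 1024
steps = 11_471
finemath_tokens = 1_500_000_000
fineweb_tokens = 12_047_375_481

tokens_per_step = batch_size * grad_accum * seq_len
tokens_seen = tokens_per_step * steps
print(f"{tokens_per_step:,}")  # 1,179,648
print(f"{tokens_seen:,}")      # 13,531,742,208

# Assumption: sampling ratios are measured in tokens.
finemath_passes = 0.30 * tokens_seen / finemath_tokens
fineweb_passes = 0.70 * tokens_seen / fineweb_tokens
print(f"FineMath passes: {finemath_passes:.2f}")  # FineMath passes: 2.71
print(f"FineWeb passes:  {fineweb_passes:.2f}")   # FineWeb passes:  0.79
```

Under that assumption the FineMath subset was repeated between two and three times, while the FineWeb-Edu sample was not fully consumed.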

Evaluation

Formal benchmark results for this checkpoint are not included yet.

Usage

This model uses custom architecture code, so load it with trust_remote_code=True.

Install dependencies:

pip install -U transformers safetensors torch

Run inference:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "BananaMind/MiniBananaMind-v3-9M"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

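# Use the GPU when one is available; otherwise fall back to CPU in float32.
if torch.cuda.is_available():
    device = "cuda"
    dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
else:
    device = "cpu"
    dtype = torch.float32

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
).to(device).eval()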

prompt = "The color of the sky is "
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=64,
        do_sample=False,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))

Suggested Generation Settings

For stable continuations:

  • do_sample=False
  • repetition_penalty=1.1
  • max_new_tokens=64 to 128

For more varied text:

  • do_sample=True
  • temperature=0.6
  • top_p=0.9
  • repetition_penalty=1.1
  • max_new_tokens=64 to 128
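
One way to keep the two presets side by side is to store them as keyword-argument dictionaries. This is a sketch: `STABLE` and `VARIED` are hypothetical names, and `max_new_tokens` is set to 64, the lower end of the suggested range.

```python
# Preset for stable, deterministic continuations (greedy decoding).
STABLE = {
    "do_sample": False,
    "repetition_penalty": 1.1,
    "max_new_tokens": 64,
}

# Preset for more varied text (nucleus sampling with a moderate temperature).
VARIED = {
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.9,
    "repetition_penalty": 1.1,
    "max_new_tokens": 64,
}
```

With the `model` and `input_ids` from the Usage section, a preset can be passed as `model.generate(input_ids=input_ids, pad_token_id=tokenizer.eos_token_id, **VARIED)`.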

License

Apache 2.0
