MiniBananaMind-v2-9M

MiniBananaMind-v2-9M is a small causal language model trained from scratch on FineWeb-Edu.

The model has about 8.9M parameters and uses a custom 8k-token byte-level BPE tokenizer.

Model Details

| Field | Value |
| --- | --- |
| Parameters | ~8.9M |
| Architecture | Custom Llama-style decoder |
| Layers | 9 |
| Hidden size | 256 |
| Intermediate size | 768 |
| Attention heads | 8 |
| KV heads | 2 |
| Vocabulary size | 8,192 |
| Context length | 2,048 |
| Weight format | safetensors |
| Training precision | BF16 |
| Checkpoint used | checkpoint-6755 |
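
As a rough sanity check on the ~8.9M figure, the parameter count can be estimated from the values above. The sketch below assumes a standard Llama-style layout that this card does not confirm: bias-free linear layers, a three-matrix gated MLP, two RMSNorm weight vectors per layer plus a final one, and an output head tied to the input embedding. The same numbers also give an estimate of the KV-cache size with 2 KV heads, assuming 2-byte (FP16/BF16) cache entries.

```python
# Back-of-the-envelope parameter and KV-cache estimate from the Model Details values.
# Assumptions (NOT confirmed by the model card): no biases, gated (SwiGLU-style) MLP,
# RMSNorm weights only, tied input/output embeddings.

VOCAB = 8192
HIDDEN = 256
INTERMEDIATE = 768
LAYERS = 9
HEADS = 8
KV_HEADS = 2
CONTEXT = 2048

HEAD_DIM = HIDDEN // HEADS  # 32 dimensions per attention head


def estimate_parameters() -> int:
    """Estimate the total parameter count under the assumptions above."""
    embedding = VOCAB * HIDDEN                    # shared with the output head (assumed tied)
    q_and_o = 2 * HIDDEN * HIDDEN                 # query and output projections
    k_and_v = 2 * HIDDEN * (KV_HEADS * HEAD_DIM)  # grouped-query key/value projections
    mlp = 3 * HIDDEN * INTERMEDIATE               # gate, up, and down projections
    norms = 2 * HIDDEN                            # pre-attention and pre-MLP norm weights
    per_layer = q_and_o + k_and_v + mlp + norms
    final_norm = HIDDEN
    return embedding + LAYERS * per_layer + final_norm


def kv_cache_bytes(tokens: int = CONTEXT, bytes_per_value: int = 2) -> int:
    """Estimate KV-cache size: keys and values for every layer, KV head, and token."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value * tokens


print(f"estimated parameters: {estimate_parameters():,}")                   # 8,884,992
print(f"KV cache at {CONTEXT} tokens: {kv_cache_bytes() / 2**20:.1f} MiB")  # 4.5 MiB
```

Under these assumptions the estimate comes to 8,884,992, which is consistent with the ~8.9M listed above. An untied output head would add another 2,097,152 parameters, so the listed total suggests (but does not prove) that the embeddings are tied.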

Training Data

MiniBananaMind-v2-9M was trained on:

  • Dataset: HuggingFaceFW/fineweb-edu
  • Config: sample-10BT
  • Text domain: educational web text
  • Tokenizer: custom 8k byte-level BPE tokenizer
  • Training tokens seen: ~3.54B (after retokenization)

The model was trained from scratch. No benchmark datasets were used for training.
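
To inspect the training corpus, the dataset and config named above can be streamed with the Hugging Face `datasets` library. This is only a minimal sketch for browsing a few documents, not the author's data pipeline. It assumes `datasets` is installed and network access is available; the helper name `stream_documents` is hypothetical.

```python
from itertools import islice

DATASET_ID = "HuggingFaceFW/fineweb-edu"
CONFIG_NAME = "sample-10BT"


def stream_documents(limit: int = 3):
    """Yield the text of the first `limit` documents without downloading the full sample."""
    # Imported lazily so this module can be imported even without `datasets` installed.
    from datasets import load_dataset

    # streaming=True iterates over the remote shards instead of downloading them first.
    dataset = load_dataset(DATASET_ID, name=CONFIG_NAME, split="train", streaming=True)
    for row in islice(dataset, limit):
        yield row["text"]


# Example use (needs network access):
#     for text in stream_documents():
#         print(text[:200])
```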

Evaluation

The final checkpoint used for this model is checkpoint-6755, which was also referred to as leaf-alpha.

| Benchmark | Metric | Score |
| --- | --- | --- |
| HellaSwag | acc_norm | 27.04% |
| ARC-Easy | acc_norm | 33.92% |
| ARC-Challenge | acc_norm | 20.73% |
| PIQA | acc | 55.06% |
| ArithMark-2 | acc | 25.32% |

Average across the 5 listed tasks: 32.41%
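
The average is an unweighted mean of the five scores, mixing acc_norm and acc metrics as listed. A short check of the arithmetic:

```python
# Scores copied from the evaluation results above (percent).
scores = {
    "HellaSwag": 27.04,
    "ARC-Easy": 33.92,
    "ARC-Challenge": 20.73,
    "PIQA": 55.06,
    "ArithMark-2": 25.32,
}

# Unweighted mean: every task counts equally, regardless of its size or metric.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}%")  # 32.41%
```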

Usage

This model uses custom architecture code, so it must be loaded with trust_remote_code=True.

Example:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "BananaMind/MiniBananaMind-v2-9M"

# trust_remote_code=True is required because the architecture code ships with the repo.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Load the weights in FP16, move the model to the GPU, and switch to inference mode.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
).cuda().eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Sample up to 80 new tokens with temperature and nucleus (top-p) filtering.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=80,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))

Greedy Generation Example

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "BananaMind/MiniBananaMind-v2-9M"

# trust_remote_code=True is required because the architecture code ships with the repo.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Load the weights in FP16, move the model to the GPU, and switch to inference mode.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
).cuda().eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# do_sample=False selects greedy decoding: the most likely token at every step.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=80,
        do_sample=False,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
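
The card lists a context length of 2,048 tokens, so the prompt plus the generated tokens should presumably fit within that window. The sketch below shows one way to budget for this before calling `generate`. The helper names are hypothetical, and keeping the most recent tokens is just one possible truncation policy.

```python
CONTEXT_LENGTH = 2048  # context length listed in the model details


def prompt_token_budget(max_new_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many prompt tokens fit once room is reserved for generation."""
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    return context_length - max_new_tokens


def truncate_prompt_ids(token_ids: list[int], max_new_tokens: int) -> list[int]:
    """Keep only the most recent prompt tokens that fit in the budget."""
    budget = prompt_token_budget(max_new_tokens)
    return token_ids[-budget:]


# With max_new_tokens=80, as in the examples above, 1,968 prompt tokens remain.
print(prompt_token_budget(80))  # 1968
```

The token IDs would come from the tokenizer, for example `tokenizer(prompt)["input_ids"]`, and short prompts pass through unchanged.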

Training took 4 hours and 34 minutes on an NVIDIA GeForce RTX 5070 Ti.
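
The figures on this card imply an approximate training throughput and token-to-parameter ratio. The calculation below is a rough sketch that assumes the 4 hours and 34 minutes of wall time covers all ~3.54B training tokens, including any evaluation or checkpointing overhead.

```python
# Figures taken from this card; both are approximate ("~") in the source.
TRAINING_TOKENS = 3.54e9
PARAMETERS = 8.9e6
WALL_TIME_SECONDS = 4 * 3600 + 34 * 60  # 4 h 34 min = 16,440 s

tokens_per_second = TRAINING_TOKENS / WALL_TIME_SECONDS
tokens_per_parameter = TRAINING_TOKENS / PARAMETERS

print(f"throughput: about {tokens_per_second:,.0f} tokens/s")          # roughly 215k tokens/s
print(f"data ratio: about {tokens_per_parameter:.0f} tokens/parameter")  # roughly 400
```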
