BananaMind-1.5-Base is a small English causal language model trained from scratch by BananaMind.
It is our first fully pretrained medium model.
| Field | Value |
|---|---|
| Parameters | 75,054,720 |
| Architecture | Llama-style decoder-only Transformer |
| Layers | 12 |
| Hidden size | 640 |
| Intermediate size | 1728 |
| Attention heads | 10 |
| KV heads | 5 |
| Context length | 4096 tokens |
| Vocabulary size | 32,000 |
| Tokenizer | Custom byte-level BPE |
| Training tokens | ~27B tokens |
| Precision | BF16 training, safetensors release |
| Model type | Base causal LM |
| Training cost | $103.31 (please like, this was expensive) |
| Training GPU | RTX Pro 6000 |
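
The parameter count in the table can be reproduced from the other fields. The sketch below is a rough check, not the authors' own accounting. It assumes a head dimension of hidden size divided by attention heads, no bias terms, tied input and output embeddings, and one weight vector per RMSNorm; none of these are stated in the card.

```python
# Values taken from the table above.
vocab_size = 32_000
hidden = 640
intermediate = 1_728
layers = 12
heads = 10
kv_heads = 5

# Assumptions (not stated in the card): head_dim = hidden / heads,
# no biases, tied embeddings, one weight vector per norm layer.
head_dim = hidden // heads                # 64
kv_dim = kv_heads * head_dim              # 320

embedding = vocab_size * hidden           # shared with the output head (assumed tied)
attention = 2 * hidden * hidden + 2 * hidden * kv_dim   # q and o, plus k and v
mlp = 3 * hidden * intermediate           # gate, up, and down projections
norms = 2 * hidden                        # two norms per layer
per_layer = attention + mlp + norms

total = embedding + layers * per_layer + hidden   # last term is the final norm
print(f"{total:,}")   # 75,054,720
```

Under these assumptions the total matches the reported 75,054,720, which suggests (but does not prove) that the embeddings are tied.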
An instruction-tuned version is coming very soon.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo = "BananaMind/BananaMind-1.5-Base"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=False)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=False,
    dtype=dtype,
).to(device)
model.eval()

prompt = "The color of the sky is blue. The color of a banana is"
inputs = tok(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=16,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tok.eos_token_id,
        eos_token_id=tok.eos_token_id,
    )

print(tok.decode(out[0], skip_special_tokens=True))
Recommended starting settings:
temperature = 0.7
top_p = 0.9
max_new_tokens = 64
For deterministic sanity tests:
do_sample = False
max_new_tokens = 8
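
One way to keep the two presets in a single place is a small helper that returns keyword arguments for `model.generate`. The function name `generation_settings` is hypothetical and used only for illustration; the values are the ones listed above.

```python
def generation_settings(deterministic: bool = False) -> dict:
    """Return generate() keyword arguments for one of the two presets above."""
    if deterministic:
        # Greedy decoding: same output on every run, useful for sanity tests.
        return {"do_sample": False, "max_new_tokens": 8}
    # Recommended sampling settings for general use.
    return {
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.9,
        "max_new_tokens": 64,
    }

# Usage, assuming `model`, `tok`, and `inputs` exist as in the loading example:
# out = model.generate(**inputs, pad_token_id=tok.eos_token_id,
#                      **generation_settings(deterministic=True))
print(generation_settings(deterministic=True))
```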
BananaMind-1.5-Base was trained from scratch on approximately 27B tokens of FineWeb-Edu-style English web text.
The model uses a custom 32k byte-level BPE tokenizer and a compact Llama-style architecture with grouped-query attention.
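
To put the training budget in perspective, the figures from this card can be combined into a few back-of-the-envelope numbers. The FLOPs estimate uses the common 6 × parameters × tokens rule of thumb, which is an approximation and not a measurement of the actual run.

```python
params = 75_054_720     # from the model card
tokens = 27e9           # approximate, from the model card
cost_usd = 103.31       # from the model card

tokens_per_param = tokens / params
approx_flops = 6 * params * tokens            # rule-of-thumb estimate only
usd_per_billion_tokens = cost_usd / (tokens / 1e9)

print(f"tokens per parameter:    {tokens_per_param:.0f}")
print(f"approx. training FLOPs:  {approx_flops:.2e}")
print(f"cost per billion tokens: ${usd_per_billion_tokens:.2f}")
```

This works out to roughly 360 training tokens per parameter.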
BananaMind-1.5-Base uses a compact Llama-style decoder architecture: 12 layers, a hidden size of 640, an intermediate size of 1728, 10 attention heads, and 5 KV heads.
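
With 10 query heads and 5 KV heads, grouped-query attention shares each key/value head between two query heads, which halves the KV cache compared with full multi-head attention. The sketch below estimates the cache size for one sequence at the full 4096-token context. It assumes a head dimension of 64 (hidden size divided by attention heads) and a 16-bit cache; neither is stated in the card.

```python
layers, heads, kv_heads, hidden = 12, 10, 5, 640   # from the model card
context = 4096                                     # from the model card
head_dim = hidden // heads      # assumed: 64
bytes_per_value = 2             # assumed: 16-bit cache (BF16 or FP16)

def kv_cache_bytes(n_kv_heads: int, seq_len: int) -> int:
    # Factor 2 covers keys and values, stored per layer, per KV head, per position.
    return 2 * layers * n_kv_heads * head_dim * bytes_per_value * seq_len

gqa = kv_cache_bytes(kv_heads, context)   # grouped-query attention (5 KV heads)
mha = kv_cache_bytes(heads, context)      # hypothetical full multi-head (10 KV heads)
print(f"GQA: {gqa / 2**20:.0f} MiB, full multi-head: {mha / 2**20:.0f} MiB")
# GQA: 60 MiB, full multi-head: 120 MiB
```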
BananaMind-1.5-Base has the highest average among the models below with complete results, though it does not lead on every individual benchmark:
| Model | HellaSwag | ARC-Easy | ARC-Challenge | PIQA | ArithMark-2.0 | Average |
|---|---|---|---|---|---|---|
| BananaMind-1.5-Base | 30.91% | 42.38% | 23.98% | 60.55% | 26.68% | 36.90% |
| Gemma 3 IT 270M | 37.70% | - | - | 66.20% | - | - |
| Zupra-1.6-Instruct-Ultra-Exp | 29.66% | 34.41% | 25.51% | 59.74% | 30.44% | 35.95% |
| KeyLM 75M | 29.66% | 35.73% | 23.98% | 60.50% | 25.80% | 35.13% |
| GPT-2 124M | 31.26% | 39.35% | 22.35% | 62.08% | 26.48% | 36.30% |
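
The Average column is the unweighted mean of the five benchmark scores. A short script can recompute it from the rows above; Gemma 3 IT 270M is left out because its row is incomplete.

```python
# Scores in table order: HellaSwag, ARC-Easy, ARC-Challenge, PIQA, ArithMark-2.0
scores = {
    "BananaMind-1.5-Base": [30.91, 42.38, 23.98, 60.55, 26.68],
    "Zupra-1.6-Instruct-Ultra-Exp": [29.66, 34.41, 25.51, 59.74, 30.44],
    "KeyLM 75M": [29.66, 35.73, 23.98, 60.50, 25.80],
    "GPT-2 124M": [31.26, 39.35, 22.35, 62.08, 26.48],
}

# Unweighted mean per model, rounded to two decimals like the table.
averages = {name: round(sum(vals) / len(vals), 2) for name, vals in scores.items()}

# Print models from highest to lowest average.
for name, avg in sorted(averages.items(), key=lambda item: -item[1]):
    print(f"{name}: {avg:.2f}%")
```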
@misc{bananamind15base,
title = {BananaMind-1.5-Base},
author = {BananaMind},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/BananaMind/BananaMind-1.5-Base}}
}