BananaMind-1.5-Base

BananaMind-1.5-Base is a small English causal language model trained from scratch by BananaMind.

It is our first fully pretrained medium model.

Model Details

Field	Value
Parameters	75,054,720
Architecture	Llama-style decoder-only Transformer
Layers	12
Hidden size	640
Intermediate size	1728
Attention heads	10
KV heads	5
Context length	4096 tokens
Vocabulary size	32,000
Tokenizer	Custom byte-level BPE
Training tokens	~27B tokens
Precision	BF16 training, safetensors release
Model type	Base causal LM
Training cost	$103.31
Training GPU	RTX Pro 6000

An instruction-tuned version is coming very soon.

Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo = "BananaMind/BananaMind-1.5-Base"

device = "cuda" if torch.cuda.is_available() else "cpu"
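# The weights are released in BF16: use BF16 where the GPU supports it,
# fall back to FP16 on older GPUs, and use FP32 on CPU.
if torch.cuda.is_available():
    dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
else:
    dtype = torch.float32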

tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=False)

model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=False,
    dtype=dtype,
).to(device)

model.eval()

prompt = "The color of the sky is blue. The color of a banana is"
inputs = tok(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=16,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tok.eos_token_id,
        eos_token_id=tok.eos_token_id,
    )

print(tok.decode(out[0], skip_special_tokens=True))

Generation Settings

Recommended starting settings:

temperature = 0.7
top_p = 0.9
max_new_tokens = 64

For deterministic sanity tests:

do_sample = False
max_new_tokens = 8
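
The two presets above can be kept as plain dictionaries and unpacked into `model.generate()`. This is a minimal sketch: the names `SAMPLING_KWARGS`, `GREEDY_KWARGS`, and `generation_kwargs` are hypothetical and not part of the release. It also assumes `do_sample=True` for the recommended preset, as in the usage example, since `temperature` and `top_p` only take effect when sampling is enabled.

```python
# Hypothetical helper names; the values are the settings listed above.
SAMPLING_KWARGS = {
    "do_sample": True,       # assumption: sampling on, as in the usage example
    "temperature": 0.7,
    "top_p": 0.9,
    "max_new_tokens": 64,
}

GREEDY_KWARGS = {
    "do_sample": False,      # greedy decoding, so repeated runs give the same text
    "max_new_tokens": 8,
}


def generation_kwargs(deterministic: bool = False) -> dict:
    """Return a fresh copy of the keyword arguments for model.generate()."""
    preset = GREEDY_KWARGS if deterministic else SAMPLING_KWARGS
    return dict(preset)
```

With the objects from the usage example, a sanity test would then look like `model.generate(**inputs, **generation_kwargs(deterministic=True), pad_token_id=tok.eos_token_id)`.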

Training

BananaMind-1.5-Base was trained from scratch on approximately 27B tokens of FineWeb-Edu-style English web text.

The model uses a custom 32k byte-level BPE tokenizer and a compact Llama-style architecture with grouped-query attention.
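
To illustrate what grouped-query attention buys at inference time, the sketch below estimates the key-value cache size at the full 4096-token context. It assumes a head dimension of 640 / 10 = 64 (not stated in this card), a BF16 cache (2 bytes per value), and batch size 1. The function name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2, batch=1):
    """Size of the K and V caches across all layers, in bytes."""
    # The leading 2 counts one K tensor and one V tensor per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value * batch


# Values from the Model Details table.
HIDDEN, HEADS, KV_HEADS, LAYERS, CONTEXT = 640, 10, 5, 12, 4096
HEAD_DIM = HIDDEN // HEADS  # assumption: 64, not stated in the card

gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)  # 5 KV heads (as released)
mha = kv_cache_bytes(LAYERS, HEADS, HEAD_DIM, CONTEXT)     # 10 KV heads (hypothetical full MHA)

print(f"GQA: {gqa / 2**20:.0f} MiB, MHA: {mha / 2**20:.0f} MiB")
# GQA: 60 MiB, MHA: 120 MiB
```

Under these assumptions, sharing each key-value head between two query heads halves the cache compared with giving every query head its own.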

Architecture

BananaMind-1.5-Base uses a compact Llama-style decoder architecture:

12 Transformer layers
640 hidden size
1728 intermediate size
10 attention heads
5 key-value heads
grouped-query attention
SiLU activation
RMSNorm
tied input/output embeddings
4096 token context length
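
The parameter count in the Model Details table can be reproduced from these hyperparameters. The sketch below assumes the standard Llama layout: bias-free linear layers, a gated MLP with three projections (gate, up, down), two RMSNorm weight vectors per layer plus a final one, and a head dimension of hidden size / attention heads. The function name `llama_param_count` is hypothetical.

```python
def llama_param_count(vocab, hidden, intermediate, layers, heads, kv_heads, tied=True):
    """Count parameters of a Llama-style decoder without biases."""
    head_dim = hidden // heads           # assumption: 640 // 10 = 64
    kv_dim = kv_heads * head_dim         # width of the K and V projections

    embed = vocab * hidden               # token embedding matrix
    attn = 2 * hidden * hidden           # q_proj and o_proj
    attn += 2 * hidden * kv_dim          # k_proj and v_proj (grouped-query attention)
    mlp = 3 * hidden * intermediate      # gate, up, and down projections
    norms = 2 * hidden                   # two RMSNorm weight vectors per layer

    total = embed + layers * (attn + mlp + norms) + hidden  # + final RMSNorm
    if not tied:
        total += vocab * hidden          # separate output projection
    return total


total = llama_param_count(
    vocab=32_000, hidden=640, intermediate=1728, layers=12, heads=10, kv_heads=5
)
print(f"{total:,}")  # 75,054,720
```

Because the embeddings are tied, the 20,480,000 embedding parameters are counted once; an untied output layer would bring the total to 95,534,720.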

Evaluation

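BananaMind-1.5-Base has the highest average (36.90%) among the models with complete results below, and the highest ARC-Easy score. Gemma 3 IT 270M and GPT-2 124M score higher on HellaSwag and PIQA: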

Model	HellaSwag	ARC-Easy	ARC-Challenge	PIQA	ArithMark-2.0	Average
BananaMind-1.5-Base	30.91%	42.38%	23.98%	60.55%	26.68%	36.90%
Gemma 3 IT 270M	37.70%	-	-	66.20%	-	-
Zupra-1.6-Instruct-Ultra-Exp	29.66%	34.41%	25.51%	59.74%	30.44%	35.95%
KeyLM 75M	29.66%	35.73%	23.98%	60.50%	25.80%	35.13%
GPT-2 124M	31.26%	39.35%	22.35%	62.08%	26.48%	36.30%
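
The Average column appears to be the unweighted mean of the five benchmark scores. The following sketch recomputes it from the table for the models with complete results. The variable names are illustrative, and the unweighted mean is an assumption that happens to match the reported values to two decimals.

```python
# Scores (%) copied from the table above, in column order:
# HellaSwag, ARC-Easy, ARC-Challenge, PIQA, ArithMark-2.0
scores = {
    "BananaMind-1.5-Base": [30.91, 42.38, 23.98, 60.55, 26.68],
    "Zupra-1.6-Instruct-Ultra-Exp": [29.66, 34.41, 25.51, 59.74, 30.44],
    "KeyLM 75M": [29.66, 35.73, 23.98, 60.50, 25.80],
    "GPT-2 124M": [31.26, 39.35, 22.35, 62.08, 26.48],
}

# Average column as reported in the table.
reported = {
    "BananaMind-1.5-Base": 36.90,
    "Zupra-1.6-Instruct-Ultra-Exp": 35.95,
    "KeyLM 75M": 35.13,
    "GPT-2 124M": 36.30,
}

# Assumption: the average is a plain arithmetic mean over the five benchmarks.
computed = {name: sum(vals) / len(vals) for name, vals in scores.items()}

for name, value in computed.items():
    print(f"{name}: computed {value:.2f}%, reported {reported[name]:.2f}%")
```

Gemma 3 IT 270M is left out because three of its five scores are not reported.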

[Figure: Parameter vs Size]

Citation

@misc{bananamind15base,
  title = {BananaMind-1.5-Base},
  author = {BananaMind},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/BananaMind/BananaMind-1.5-Base}}
}
