MiniBananaMind-V1

MiniBananaMind-V1 is a compact LLaMA-style causal language model from BananaMind. It is trained from scratch for next-token text generation on streamed FineWeb-Edu data and is intended as a small, inspectable base model for experiments, demos, and lightweight research workflows.

This is a base language model, not an instruction-tuned assistant. It is best used for continuation-style generation and experimentation rather than factual question answering or chat.

Model Details

  • Developer: BananaMind
  • Model type: LLaMA-style causal language model
  • Library: Transformers
  • Task: Text generation
  • Training data: FineWeb-Edu
  • Checkpoint: the uploaded MiniBananaMind-V1 training checkpoint
  • License: Apache 2.0

Architecture

Setting             Value
Layers              6
Hidden size         256
Attention heads     8
KV heads            8
Intermediate size   768
Context length      512 tokens
Vocabulary size     32,000
Parameters          ~21.5M
Precision           float32 checkpoint
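
As a sanity check on the table, the parameter count can be reproduced by hand. The sketch below is an estimate only: it assumes a standard LLaMA-style layout with no bias terms, three MLP projections (gate, up, down), two normalization weight vectors per layer, and untied input and output embeddings. None of these details are stated in the model card.

```python
# Rough parameter count for the configuration in the architecture table.
# Assumptions (not stated in the model card): no bias terms, norm layers hold
# a single weight vector, the MLP has gate/up/down projections, and the input
# embedding and output head are separate (untied) matrices.
layers = 6
hidden = 256
intermediate = 768
vocab = 32_000
n_heads = 8
n_kv_heads = 8

head_dim = hidden // n_heads  # 32

# Attention: Q and O are hidden x hidden; K and V scale with the KV head count.
attn = 2 * hidden * hidden + 2 * hidden * (n_kv_heads * head_dim)
# MLP: gate, up, and down projections.
mlp = 3 * hidden * intermediate
# Two normalization weight vectors per layer.
norms = 2 * hidden

per_layer = attn + mlp + norms
embed = vocab * hidden      # input embedding
lm_head = vocab * hidden    # output projection (assumed untied)
final_norm = hidden

total = layers * per_layer + embed + lm_head + final_norm
print(f"per layer: {per_layer:,}")
print(f"total:     {total:,}")
```

Under these assumptions the total comes to 21,499,136, which agrees with the stated ~21.5M. With tied embeddings the same arithmetic would give about 13.3M, so the published size is consistent with untied embeddings. Because KV heads equal attention heads, the attention is plain multi-head attention rather than grouped-query attention.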

Intended Use

MiniBananaMind-V1 is suitable for:

  • Small-scale language-model experiments
  • Educational demos of decoder-only generation
  • Testing tokenization, generation settings, and inference pipelines
  • Research prototypes where a very small causal LM is useful

It is not recommended for production assistants, safety-critical use, or tasks that require reliable factual knowledge.

Usage

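from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo_id = "BananaMind/MiniBananaMind-V1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# device_map="auto" requires the accelerate package to be installed.
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float32,
    device_map="auto",
)

# Fall back to the EOS token if the tokenizer defines no padding token.
pad_token_id = tokenizer.pad_token_id
if pad_token_id is None:
    pad_token_id = tokenizer.eos_token_id

prompt = "A computer is a machine that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a short continuation with conservative settings.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.2,
        top_p=0.9,
        repetition_penalty=1.1,
        pad_token_id=pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode the full sequence (prompt plus continuation) back to text.
print(tokenizer.decode(output[0], skip_special_tokens=True))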

Generation Notes

Because this is a small base model, output quality depends heavily on prompt style and sampling settings. A temperature of 0.2 is recommended for more stable continuations. For more varied text, increase temperature or top_p.
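
To make the effect of these two settings concrete, here is a minimal pure-Python sketch of temperature scaling followed by top-p (nucleus) filtering. It is a simplified illustration, not the Transformers implementation, and the toy logits are made-up values rather than real model outputs.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

toy_logits = [2.0, 1.0, 0.5, 0.0]  # hypothetical next-token scores

for t in (0.2, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    kept = top_p_filter(probs, top_p=0.9)
    print(f"temperature={t}: top prob={max(probs):.3f}, "
          f"tokens kept by top_p=0.9: {len(kept)}")
```

At a temperature of 0.2 nearly all probability mass lands on the single most likely token, so sampling behaves almost like greedy decoding. At 1.0 the distribution is flatter and top_p admits more candidates, which is where the extra variety comes from.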

Limitations

  • The model may hallucinate facts, names, citations, and dates.
  • It has not been instruction tuned or aligned for chat behavior.
  • It may reproduce biases or unsafe patterns present in web-scale training data.
  • The short 512-token context length limits long-document use.
  • Small model size means weaker reasoning and factual recall than larger LMs.
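
One practical consequence of the 512-token window is that the prompt and the generated tokens must share it. The helper below is a hypothetical sketch (the name `fit_to_context` is not part of any library) that left-truncates a list of token IDs so that the prompt plus `max_new_tokens` fits in the context.

```python
def fit_to_context(token_ids, max_new_tokens, context_length=512):
    """Left-truncate a prompt so prompt + generated tokens fit in the window."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than the context length")
    # Keep the most recent tokens; the text nearest the end of the prompt
    # is what the continuation follows on from.
    return token_ids[-budget:]

prompt_ids = list(range(600))  # stand-in for real tokenizer output
trimmed = fit_to_context(prompt_ids, max_new_tokens=64)
print(len(trimmed))
```

Note that left truncation also drops any beginning-of-sequence token the tokenizer may have prepended; whether that matters for this model is not documented, so treat it as something to check when adapting the sketch.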

Training Data

MiniBananaMind-V1 was trained on streamed FineWeb-Edu text. FineWeb-Edu is a large educational-quality web corpus, so users should expect broad web-language coverage as well as the usual limitations of internet-scale data.

Training data attribution: this model was trained on FineWeb-Edu, a dataset released by Hugging Face as part of the FineWeb family.

Citation

If you use this model in a project, cite the Hugging Face repository and attribute the FineWeb-Edu training data:

@misc{minibananamindv1,
  title = {MiniBananaMind-V1},
  author = {BananaMind},
  year = {2026},
  howpublished = {\url{https://huggingface.co/BananaMind/MiniBananaMind-V1}}
}

Dataset: HuggingFaceFW/fineweb-edu
