MiniBananaMind-V1
MiniBananaMind-V1 is a compact LLaMA-style causal language model from BananaMind. It is trained from scratch for next-token text generation on streamed FineWeb-Edu data and is intended as a small, inspectable base model for experiments, demos, and lightweight research workflows.
This is a base language model, not an instruction-tuned assistant. It is best used for continuation-style generation and experimentation rather than factual question answering or chat.
Model Details
- Developer: BananaMind
- Model type: LLaMA-style causal language model
- Library: Transformers
- Task: Text generation
- Training data: FineWeb-Edu
- Checkpoint: the uploaded MiniBananaMind-V1 training checkpoint
- License: Apache 2.0
Architecture
| Setting | Value |
|---|---|
| Layers | 6 |
| Hidden size | 256 |
| Attention heads | 8 |
| KV heads | 8 |
| Intermediate size | 768 |
| Context length | 512 tokens |
| Vocabulary size | 32,000 |
| Parameters | ~21.5M |
| Precision | float32 checkpoint |
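The ~21.5M parameter figure can be reproduced from the table with a rough count. The sketch below assumes a standard LLaMA-style layout that this card does not spell out: untied input and output embeddings, bias-free projections, a three-matrix gated MLP, and two RMSNorm weight vectors per layer plus one final norm. The function name `llama_param_count` is illustrative, not part of any library.

```python
def llama_param_count(layers, hidden, intermediate, vocab,
                      heads, kv_heads, tied_embeddings=False):
    """Rough parameter count for a LLaMA-style decoder (assumed layout)."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim
    # Attention: q and o are hidden x hidden; k and v are hidden x kv_dim.
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim
    # Gated MLP: gate, up, and down projections (assumed, no biases).
    mlp = 3 * hidden * intermediate
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden
    per_layer = attention + mlp + norms
    embeddings = vocab * hidden
    # Assumption: a separate output head unless embeddings are tied.
    lm_head = 0 if tied_embeddings else vocab * hidden
    final_norm = hidden
    return layers * per_layer + embeddings + lm_head + final_norm


total = llama_param_count(layers=6, hidden=256, intermediate=768,
                          vocab=32_000, heads=8, kv_heads=8)
print(f"{total:,}")  # 21,499,136
```

Under these assumptions the count lands at about 21.5M, with roughly 16.4M of it in the two embedding matrices. With tied embeddings the same formula gives about 13.3M, so the reported figure is consistent with untied embeddings.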
Intended Use
MiniBananaMind-V1 is suitable for:
- Small-scale language-model experiments
- Educational demos of decoder-only generation
- Testing tokenization, generation settings, and inference pipelines
- Research prototypes where a very small causal LM is useful
It is not recommended for production assistants, safety-critical use, or tasks that require reliable factual knowledge.
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo_id = "BananaMind/MiniBananaMind-V1"

# Load the tokenizer and the float32 checkpoint.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float32,
    device_map="auto",  # requires the accelerate package
)

# Base-model prompts work best as text to be continued.
prompt = "A computer is a machine that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a short continuation without tracking gradients.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.2,
        top_p=0.9,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
Generation Notes
Because this is a small base model, output quality depends heavily on prompt
style and sampling settings. A temperature of 0.2 is recommended for more
stable continuations. For more varied text, increase temperature or top_p.
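To make the effect of these two settings concrete, here is a minimal pure-Python sketch of temperature scaling followed by top-p (nucleus) filtering on a toy distribution. It is a simplified illustration rather than the Transformers implementation, and the logits and function names are made up for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p,
    then renormalize so the kept probabilities sum to 1."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


toy_logits = [2.0, 1.0, 0.5, 0.0]  # made-up scores for four candidate tokens
for t in (0.2, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in top_p_filter(probs, 0.9)])
```

With these toy logits, a temperature of 0.2 concentrates almost all probability on the top token, so top_p=0.9 leaves only that token as a candidate. At a temperature of 1.0, three of the four tokens remain in the sampling pool, which is why higher settings give more varied text.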
Limitations
- The model may hallucinate facts, names, citations, and dates.
- It has not been instruction tuned or aligned for chat behavior.
- It may reproduce biases or unsafe patterns present in web-scale training data.
- The short 512-token context length limits long-document use.
- Small model size means weaker reasoning and factual recall than larger LMs.
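One way to work within the 512-token limit is to trim the prompt so that the prompt plus the requested new tokens fits the window. The sketch below keeps the most recent tokens, which is one possible policy and not something this card prescribes. The helper name `fit_to_context` is hypothetical, and the integer list stands in for real token ids.

```python
def fit_to_context(token_ids, max_new_tokens, context_length=512):
    """Keep the most recent prompt tokens so prompt + generation fits the window."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Negative slicing keeps the tail; shorter prompts pass through unchanged.
    return token_ids[-budget:]


long_prompt = list(range(1000))  # stand-in for real token ids
trimmed = fit_to_context(long_prompt, max_new_tokens=64)
print(len(trimmed))  # 448
```

Trimming from the left also drops any beginning-of-sequence token at the start of the prompt, so a real pipeline may want to re-add it after truncation.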
Training Data
MiniBananaMind-V1 was trained on streamed FineWeb-Edu text. FineWeb-Edu is a large web corpus filtered for educational quality, released by Hugging Face as part of the FineWeb family, so users should expect broad web-language coverage as well as the usual limitations of internet-scale data.
Citation
If you use this model in a project, cite the Hugging Face repository and attribute the FineWeb-Edu training data:
@misc{minibananamindv1,
  title        = {MiniBananaMind-V1},
  author       = {BananaMind},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/BananaMind/MiniBananaMind-V1}}
}
Dataset: HuggingFaceFW/fineweb-edu
