How to use BigBlueCeiling/FrndoBrain-1.0.1-24b with libraries, inference servers, and local apps.

Libraries

How to use BigBlueCeiling/FrndoBrain-1.0.1-24b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

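pipe = pipeline(
    "image-text-to-text",
    model="BigBlueCeiling/FrndoBrain-1.0.1-24b",
    torch_dtype="bfloat16",  # the checkpoint's native dtype
    device_map="auto",       # spread layers across available GPUs
)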
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
print(pipe(text=messages, max_new_tokens=40))

# Load model directly
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("BigBlueCeiling/FrndoBrain-1.0.1-24b")
# Load in BF16 (the checkpoint's native dtype) and place layers on available GPUs
model = AutoModelForImageTextToText.from_pretrained(
    "BigBlueCeiling/FrndoBrain-1.0.1-24b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)  # dtype is applied only to floating tensors such as pixel_values

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))


vLLM

How to use BigBlueCeiling/FrndoBrain-1.0.1-24b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "BigBlueCeiling/FrndoBrain-1.0.1-24b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "BigBlueCeiling/FrndoBrain-1.0.1-24b",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/BigBlueCeiling/FrndoBrain-1.0.1-24b

SGLang

How to use BigBlueCeiling/FrndoBrain-1.0.1-24b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "BigBlueCeiling/FrndoBrain-1.0.1-24b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "BigBlueCeiling/FrndoBrain-1.0.1-24b",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "BigBlueCeiling/FrndoBrain-1.0.1-24b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "BigBlueCeiling/FrndoBrain-1.0.1-24b",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use BigBlueCeiling/FrndoBrain-1.0.1-24b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for BigBlueCeiling/FrndoBrain-1.0.1-24b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for BigBlueCeiling/FrndoBrain-1.0.1-24b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for BigBlueCeiling/FrndoBrain-1.0.1-24b to start chatting

Load model with FastModel

# Install first with: pip install unsloth
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="BigBlueCeiling/FrndoBrain-1.0.1-24b",
    max_seq_length=2048,
)


FrndoBrain-1.0.1-24b

A LoRA fine-tune of Mistral Small 3.2 24B Instruct (2506), merged back into a full BF16 HuggingFace checkpoint and shipped as a standard safetensors directory.

For any questions about this model, contact eoffermann@gmail.com.

Overview

This is a 24-billion-parameter multimodal (text + image) instruction-tuned language model, structurally identical to mistralai/Mistral-Small-3.2-24B-Instruct-2506. The only weights that have changed are the language-model attention and MLP projections, which were adapted toward a specific domain with a LoRA fine-tune and then merged back into the full BF16 weight tensors. Everything else about the model (its vision tower, multimodal projector, tokenizer, chat template, image processor, and overall architecture) is byte-identical to the base.

Practically, that means:

It's a drop-in replacement for the vanilla base model. Any serving command, prompt template, or inference pipeline that works against the base model works against this directory with no changes other than the model path.
Image input still works exactly as it does on the base. The Pixtral vision encoder and the multimodal projector were frozen during training, so vision behavior is preserved.
Behavior diverges from the base mainly on the kinds of conversations the fine-tuning dataset covers. On out-of-domain prompts, output should stay close to the base model; on in-domain prompts, the model has shifted toward the training data's style and conventions.

If you're already running vanilla Mistral Small 3.2 24B in your stack, deploying this is a matter of swapping the model path. No tokenizer change, no chat-template change, no extra launch flags.
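The byte-identical claim can be checked without loading either model. A safetensors file starts with an 8-byte little-endian header length, followed by a JSON header that gives each tensor's dtype, shape, and `data_offsets` relative to the end of the header, so individual tensors can be hashed straight from disk. The sketch below does this with the standard library only. The helper names (`tensor_hashes`, `dir_hashes`, `compare`) are hypothetical, and because tensor names can differ between checkpoints saved by different transformers versions, it only compares names present in both directories.

```python
import hashlib
import json
import struct
from pathlib import Path

def tensor_hashes(shard_path, name_filter=""):
    """Map tensor name -> SHA-256 of its raw bytes for one .safetensors file."""
    hashes = {}
    with open(shard_path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        header = json.loads(f.read(header_len))
        data_start = 8 + header_len
        for name, info in header.items():
            if name == "__metadata__" or name_filter not in name:
                continue
            begin, end = info["data_offsets"]  # relative to data_start
            f.seek(data_start + begin)
            hashes[name] = hashlib.sha256(f.read(end - begin)).hexdigest()
    return hashes

def dir_hashes(model_dir, name_filter=""):
    """Hash every matching tensor across all shards listed in the index."""
    root = Path(model_dir)
    index = json.loads((root / "model.safetensors.index.json").read_text())
    result = {}
    for shard in sorted(set(index["weight_map"].values())):
        result.update(tensor_hashes(root / shard, name_filter))
    return result

def compare(base_dir, tuned_dir, name_filter="vision_tower"):
    """Return (identical, different) tensor names shared by both checkpoints."""
    base = dir_hashes(base_dir, name_filter)
    tuned = dir_hashes(tuned_dir, name_filter)
    shared = sorted(base.keys() & tuned.keys())
    same = [n for n in shared if base[n] == tuned[n]]
    diff = [n for n in shared if base[n] != tuned[n]]
    return same, diff
```

For example, `compare("Mistral-Small-3.2-24B-Instruct-2506", "FrndoBrain-1.0.1-24b")` should return an empty second list if the vision tower is unchanged; passing `name_filter="q_proj"` should instead list the adapted attention tensors as different.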

What's different vs the vanilla base

Layer	Status vs base	Notes
Language-model attention (`q,k,v,o`)	Adapted	LoRA rank 16, alpha 32, then merged back into BF16
Language-model MLP (`gate,up,down`)	Adapted	Same LoRA config
Vision tower (Pixtral encoder)	Unchanged	Frozen during training; bit-identical to base
Multimodal projector	Unchanged	Frozen during training; bit-identical to base
Token embedding table	Unchanged	No new tokens added
LM head	Unchanged	Vocab unchanged
Tokenizer (`tokenizer.json`)	Unchanged	This is the Unsloth HF port — same vocab as Mistral's native `tekken.json`, but in HF format so `AutoTokenizer` works directly
Chat template (`chat_template.jinja`)	Unchanged	Original `mistral_small` template
Image processor	Unchanged	Same `preprocessor_config.json` / `processor_config.json`
Architecture (`config.json`)	Unchanged	`Mistral3ForConditionalGeneration` — same dimensions, layer counts, context length
Weight dtype	Unchanged	BF16 (no on-disk quantization)
License	Unchanged	Apache 2.0 (inherited from base)
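The two "Adapted" rows follow the standard LoRA merge rule, W_merged = W + (alpha / r) · B · A, so with rank 16 and alpha 32 the low-rank update is scaled by 2. The pure-Python sketch below uses tiny hypothetical matrices (not real Mistral weights) to check that applying the adapter at inference time gives the same output as the single merged weight, which is why the merged checkpoint needs no adapter-aware runtime.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    return [[s * x for x in row] for row in a]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy sizes; alpha / r = 2, the same ratio as rank 16 / alpha 32
d_in, d_out, r, alpha = 8, 6, 2, 4
rng = random.Random(0)
W = rand_matrix(d_out, d_in, rng)   # frozen base weight
A = rand_matrix(r, d_in, rng)       # LoRA down-projection
B = rand_matrix(d_out, r, rng)      # LoRA up-projection
x = rand_matrix(d_in, 1, rng)       # one input column vector

scaling = alpha / r
# Adapter applied at runtime: W x + scaling * B (A x)
unmerged = matadd(matmul(W, x), scale(matmul(B, matmul(A, x)), scaling))
# Adapter folded into the weight once: (W + scaling * B A) x
W_merged = matadd(W, scale(matmul(B, A), scaling))
merged = matmul(W_merged, x)

max_diff = max(abs(u[0] - m[0]) for u, m in zip(unmerged, merged))
print(f"max |unmerged - merged| = {max_diff:.2e}")
```

The two results agree up to floating-point rounding, and the merged weight keeps the original shape, so nothing about the architecture changes.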

In summary: same model in shape and capability, with the language-model weights tilted by the fine-tune. Trainable parameters were 92.4 M out of 24.1 B (0.38%), so the deviation from base is bounded — this is a refinement of style on top of the existing capabilities, not a reshaping of what the model can do.
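The 92.4 M figure can be reproduced from the LoRA rank and the shape of each targeted projection: a rank-r adapter on a d_in → d_out linear layer adds r · (d_in + d_out) parameters, regardless of alpha. The text-model dimensions below (hidden size 5120, 40 layers, 32 query heads and 8 key/value heads of size 128, MLP width 32768) are assumed values for the Mistral Small 3.x 24B base; check them against `text_config` in your copy of `config.json`.

```python
# Assumed text-model dimensions for Mistral Small 3.x 24B (see config.json -> text_config)
hidden = 5120
layers = 40
n_heads, n_kv_heads, head_dim = 32, 8, 128
intermediate = 32768
rank = 16

q_out = n_heads * head_dim      # 4096
kv_out = n_kv_heads * head_dim  # 1024

# (in_features, out_features) for every LoRA target listed in the training table
targets = {
    "q_proj": (hidden, q_out),
    "k_proj": (hidden, kv_out),
    "v_proj": (hidden, kv_out),
    "o_proj": (q_out, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# A LoRA pair A (rank x in) and B (out x rank) holds rank * (in + out) parameters
per_layer = sum(rank * (d_in + d_out) for d_in, d_out in targets.values())
trainable = per_layer * layers
print(f"{trainable:,} trainable parameters")
print(f"{trainable / 24.1e9:.2%} of 24.1 B parameters")
```

Under these assumptions the total comes to 92,405,760, which matches the 92.4 M (0.38%) quoted above; the three MLP projections account for almost four fifths of it.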

How the fine-tune was made

Setting	Value
Base checkpoint	`unsloth/Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit` (weight-identical to `mistralai/Mistral-Small-3.2-24B-Instruct-2506`; redistributed by Unsloth with an HF-format tokenizer)
Method	QLoRA (NF4 4-bit base weights, BF16 compute, double-quant enabled)
Trainable adapter	LoRA, rank 16, alpha 32, dropout 0
LoRA targets	`q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` (LM only)
Frozen	Vision tower, multimodal projector, embeddings, LM head
Dataset	~8,000 conversation examples (OpenAI `messages` format) + 1,000-conversation held-out validation set + 1,000-conversation held-out test set
Sequence cutoff	2,048 tokens
Optimizer	AdamW (Torch)
Learning rate	5e-5, cosine schedule, no warmup
Effective batch size	16 (per-device 2 × grad-accum 8)
Epochs trained	3
Selected checkpoint	Epoch 2 — eval loss trajectory was 1.025 → 1.017 → 1.082; epoch 3 was overfit
Final merge	LoRA → BF16 via Unsloth's `save_pretrained_merged(save_method="merged_16bit")`
Training framework	LLaMA-Factory 0.9.4 + Unsloth backend, transformers 4.57.1
Hardware	Single RTX A6000 (48 GB)
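As a rough sanity check on the schedule rows, the sketch below assumes exactly 8,000 training conversations (the table says "~8,000"), none dropped, and a half-cosine decay from 5e-5 to 0 across all three epochs with no warmup. Under those assumptions each epoch is 500 optimizer steps, and the selected epoch-2 checkpoint was taken once the learning rate had decayed to a quarter of its peak. The last lines reproduce the checkpoint choice by picking the epoch with the lowest eval loss.

```python
import math

# Assumptions: exactly 8,000 examples; cosine decay to 0 with no warmup
n_examples = 8000
per_device_batch, grad_accum = 2, 8
epochs = 3
peak_lr = 5e-5

effective_batch = per_device_batch * grad_accum            # 16
steps_per_epoch = math.ceil(n_examples / effective_batch)  # 500
total_steps = steps_per_epoch * epochs                     # 1500

def cosine_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps (half-cosine, no warmup)."""
    progress = step / total_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for epoch in range(1, epochs + 1):
    print(f"end of epoch {epoch}: lr = {cosine_lr(epoch * steps_per_epoch):.3e}")

# Eval losses from the table; keep the checkpoint with the lowest value
eval_loss = {1: 1.025, 2: 1.017, 3: 1.082}
best_epoch = min(eval_loss, key=eval_loss.get)
print(f"selected checkpoint: epoch {best_epoch} (eval loss {eval_loss[best_epoch]})")
```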

Quality caveat (one minor thing worth knowing)

The LoRA → BF16 merge was performed against a 4-bit-quantized base, not against the original BF16 base. This was a memory-driven choice: merging a 24B BF16 model on CPU was OOM-killed inside Docker even on hosts with plenty of RAM, and the GPU merge path through Unsloth requires the 4-bit base.

In practice this introduces a small dequantization artifact in the merged weights (NF4 → BF16 round-trip) that is well below typical quantization noise for any runtime quant scheme you might apply on top (AWQ, GPTQ, FP8, etc.). If you ever need a "perfect" merge — for instance, if you're serving at full BF16 and benchmarking against the base directly — we can redo the merge against the original BF16 weights on GPU and ship a fresh directory. Ask if you need that.
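The size of that artifact follows from the merge arithmetic: merging against a dequantized base gives deq(Q(W)) + ΔW instead of W + ΔW, so the error in each merged weight is exactly the base's quantize/dequantize round-trip error, independent of the LoRA update. The sketch below illustrates this with a simple blockwise absmax 4-bit quantizer (uniform levels, block size 64) on random hypothetical weights. That quantizer is a stand-in chosen for brevity, not bitsandbytes' NF4 codebook, so its printed numbers say nothing about the real checkpoint.

```python
import random

def roundtrip_4bit(values, block_size=64):
    """Quantize to symmetric 4-bit levels (-7..7) per block, then dequantize.

    Simplified stand-in for NF4: uniform levels instead of the NF4 codebook.
    """
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # per-block scale
        step = absmax / 7
        out.extend(round(v / step) * step for v in block)
    return out

rng = random.Random(0)
n = 1024
W = [rng.gauss(0.0, 0.02) for _ in range(n)]       # hypothetical base weights
delta = [rng.gauss(0.0, 0.002) for _ in range(n)]  # hypothetical merged LoRA update

W_deq = roundtrip_4bit(W)
ideal = [w + d for w, d in zip(W, delta)]       # merge against the BF16 base
actual = [q + d for q, d in zip(W_deq, delta)]  # merge against the 4-bit base

merge_err = [a - i for a, i in zip(actual, ideal)]
base_err = [q - w for q, w in zip(W_deq, W)]
print("max merge error:", max(abs(e) for e in merge_err))
print("max round-trip error:", max(abs(e) for e in base_err))
```

The two printed maxima agree (up to float rounding), and with absmax scaling each element's error is bounded by half a quantization step in its block.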

Deployment

Verify the directory loads

# File inventory: should show 10 safetensors shards plus the JSON/Jinja sidecar files
ls -lh FrndoBrain-1.0.1-24b/

# Total size should be ~45 GB
du -sh FrndoBrain-1.0.1-24b/

# Smoke test in transformers (loads in ~1 min, no inference yet)
python -c "
from transformers import AutoModelForImageTextToText, AutoProcessor
m = AutoModelForImageTextToText.from_pretrained('FrndoBrain-1.0.1-24b', torch_dtype='bfloat16')
p = AutoProcessor.from_pretrained('FrndoBrain-1.0.1-24b')
print('OK:', type(m).__name__, '|', sum(x.numel() for x in m.parameters())/1e9, 'B params')
"

If transformers loads cleanly, vLLM will too.
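To check the inventory without loading roughly 45 GB into memory, the sketch below reads only `model.safetensors.index.json` and the directory listing. It assumes the standard Hugging Face sharded-index layout (a `weight_map` from tensor name to shard filename) and the five-digit shard naming shown in the file inventory; `check_checkpoint_dir` is a hypothetical helper, not part of any library.

```python
import json
from pathlib import Path

# Sidecar files listed in the file inventory
EXPECTED_SIDECARS = [
    "config.json", "tokenizer.json", "tokenizer_config.json",
    "special_tokens_map.json", "chat_template.jinja",
    "preprocessor_config.json", "processor_config.json",
]

def check_checkpoint_dir(path, expected_shards=10):
    """Return a list of problems found in a sharded safetensors checkpoint directory."""
    root = Path(path)
    index_file = root / "model.safetensors.index.json"
    if not index_file.is_file():
        return [f"missing {index_file.name}"]

    # The index maps every tensor name to the shard file that stores it
    weight_map = json.loads(index_file.read_text())["weight_map"]
    expected = {f"model-{i:05d}-of-{expected_shards:05d}.safetensors"
                for i in range(1, expected_shards + 1)}

    problems = []
    for name in sorted(set(weight_map.values()) - expected):
        problems.append(f"index references unexpected shard {name}")
    for name in sorted(expected):
        if not (root / name).is_file():
            problems.append(f"missing shard {name}")
    for name in EXPECTED_SIDECARS:
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    return problems
```

Usage: `print(check_checkpoint_dir("FrndoBrain-1.0.1-24b") or "OK")`. An empty list means every shard the index expects and every sidecar file is present; it does not validate tensor contents.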

Running in vLLM

The architecture is Mistral3ForConditionalGeneration, which vLLM auto-detects from config.json. Use vLLM 0.8.x or later; earlier versions don't include the Mistral 3.x multimodal model class.

Example launch (adjust for your serving setup):

vllm serve /path/to/FrndoBrain-1.0.1-24b \
  --served-model-name FrndoBrain-1.0.1-24b \
  --dtype bfloat16 \
  --max-model-len 32768 \
  --limit-mm-per-prompt image=4

A few notes:

--tokenizer-mode mistral is not required and should not be passed — this directory ships an HF-format tokenizer.json, not Mistral's native tekken.json. The default tokenizer path is correct.
chat_template.jinja is picked up automatically by vLLM when serving via /v1/chat/completions. No --chat-template flag needed unless you're intentionally overriding it.
The base model's full context is 131,072 tokens; the LoRA was trained at cutoff_len=2048 but the merged model still accepts the full base context. Long-context behavior on this fine-tune has not been benchmarked.
The directory name itself is not referenced anywhere inside the files — rename it freely on your end.
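Because the server exposes the OpenAI-compatible chat API, the request body has the same shape as the curl examples at the top of this card, except that `"model"` must match the `--served-model-name` you launched with. The sketch below uses only the Python standard library. The base URL (vLLM's default port 8000 on localhost) and the helper names are assumptions, and the client-side image check simply mirrors `--limit-mm-per-prompt image=4` from the example launch.

```python
import json
import urllib.request

SERVED_NAME = "FrndoBrain-1.0.1-24b"   # must equal --served-model-name
BASE_URL = "http://localhost:8000/v1"  # assumed default vLLM host/port
MAX_IMAGES = 4                         # mirrors --limit-mm-per-prompt image=4

def build_chat_request(prompt, image_urls=(), max_tokens=256):
    """Build an OpenAI-style chat payload with optional image inputs."""
    if len(image_urls) > MAX_IMAGES:
        raise ValueError(f"server was launched to accept at most {MAX_IMAGES} images")
    content = [{"type": "text", "text": prompt}]
    content += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
    return {
        "model": SERVED_NAME,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }

def send(payload):
    """POST the payload to /chat/completions and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(send(build_chat_request("Describe this image in one sentence.", [image_url])))` sends one text-plus-image turn. Checking the image count on the client gives a clearer error than a rejected request from the server.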

File inventory

File	Purpose
`model-000{01..10}-of-00010.safetensors`	BF16 weights, 10 shards (~4.5 GB each)
`model.safetensors.index.json`	Shard manifest
`config.json`	Model config (architecture, dimensions, vision config)
`tokenizer.json`	Fast tokenizer, HF format
`tokenizer_config.json`	Tokenizer settings
`special_tokens_map.json`	Special-token IDs
`chat_template.jinja`	`mistral_small` chat template (unchanged from base)
`preprocessor_config.json`	Image preprocessing parameters
`processor_config.json`	Combined processor config


Safetensors
Model size: 24B params
Tensor type: BF16

Model tree for BigBlueCeiling/FrndoBrain-1.0.1-24b: mistralai/Mistral-Small-3.1-24B-Base-2503 (base model) → mistralai/Mistral-Small-3.2-24B-Instruct-2506 (fine-tuned from the base) → this model (fine-tuned from the Instruct checkpoint)