
Libraries

How to use Corbenic/Galahad-0.5B-base with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Corbenic/Galahad-0.5B-base", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Corbenic/Galahad-0.5B-base", trust_remote_code=True, dtype="auto")
```


vLLM

How to use Corbenic/Galahad-0.5B-base with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (the custom architecture needs remote code enabled):
vllm serve "Corbenic/Galahad-0.5B-base" --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Corbenic/Galahad-0.5B-base",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
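The same completion request can also be sent from Python using only the standard library. This is a minimal sketch that assumes one of the servers above is already listening (vLLM's default `http://localhost:8000`, or `http://localhost:30000` for the SGLang examples); the helper names `build_completion_request` and `complete` are hypothetical, not part of either project.

```python
import json
import urllib.request


def build_completion_request(base_url, prompt, max_tokens=512, temperature=0.5,
                             model="Corbenic/Galahad-0.5B-base"):
    """Build (but do not send) a POST to an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, prompt, **kwargs):
    """Send the request and return the text of the first completion choice."""
    req = build_completion_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With a server running, `complete("http://localhost:8000", "Once upon a time,")` returns the continuation as a string. Building the request separately from sending it keeps the payload easy to inspect or log.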

Use Docker

```shell
docker model run hf.co/Corbenic/Galahad-0.5B-base
```

SGLang

How to use Corbenic/Galahad-0.5B-base with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (the custom architecture needs remote code enabled):
python3 -m sglang.launch_server \
    --model-path "Corbenic/Galahad-0.5B-base" \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Corbenic/Galahad-0.5B-base",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Corbenic/Galahad-0.5B-base" \
        --trust-remote-code \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Corbenic/Galahad-0.5B-base",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```


Access: this repository is publicly accessible, but you must agree to share your contact information and accept the conditions before you can access its files and content.

Galahad-0.5B (base)

An open ~570M-parameter language model, pretrained for €600 (US$676.95). Released by Corbenic AI. This is the base (completion) model — the original pretrained weights, released as the open result of Corbenic's data + training stack.

What this is — and isn't. Galahad-0.5B is competent for its size, not a leaderboard winner. It is a small base LM: it completes text, it is not instruction-tuned, and it loses to comparable open baselines on standard benchmarks. We say so plainly. The point of Galahad is that it is cheap, open, and losslessly reusable — the substrate on which we demonstrate the Taliesin memory engine (see below). The capability we care about lives in the engine, not in these weights.

Quick facts

| Property | Value |
|---|---|
| Parameters | ~570M |
| Architecture | decoder-only · hidden 1024 · 30 layers · 16 heads · head_dim 64 · SwiGLU · RMSNorm · vocab 65,536 · tied embeddings |
| Extra norms (v10) | per-head q/k/v norm + gate/up norm (not in stock Llama; needs `trust_remote_code=True`) |
| Positional | RoPE, interleaved / GPT-J convention (see warning below) |
| Attention | sliding-window, 1024 tokens (hard window in every layer) |
| Pretraining cost | €600 (US$676.95), full bill |
| License | Apache-2.0 |
| Type | base / completion (NOT instruction-tuned) |
| Weights | safetensors, BF16 |
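The ~570M figure can be roughly reproduced from the Quick facts. This is back-of-the-envelope arithmetic, not values read from the released config: it assumes a SwiGLU intermediate size of 4096 (not stated on this card), full multi-head attention (16 query and 16 KV heads, no grouping), no bias terms, and it ignores the small norm weight vectors. Check `config.json` in the repo before relying on any of these constants.

```python
# Back-of-the-envelope parameter count for the Quick-facts architecture.
# ASSUMPTIONS: INTERMEDIATE = 4096 is hypothetical (not on the card);
# full multi-head attention; no biases; norm weights ignored.
HIDDEN = 1024
LAYERS = 30
HEADS = 16
HEAD_DIM = 64
VOCAB = 65_536
INTERMEDIATE = 4096  # hypothetical SwiGLU inner size


def param_count(hidden=HIDDEN, layers=LAYERS, heads=HEADS, head_dim=HEAD_DIM,
                vocab=VOCAB, intermediate=INTERMEDIATE):
    embed = vocab * hidden                 # tied: the LM head reuses this matrix
    attn = 4 * hidden * heads * head_dim   # q, k, v and output projections
    mlp = 3 * hidden * intermediate        # SwiGLU: gate, up and down projections
    return embed + layers * (attn + mlp)


total = param_count()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")  # 570,425,344 parameters (~570M)
```

That these assumptions land on about 570.4M is consistent with the card's figure, but it does not by itself confirm the intermediate size.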

⚠️ RoPE convention (read before porting)

Galahad uses the interleaved (GPT-J) RoPE convention: rotate_half swaps adjacent pairs and cos/sin are repeat_interleave(2). The included modeling_galahad.py already does this. If you convert to llama.cpp / GGUF, use LLAMA_ROPE_TYPE_NORM (mode 0), not NEOX — the wrong convention passes short prompts and silently collapses on long ones.
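To make the warning concrete, here is a minimal pure-Python sketch of both conventions applied to a single head vector. It writes the `rotate_half` + `repeat_interleave(2)` formulation out pair by pair. It assumes the common rotation base of 10000 (Galahad's actual base is not stated on this card), and the function names are illustrative, not taken from `modeling_galahad.py`. Interleaved rotates adjacent pairs `(x[2i], x[2i+1])`. Half-split (NeoX) rotates `(x[i], x[i + d/2])` using the same angles.

```python
import math


def rope_angles(head_dim, pos, base=10000.0):
    """One rotation angle per dimension pair: pos * base**(-2i / head_dim)."""
    return [pos * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def rope_interleaved(x, pos, base=10000.0):
    """GPT-J / interleaved RoPE: rotate adjacent pairs (x[2i], x[2i+1])."""
    out = [0.0] * len(x)
    for i, a in enumerate(rope_angles(len(x), pos, base)):
        c, s = math.cos(a), math.sin(a)
        x0, x1 = x[2 * i], x[2 * i + 1]
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x1 * c + x0 * s
    return out


def rope_half_split(x, pos, base=10000.0):
    """NeoX / half-split RoPE: rotate pairs (x[i], x[i + d/2])."""
    half = len(x) // 2
    out = [0.0] * len(x)
    for i, a in enumerate(rope_angles(len(x), pos, base)):
        c, s = math.cos(a), math.sin(a)
        x0, x1 = x[i], x[i + half]
        out[i] = x0 * c - x1 * s
        out[i + half] = x1 * c + x0 * s
    return out


x = [float(v) for v in range(1, 9)]  # toy head vector, head_dim = 8
pos = 5
inter = rope_interleaved(x, pos)
split = rope_half_split(x, pos)
print(max(abs(a - b) for a, b in zip(inter, split)))  # clearly non-zero

# Same rotation up to a fixed permutation of dimensions:
# de-interleave (evens, then odds), apply half-split, re-interleave.
y = rope_half_split(x[0::2] + x[1::2], pos)
back = [v for pair in zip(y[:4], y[4:]) for v in pair]
print(max(abs(a - b) for a, b in zip(inter, back)))  # 0.0
```

At position 0 both conventions are the identity, so they agree. At later positions they diverge. Because they differ only by a fixed permutation of dimensions, a wrong choice produces no error message, only wrong numbers. This is also why converters such as the Hugging Face Llama conversion script reconcile the two by permuting q/k projection rows.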

How to use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "Corbenic/Galahad-0.5B-base"
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True,
                                             torch_dtype=torch.bfloat16).eval()

# Base model: give it the start of a text and let it continue.
ids = tok("The history of the printing press began", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```

This is a base model: it continues text and does not follow instructions. Note the 1024-token sliding window: in every layer, each token attends directly only to the most recent 1024 positions, so text further back than that is never attended to directly.
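Here is a minimal pure-Python sketch of the attention mask that a hard causal sliding window implies. It assumes the window counts the query position itself (`i - j < window`). The exact boundary convention in `modeling_galahad.py` is not stated here and may differ by one. The function name is illustrative.

```python
def sliding_window_causal_mask(seq_len, window):
    """mask[i][j] is True when query position i may attend to key position j.

    Causal (j <= i) and limited to the most recent `window` positions,
    counting position i itself (i - j < window).
    """
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]


# Small example: window of 3 over 6 positions ("x" = may attend).
for row in sliding_window_causal_mask(seq_len=6, window=3):
    print("".join("x" if allowed else "." for allowed in row))
# x.....
# xx....
# xxx...
# .xxx..
# ..xxx.
# ...xxx
```

For Galahad the same rule would apply with `window=1024` in every layer.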

Benchmarks (honest)

Measured on the released interleaved weights:

| Benchmark | Galahad-0.5B base |
|---|---|
| enwik8 (BPB / token PPL) | 1.10 / 7.33 |
| text8 (BPB) | 1.276 |
| LAMBADA (acc / acc-norm) | 13.96% / 29.49% |
| BLiMP (macro) | 0.722 |

It loses to same-class baselines (e.g. Transformer-XL, Pythia). That is expected and fine — the value proposition is cheap + open + losslessly reusable, not best-in-class accuracy.
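The two enwik8 numbers measure the same loss in different units. If both come from the same evaluation pass over the same text, and token PPL is exp of the mean per-token cross-entropy, then total bits = BPB × bytes = log2(PPL) × tokens. Their ratio therefore gives the tokenizer's average bytes per token on enwik8. This sketch rests on those assumptions. The ratio it prints is inferred from the two reported figures and is not itself reported on this card.

```python
import math


def bits_per_token(token_ppl):
    """Per-token perplexity exp(H) -> cross-entropy in bits per token."""
    return math.log2(token_ppl)


def implied_bytes_per_token(bpb, token_ppl):
    """Assuming BPB and token PPL come from the same run over the same text:
    bpb * n_bytes == log2(ppl) * n_tokens, so n_bytes / n_tokens == log2(ppl) / bpb.
    """
    return bits_per_token(token_ppl) / bpb


# Reported enwik8 figures for Galahad-0.5B base.
print(f"{bits_per_token(7.33):.3f} bits/token")
print(f"{implied_bytes_per_token(1.10, 7.33):.2f} bytes/token (implied)")
```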

Training integrity note

During development we found and fixed a RoPE-convention bug ourselves: our port applied the half-split convention, but the weights were trained with the interleaved one. On a held-out check, the fix moved enwik8 from 2.54 to 1.10 BPB and BLiMP from 0.587 to 0.722. We mention it because it shows that the numbers above come from the corrected forward pass, not from a lucky configuration.

Training data

Galahad's pretraining corpus was fully deduplicated with Merlin, Corbenic's byte-exact deduplication engine. Removing duplicated training data is an established way to improve a model's quality-per-token and reduce redundant training compute (Lee et al., 2021, Deduplicating Training Data Makes Language Models Better). We state this as a method fact — we do not claim a head-to-head win over models trained on non-deduplicated data, as no controlled comparison exists.
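As a generic illustration of byte-exact deduplication at the document level, here is a minimal sketch. It keeps the first copy of each byte-identical document, keyed by its SHA-256 digest. It is NOT Merlin, whose internals this card does not describe. It also only catches whole-document exact copies, whereas Lee et al. additionally remove near-duplicate documents and long repeated substrings.

```python
import hashlib


def exact_dedup(documents):
    """Keep the first occurrence of each byte-identical document.

    Documents are bytes; two documents count as duplicates only if their
    SHA-256 digests (and therefore, in practice, their bytes) match.
    """
    seen = set()
    kept = []
    for doc in documents:
        digest = hashlib.sha256(doc).digest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


corpus = [b"the cat sat", b"a dog ran", b"the cat sat", b"the cat sat "]
print(exact_dedup(corpus))  # [b'the cat sat', b'a dog ran', b'the cat sat ']
```

Note that `b"the cat sat "` survives: one trailing space makes it a different byte string. Catching such variants is the job of near-duplicate methods, at the cost of no longer being exact.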

Taliesin — the memory engine (NOT in this repo)

Galahad exists to demonstrate Taliesin, Corbenic's external memory engine. Taliesin saves a model's internal context state (its KV state) and restores it later, so the same context does not have to be recomputed every time.

Be precise about what is what:

The foundation, byte-exact and reproducible KV state, is a property of a deterministic engine and is verifiable with public tooling (standard llama.cpp `llama_state_seq_save_file` / `llama_state_seq_load_file` under GGML_DETERMINISTIC=1). We prove it with public tools on purpose, so you can check the foundation without any of our software. Example receipt: a KV state written to disk by one process and reloaded by a separate, fresh process produces logits that are byte-identical (by SHA-256) to a from-scratch computation. A volatile prompt cache (vLLM / hosted prompt caches) does not survive the death of its process; a state file on disk does.
Taliesin is the memory system built on that foundation — and it does what a whole-sequence snapshot cannot: content-addressed cross-context grafting (splice a stored span into a different context/position), composition of independent spans, deduplication (see Merlin), one engine across vendors, and tiered storage — with the resulting speedup. That is the proprietary part.
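The receipt check described in the first point compares logits by SHA-256 rather than with a numeric tolerance. Here is a minimal sketch of that comparison. It assumes the logits have been dumped as float32, which is the type llama.cpp's `llama_get_logits` returns. The helper names are hypothetical, and this is not Corbenic's receipt tooling. Hashing raw bytes is stricter than `==`: for example, `0.0` and `-0.0` compare equal as numbers but serialize to different bytes.

```python
import hashlib
import struct


def logits_digest(logits):
    """SHA-256 (hex) over the raw little-endian float32 bytes of a logits vector."""
    raw = struct.pack(f"<{len(logits)}f", *logits)
    return hashlib.sha256(raw).hexdigest()


def byte_identical(a, b):
    """True only if both vectors serialize to exactly the same float32 bytes."""
    return logits_digest(a) == logits_digest(b)


restored = [1.0, 2.0, -3.5]      # e.g. logits after reloading a saved state
from_scratch = [1.0, 2.0, -3.5]  # e.g. logits from a full recomputation
print(byte_identical(restored, from_scratch))  # True
print(byte_identical([0.0], [-0.0]))           # False, although 0.0 == -0.0
```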

Taliesin is not distributed in any form. Its central property, exact and verifiable losslessness, can be checked against the published receipts without any Corbenic software. The engine itself stays closed.

Receipts

Published SHA-256 receipts for the byte-exact / cross-vendor / persistence results live in the launch evidence dataset: https://huggingface.co/datasets/Corbenic/taliesin-receipts. (Cross-vendor byte-exact reuse was verified on Llama-3.1-8B, Qwen2.5-7B and Mistral-7B; disk-roundtrip persistence on open weights. The bit-exact property is a kernel-determinism property and is model-agnostic.)

Related work

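Merlin, Corbenic's byte-exact deduplication / lossless-inference engine, is described in "Merlin: Deterministic Byte-Exact Deduplication for Lossless Context Optimization in Large Language Model Inference" (Schelpe, 2026; arXiv:2605.09990). Companion empirical analysis: "Byte-Exact Deduplication in Retrieval-Augmented Generation: A Three-Regime Empirical Analysis Across Public Benchmarks" (arXiv:2605.09611).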

Citation

```bibtex
@misc{corbenic2026galahad,
  title  = {Galahad-0.5B: an open, low-cost language model},
  author = {Corbenic AI},
  year   = {2026},
  note   = {Apache-2.0. https://corbenic.ai}
}
```

Contact & license

Galahad-0.5B is released under Apache-2.0. Contact: sietse@corbenic.ai · https://corbenic.ai

We do not claim Galahad-0.5B outperforms larger models. It does not. The narrow, verifiable claim is lossless, byte-exact context reuse — demonstrated across multiple vendors' models and on this €600 one.
