Galahad-0.5B (base)
An open ~570M-parameter language model, pretrained for €600 (US$676.95). Released by Corbenic AI. This is the base (completion) model — the original pretrained weights, released as the open result of Corbenic's data + training stack.
What this is — and isn't. Galahad-0.5B is competent for its size, not a leaderboard winner. It is a small base LM: it completes text, it is not instruction-tuned, and it loses to comparable open baselines on standard benchmarks. We say so plainly. The point of Galahad is that it is cheap, open, and losslessly reusable — the substrate on which we demonstrate the Taliesin memory engine (see below). The capability we care about lives in the engine, not in these weights.
Quick facts
| Property | Value |
|---|---|
| Parameters | ~570M |
| Architecture | decoder-only · hidden 1024 · 30 layers · 16 heads · head_dim 64 · SwiGLU · RMSNorm · vocab 65,536 · tied embeddings |
| Extra norms (v10) | per-head q/k/v norm + gate/up norm (not in stock Llama — needs trust_remote_code=True) |
| Positional | RoPE = interleaved / GPT-J convention (see warning below) |
| Attention | sliding-window, 1024 tokens (hard window every layer) |
| Pretraining cost | €600 (US$676.95) — full bill |
| License | Apache-2.0 |
| Type | base / completion (NOT instruction-tuned) |
⚠️ RoPE convention (read before porting)
Galahad uses the interleaved (GPT-J) RoPE convention: rotate_half swaps the two elements of each
adjacent pair (x[2i], x[2i+1]), negating one, and cos/sin are expanded with repeat_interleave(2).
The included modeling_galahad.py already does this. If you convert to llama.cpp / GGUF, use
LLAMA_ROPE_TYPE_NORM (mode 0), not NEOX. The wrong convention passes short prompts and silently
collapses on long ones.
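To make the difference between the two conventions concrete, here is a minimal pure-Python sketch of both pairings applied to a single head vector. It is an illustration only, not the code in modeling_galahad.py: the function names are hypothetical, and the frequency base of 10000 is an assumption (the card does not state the value Galahad uses).

```python
import math

def rope_angles(position, head_dim, base=10000.0):
    """One rotation angle per pair of dimensions (head_dim // 2 values).
    base=10000.0 is an assumed default, not a documented Galahad setting."""
    return [position * base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def rope_interleaved(x, position, base=10000.0):
    """Interleaved (GPT-J) convention: rotate adjacent pairs (x[2i], x[2i+1])."""
    out = list(x)
    for i, theta in enumerate(rope_angles(position, len(x), base)):
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * math.cos(theta) - b * math.sin(theta)
        out[2 * i + 1] = a * math.sin(theta) + b * math.cos(theta)
    return out

def rope_half_split(x, position, base=10000.0):
    """Half-split (NeoX) convention: rotate pairs (x[i], x[i + head_dim // 2])."""
    half = len(x) // 2
    out = list(x)
    for i, theta in enumerate(rope_angles(position, len(x), base)):
        a, b = x[i], x[i + half]
        out[i] = a * math.cos(theta) - b * math.sin(theta)
        out[i + half] = a * math.sin(theta) + b * math.cos(theta)
    return out

vec = [1.0, 2.0, 3.0, 4.0]
# At position 0 every angle is zero, so both conventions leave the vector unchanged.
print(rope_interleaved(vec, 0) == rope_half_split(vec, 0))
# At later positions the two conventions pair different dimensions and disagree.
print(rope_interleaved(vec, 5) != rope_half_split(vec, 5))
```

Both functions apply the same angles; they differ only in which dimensions are paired, which is why weights trained under one convention cannot be run under the other without permuting the q/k dimensions.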
How to use
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Corbenic/Galahad-0.5B-base"

# trust_remote_code=True is needed: the repo ships its own modeling_galahad.py.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval()

# Base model: give it text to continue, not an instruction.
ids = tok("The history of the printing press began", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```
This is a base model: it continues text and does not follow instructions. Note the 1024-token sliding window: for inputs longer than 1024 tokens, each position attends only to the most recent 1024 tokens.
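As a rough illustration of what the sliding window means for a single attention layer, the sketch below lists which key positions a given query position can see. The helper name is hypothetical, and it assumes the window counts the query's own position; the exact boundary convention in modeling_galahad.py is not stated here.

```python
def visible_keys(query_pos, window=1024):
    """Key positions (0-based) visible to the query at `query_pos` under causal,
    sliding-window attention. Assumes the window includes the query itself."""
    start = max(0, query_pos - window + 1)
    return range(start, query_pos + 1)

# Early in the sequence the window is not yet full: every earlier token is visible.
print(len(visible_keys(10)))
# Deep into a long input, only the most recent 1024 positions remain visible.
late = visible_keys(3000)
print(late.start, late.stop - 1, len(late))
```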
Benchmarks (honest)
Measured on the released interleaved weights:
| Benchmark | Galahad-0.5B base |
|---|---|
| enwik8 (BPB / token-PPL) | 1.10 / 7.33 |
| text8 (BPB) | 1.276 |
| LAMBADA (acc / acc-norm) | 13.96% / 29.49% |
| BLiMP (macro) | 0.722 |
It loses to same-class baselines (e.g. Transformer-XL, Pythia). That is expected and fine — the value proposition is cheap + open + losslessly reusable, not best-in-class accuracy.
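For readers comparing the two enwik8 numbers, the sketch below shows the standard relationship between bits per byte (BPB) and token-level perplexity. The helper names are hypothetical and this is not the evaluation script used for the table. If both figures were computed over the same text, they imply roughly 2.6 bytes per token; that is an inference from the two reported values, not a number stated in this card.

```python
import math

def bits_per_byte(total_nll_nats, n_bytes):
    """Total negative log-likelihood (in nats) converted to bits per byte."""
    return total_nll_nats / (math.log(2) * n_bytes)

def token_perplexity(total_nll_nats, n_tokens):
    """Perplexity per token: exp of the mean negative log-likelihood per token."""
    return math.exp(total_nll_nats / n_tokens)

def implied_bytes_per_token(bpb, token_ppl):
    """Average bytes per token implied by a (BPB, token-PPL) pair on the same text:
    log2(PPL) is bits per token, and BPB is bits per byte."""
    return math.log2(token_ppl) / bpb

print(round(implied_bytes_per_token(1.10, 7.33), 2))
```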
Training integrity note
During development we found and fixed a RoPE-convention bug ourselves: a port used the half-split convention, whereas the model was trained with the interleaved one. On a held-out check, the fix moved enwik8 from 2.54 → 1.10 BPB and BLiMP from 0.587 → 0.722. We mention it because it shows that the numbers above come from the corrected forward pass, not a lucky configuration.
Training data
Galahad's pretraining corpus was fully deduplicated with Merlin, Corbenic's byte-exact deduplication engine. Removing duplicated training data is an established way to improve a model's quality-per-token and reduce redundant training compute (Lee et al., 2021, Deduplicating Training Data Makes Language Models Better). We state this as a method fact — we do not claim a head-to-head win over models trained on non-deduplicated data, as no controlled comparison exists.
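For intuition only, byte-exact deduplication in its simplest form can be sketched as dropping documents whose bytes hash to an already-seen digest. This is a generic illustration with a hypothetical helper; it makes no claim about how Merlin works internally, at what granularity it operates, or how it scales.

```python
import hashlib

def dedup_exact(documents):
    """Keep the first occurrence of each document, comparing by SHA-256 of its
    UTF-8 bytes. Only byte-identical documents count as duplicates."""
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(doc.encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = ["the quick brown fox", "hello world", "the quick brown fox", "Hello world"]
print(dedup_exact(corpus))
```

Because the comparison is byte-exact, "hello world" and "Hello world" are both kept; near-duplicate detection is a different technique.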
Taliesin — the memory engine (NOT in this repo)
Galahad exists to demonstrate Taliesin, Corbenic's external memory engine. A model's internal context state can be saved and restored so the same context is not recomputed every time.
Be precise about what is what:
- The foundation (byte-exact, reproducible KV state) is a property of a deterministic engine, and is verifiable with public tooling (standard llama.cpp llama_state_seq_save_file / load_file under GGML_DETERMINISTIC=1). We prove it with public tools on purpose, so you can check the foundation without any of our software. Example receipt: a KV state written to disk by one process, reloaded by a separate fresh process, produces logits byte-identical (SHA-256) to a from-scratch computation. A volatile prompt cache (vLLM / hosted prompt caches) cannot survive a process death; this does.
- Taliesin is the memory system built on that foundation, and it does what a whole-sequence snapshot cannot: content-addressed cross-context grafting (splice a stored span into a different context/position), composition of independent spans, deduplication (see Merlin), one engine across vendors, and tiered storage, with the resulting speedup. That is the proprietary part.
Taliesin is not distributed in any form. Its central property, exact and verifiable losslessness, can be checked from the published receipts; no Corbenic software is needed to verify the core claim. The engine itself stays closed.
Receipts
Published SHA-256 receipts for the byte-exact / cross-vendor / persistence results live in the launch evidence dataset: https://huggingface.co/datasets/Corbenic/taliesin-receipts. (Cross-vendor byte-exact reuse was verified on Llama-3.1-8B, Qwen2.5-7B and Mistral-7B; disk-roundtrip persistence on open weights. The bit-exact property is a kernel-determinism property and is model-agnostic.)
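The idea behind a SHA-256 receipt can be sketched in a few lines: hash the raw bytes of a result, persist the data, reload it, and confirm the digest is unchanged. The sketch below is a toy under stated assumptions: the helper names are hypothetical, the float32 little-endian encoding is our choice, and it uses a plain file rather than the llama.cpp state API or the format of the published receipts. It also runs in one process, whereas the receipts describe a reload by a separate fresh process.

```python
import hashlib
import os
import struct
import tempfile

def digest_floats(values):
    """SHA-256 hex digest over the little-endian float32 encoding of `values`."""
    return hashlib.sha256(struct.pack(f"<{len(values)}f", *values)).hexdigest()

def roundtrip_through_disk(values):
    """Write the float32 bytes to a temporary file, read them back, and decode."""
    payload = struct.pack(f"<{len(values)}f", *values)
    with tempfile.NamedTemporaryFile(delete=False) as fh:
        fh.write(payload)
        path = fh.name
    try:
        with open(path, "rb") as fh:
            raw = fh.read()
    finally:
        os.remove(path)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

logits = [0.125, -3.5, 2.0, 7.75]  # stand-in values, exactly representable in float32
restored = roundtrip_through_disk(logits)
# Byte-identical data yields an identical digest; any single-bit change would not.
print(digest_floats(logits) == digest_floats(restored))
```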
Related work
- Merlin — Corbenic's byte-exact deduplication / lossless-inference engine, published on arXiv: arXiv:2605.09990 (Schelpe, 2026). Companion empirical analysis: arXiv:2605.09611.
Citation
```bibtex
@misc{corbenic2026galahad,
  title  = {Galahad-0.5B: an open, low-cost language model},
  author = {Corbenic AI},
  year   = {2026},
  note   = {Apache-2.0. https://corbenic.ai}
}
```
Contact & license
Galahad-0.5B is released under Apache-2.0. Contact: sietse@corbenic.ai · https://corbenic.ai
We do not claim Galahad-0.5B outperforms larger models. It does not. The narrow, verifiable claim is lossless, byte-exact context reuse — demonstrated across multiple vendors' models and on this €600 one.