Dichter & Denker
A small (42M-parameter) GPT trained from scratch on the German classics and fine-tuned into a chat model you can talk to as Goethe, Schiller, Kant, Lessing, Kleist, Hölderlin, Novalis or Herder.
No pretrained base, no distillation: the corpus, tokenizer and model are all built from raw public-domain text. Full training pipeline and code: 👉 https://github.com/mtkl6/goethe-schiller-kant
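Du: Was ist die Pflicht des Menschen?
Kant: Wenn wir uns nun selbst in der Welt verachtend verhalten müssen: was können wir tun?
(English: "You: What is the duty of man?" / "Kant: If we must now conduct ourselves contemptuously in the world: what can we do?")

Du: Woher kommt die wahre Kunst?
Goethe: Ich bin ein großer Mann, aber ich kenne den schönen Alten; sie sind von jeher mit erzogen worden …
(English: "You: Where does true art come from?" / "Goethe: I am a great man, but I know the fair old one; they have always been raised along with …")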
⚠️ The inference widget is disabled because this is a custom (non-`transformers`) architecture; load it with the snippet below.
Files
| File | Description |
|---|---|
| `chat_model.pt` | the fine-tuned chat checkpoint (state dict + config + tokenizer ref) |
| `tokenizer.json` | the custom 8k German BPE tokenizer |
| `model.py` | the model definition (needed to load the checkpoint) |
| `config.json` | architecture metadata |
Usage
The model is a custom nanoGPT-style transformer, so load it with the `model.py` from this repo (not `transformers`):
```python
import torch
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer
from model import model_from_checkpoint, get_device  # model.py from this repo, next to this script

REPO = "mtkl6/dichter-denker"
device = get_device()

# Download and load the chat checkpoint and its matching tokenizer
ckpt = torch.load(hf_hub_download(REPO, "chat_model.pt"),
                  map_location=device, weights_only=True)
tok = Tokenizer.from_file(hf_hub_download(REPO, "tokenizer.json"))
model = model_from_checkpoint(ckpt, device).eval()

# Stop generating at any of these special tokens (each encodes to a single id)
stop = {tok.encode(t).ids[0] for t in ("<|endoftext|>", "<|user|>", "<|bot|>")}

# Persona token, then the user turn, then <|bot|> to cue the reply
prompt = "<|goethe|><|user|>Was ist die Liebe?<|bot|>"
ids = tok.encode(prompt).ids
with torch.no_grad():
    out = model.generate(torch.tensor([ids], device=device), 150,  # max new tokens
                         temperature=0.8, top_k=50, stop_tokens=stop,
                         repetition_penalty=1.15)
# Decode only the newly generated tokens, not the prompt
print(tok.decode(out[0][len(ids):].tolist()))
```
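The internals of `model.generate` live in `model.py` and are not described on this card. As a rough illustration of what `temperature`, `top_k`, `repetition_penalty` and `stop_tokens` conventionally do, here is a pure-Python sketch of one sampling step and the surrounding loop. It assumes the CTRL-style repetition penalty used by Hugging Face `transformers` (positive logits of already-seen tokens are divided by the penalty, negative ones multiplied), which may differ from this repo's implementation; `sample_next`, `generate_ids` and the `next_logits` callback are hypothetical names.

```python
import math
import random


def sample_next(logits, seen, temperature=0.8, top_k=50, repetition_penalty=1.15, rng=None):
    """Pick one token id from raw logits (hypothetical helper, not model.py's code)."""
    rng = rng or random.Random()
    scores = list(logits)
    # Assumed CTRL-style repetition penalty on every token already in the sequence
    for i in set(seen):
        scores[i] = scores[i] / repetition_penalty if scores[i] > 0 else scores[i] * repetition_penalty
    # Temperature < 1 sharpens the distribution, > 1 flattens it
    scores = [s / temperature for s in scores]
    # Keep only the top_k highest-scoring candidates
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:top_k]
    # Softmax over the survivors (shifted by the max for numerical stability), then sample
    m = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


def generate_ids(next_logits, prompt_ids, max_new_tokens, stop_tokens, **sampling):
    """Autoregressive loop: append sampled ids until a stop token or the length limit."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        nxt = sample_next(next_logits(ids), ids, **sampling)
        if nxt in stop_tokens:
            break  # the stop token itself is not appended in this sketch
        ids.append(nxt)
    return ids
```

With `top_k=1` the sampler reduces to greedy decoding, which is handy for checking that a prompt produces a stable reply; the card's settings (`temperature=0.8`, `top_k=50`, `repetition_penalty=1.15`) trade that determinism for more varied, less repetitive text.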
For the interactive REPL (persona menu, `/temp`, `/reset`), clone the GitHub repo and run `python chat.py` with these files in the project root.
Prompt format
`<|persona|><|user|>…<|bot|>…<|endoftext|>`

Persona tokens: `<|goethe|>` `<|schiller|>` `<|kant|>` `<|hoelderlin|>` `<|kleist|>` `<|lessing|>` `<|novalis|>` `<|herder|>`.
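A minimal sketch of building a single-turn prompt in this format, and of trimming decoded text at the first special marker, is below. `build_prompt`, `extract_reply` and the `PERSONAS` tuple are hypothetical helpers, not part of `model.py`; `extract_reply` only matters if you decode with special tokens kept (the `tokenizers` decoder skips them by default). How multi-turn history is concatenated is not specified here, so the sketch covers one exchange only.

```python
PERSONAS = ("goethe", "schiller", "kant", "hoelderlin", "kleist", "lessing", "novalis", "herder")
STOP_MARKERS = ("<|endoftext|>", "<|user|>", "<|bot|>")


def build_prompt(persona: str, message: str) -> str:
    """Persona token, user turn, then <|bot|> to cue the model's reply."""
    if persona not in PERSONAS:
        raise ValueError(f"unknown persona {persona!r}; expected one of {PERSONAS}")
    return f"<|{persona}|><|user|>{message}<|bot|>"


def extract_reply(generated: str) -> str:
    """Cut generated text at the earliest stop marker, if any, and strip whitespace."""
    cut = len(generated)
    for marker in STOP_MARKERS:
        i = generated.find(marker)
        if i != -1:
            cut = min(cut, i)
    return generated[:cut].strip()
```

For example, `build_prompt("goethe", "Was ist die Liebe?")` yields exactly the prompt string used in the usage snippet.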
Training
| Setting | Value |
|---|---|
| Architecture | decoder-only transformer, 12 layers / 8 heads / 512 dim, 512-token context |
| Vocabulary | 8,192-token custom German byte-level BPE |
| Pretraining | 40k steps, AdamW, lr 3e-4 → 3e-5 (cosine); loss: train 2.74 / val 4.21 |
| Fine-tuning | 3k steps, lr 5e-5, 20% prose mix; chat-val loss 3.32 |
| Corpus | ~13M tokens, 9 authors, from Projekt Gutenberg-DE |
| Hardware | Apple Silicon (MPS) |
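As a sanity check on the numbers in this table, the sketch below counts parameters for a nanoGPT-style decoder and evaluates the pretraining learning-rate schedule. The layout is an assumption, since the card does not state it: learned position embeddings, biases in the linear and LayerNorm layers, a 4× MLP, and an output head tied to the token embedding. Under those assumptions the count comes to about 42.3M, in line with the "42M-parameter" figure. The schedule assumes a plain cosine decay from 3e-4 to 3e-5 over the 40k steps with no warmup, which the card also does not specify.

```python
import math


def gpt_param_count(n_layer: int, d: int, vocab: int, ctx: int, bias: bool = True) -> int:
    """Parameter count of a nanoGPT-style decoder (assumed layout, see lead-in)."""
    b = int(bias)
    attn = d * 3 * d + b * 3 * d + d * d + b * d     # fused QKV projection + output projection
    mlp = d * 4 * d + b * 4 * d + 4 * d * d + b * d  # 4x expansion and projection back
    norms = 2 * (d + b * d)                          # two LayerNorms per block
    embeddings = vocab * d + ctx * d                 # token + learned position embeddings (lm_head tied)
    final_norm = d + b * d
    return n_layer * (attn + mlp + norms) + embeddings + final_norm


def cosine_lr(step: int, total_steps: int = 40_000, lr_max: float = 3e-4, lr_min: float = 3e-5) -> float:
    """Cosine decay from lr_max to lr_min over total_steps (no warmup assumed)."""
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))


print(f"{gpt_param_count(12, 512, 8192, 512):,}")  # 42,286,080
print(f"{cosine_lr(20_000):.2e}")                   # 1.65e-04
```

Most of the budget sits in the 12 transformer blocks (about 37.8M); the 8,192-token embedding contributes about 4.2M, which is why a small vocabulary keeps the model compact.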
Limitations
This is a tiny model trained on ~13M tokens. It answers in fluent, period-flavoured German and captures each author's register, but it does not understand questions: most of the fine-tuning data is drama dialogue, so it free-associates in style rather than answering on topic. It has no reliable factual knowledge and works best on short exchanges phrased in a classic style. Treat it as literary style transfer, not a knowledgeable assistant.
Citation
```bibtex
@software{dichter_denker_2026,
  author = {Moritz (mtkl6)},
  title  = {Dichter & Denker: a from-scratch German classics persona chat model},
  year   = {2026},
  url    = {https://github.com/mtkl6/goethe-schiller-kant}
}
```
License
Code & weights: MIT. Source texts: public domain, via Projekt Gutenberg-DE.