Dichter & Denker

A small (42M-parameter) GPT trained from scratch on the German classics and fine-tuned into a chat model you can talk to as Goethe, Schiller, Kant, Lessing, Kleist, Hölderlin, Novalis or Herder.

No pretrained base, no distillation — the corpus, tokenizer and model are all built from raw public-domain text. Full training pipeline and code: 👉 https://github.com/mtkl6/goethe-schiller-kant

```text
Du:    Was ist die Pflicht des Menschen?
Kant:  Wenn wir uns nun selbst in der Welt verachtend verhalten müssen:
       was können wir tun?

Du:    Woher kommt die wahre Kunst?
Goethe: Ich bin ein großer Mann, aber ich kenne den schönen Alten; sie sind
        von jeher mit erzogen worden …
```

⚠️ The inference widget is disabled because this is a custom (non-transformers) architecture — load it with the snippet below.

Files

| File | Contents |
|---|---|
| `chat_model.pt` | the fine-tuned chat checkpoint (state dict + config + tokenizer reference) |
| `tokenizer.json` | the custom 8k German BPE tokenizer |
| `model.py` | the model definition (needed to load the checkpoint) |
| `config.json` | architecture metadata |

Usage

The model is a custom nanoGPT-style transformer, so it is loaded with `model.py` from this repo rather than with `transformers`:

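```python
import os
import sys

import torch
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

REPO = "mtkl6/dichter-denker"

# model.py defines the custom architecture; fetch it from this repo and make it importable
sys.path.insert(0, os.path.dirname(hf_hub_download(REPO, "model.py")))
from model import model_from_checkpoint, get_device  # noqa: E402

device = get_device()

# The checkpoint bundles the state dict, the config and a tokenizer reference
ckpt = torch.load(hf_hub_download(REPO, "chat_model.pt"),
                  map_location=device, weights_only=True)
tok = Tokenizer.from_file(hf_hub_download(REPO, "tokenizer.json"))
model = model_from_checkpoint(ckpt, device).eval()

# Stop at end of text, or as soon as the model tries to open a new turn
stop = {tok.encode(t).ids[0] for t in ("<|endoftext|>", "<|user|>", "<|bot|>")}

# Persona token + user turn, ending in <|bot|> so the model writes the reply
prompt = "<|goethe|><|user|>Was ist die Liebe?<|bot|>"
ids = tok.encode(prompt).ids
out = model.generate(torch.tensor([ids], device=device), 150,  # at most 150 new tokens
                     temperature=0.8, top_k=50, stop_tokens=stop,
                     repetition_penalty=1.15)

# Decode only the newly generated tokens, not the prompt
print(tok.decode(out[0][len(ids):].tolist()))
```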

For the interactive REPL (persona menu, `/temp`, `/reset`), clone the GitHub repo, place these files in the project root and run `python chat.py`.
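The `generate` call combines three common sampling controls: a repetition penalty, temperature and top-k filtering. The sketch below shows how these controls are typically applied to one step of logits. It is written in plain Python and is an assumption, not the code in `model.py`. The helper `sample_next` is hypothetical, and the real implementation may order the steps differently. The penalty follows the rule popularised by CTRL and used by Hugging Face's `RepetitionPenaltyLogitsProcessor`: a positive logit of an already-generated token is divided by the penalty, and a negative one is multiplied by it.

```python
import math
import random
from typing import Optional, Sequence


def sample_next(logits: Sequence[float], history: Sequence[int],
                temperature: float = 0.8, top_k: Optional[int] = 50,
                repetition_penalty: float = 1.15,
                rng: Optional[random.Random] = None) -> int:
    """Pick the next token id from raw logits (hypothetical helper, temperature > 0)."""
    rng = rng or random.Random()
    scores = list(logits)

    # Repetition penalty: push already-generated tokens towards lower probability
    for tok_id in set(history):
        s = scores[tok_id]
        scores[tok_id] = s / repetition_penalty if s > 0 else s * repetition_penalty

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it
    scores = [s / temperature for s in scores]

    # Top-k: keep only the k highest-scoring token ids
    candidates = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]

    # Softmax over the surviving candidates (subtract the max for numerical stability)
    m = max(scores[i] for i in candidates)
    weights = [math.exp(scores[i] - m) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With `top_k=1`, sampling becomes greedy. In that case a strong enough penalty can flip the choice: for `sample_next([2.0, 1.5, -1.0, 0.5], history=[0], top_k=1, repetition_penalty=2.0)`, token 0 drops to 1.0 and token 1 wins.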

Prompt format

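```text
<|persona|><|user|>…<|bot|>…<|endoftext|>
```

The persona token selects the author. The user's message follows `<|user|>`, and the model's reply follows `<|bot|>` and ends with `<|endoftext|>`.

Persona tokens: `<|goethe|>` `<|schiller|>` `<|kant|>` `<|hoelderlin|>` `<|kleist|>` `<|lessing|>` `<|novalis|>` `<|herder|>`.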
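A small helper can assemble strings in this format. This is a sketch: `PERSONAS`, `build_prompt` and `format_example` are hypothetical names and are not part of `model.py`. Only the single-turn layout shown above is assumed, because this card does not document how multi-turn history is encoded.

```python
PERSONAS = ("goethe", "schiller", "kant", "hoelderlin",
            "kleist", "lessing", "novalis", "herder")


def build_prompt(persona: str, user_message: str) -> str:
    """Single-turn inference prompt; the model continues after <|bot|>."""
    if persona not in PERSONAS:
        raise ValueError(f"unknown persona {persona!r}, expected one of {PERSONAS}")
    return f"<|{persona}|><|user|>{user_message}<|bot|>"


def format_example(persona: str, user_message: str, reply: str) -> str:
    """A complete exchange in the documented layout, closed by <|endoftext|>."""
    return build_prompt(persona, user_message) + reply + "<|endoftext|>"
```

`build_prompt("goethe", "Was ist die Liebe?")` yields `<|goethe|><|user|>Was ist die Liebe?<|bot|>`, the same prompt used in the Usage snippet.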

Training

| Component | Details |
|---|---|
| Architecture | decoder-only transformer, 12 layers / 8 heads / 512-dim embeddings, 512-token context |
| Vocabulary | 8,192-token custom German byte-level BPE |
| Pretraining | 40k steps, AdamW, cosine LR decay 3e-4 → 3e-5; loss train 2.74 / val 4.21 |
| Fine-tuning | 3k steps, LR 5e-5, 20% prose in the data mix; chat validation loss 3.32 |
| Corpus | ~13M tokens, 9 authors, from Projekt Gutenberg-DE |
| Hardware | Apple Silicon (MPS) |
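The 42M figure can be checked against the architecture above. The sketch counts parameters for a GPT-2/nanoGPT-style decoder under several assumptions that this card does not state: a 4× MLP expansion, biases in every linear and LayerNorm layer, learned position embeddings, and an output head tied to the token embedding. The number of heads does not change the count. `count_params` is a hypothetical helper.

```python
def count_params(n_layer: int = 12, n_embd: int = 512, vocab: int = 8192,
                 ctx: int = 512, mlp_mult: int = 4) -> int:
    """Parameter count of a nanoGPT-style decoder (assumed layout, tied LM head)."""
    d, h = n_embd, mlp_mult * n_embd
    layernorm = 2 * d                           # weight + bias
    attn = (d * 3 * d + 3 * d) + (d * d + d)    # fused QKV projection + output projection
    mlp = (d * h + h) + (h * d + d)             # up-projection + down-projection
    block = 2 * layernorm + attn + mlp          # two LayerNorms per block
    embeddings = vocab * d + ctx * d            # token + learned position embeddings
    return embeddings + n_layer * block + layernorm  # + final LayerNorm; head is tied


print(f"{count_params():,}")  # 42,286,080
```

Under these assumptions, the model has about 42.3M parameters. An untied output head would add another 8,192 × 512 ≈ 4.2M.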
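The pretraining learning-rate schedule (cosine decay from 3e-4 to 3e-5 over 40k steps) can be written with the standard cosine-annealing formula. This is a sketch. The card does not say whether warmup was used or whether the decay spans exactly the 40k steps, so both are assumptions, and `cosine_lr` is a hypothetical name.

```python
import math


def cosine_lr(step: int, total_steps: int = 40_000,
              lr_max: float = 3e-4, lr_min: float = 3e-5) -> float:
    """Cosine decay from lr_max at step 0 to lr_min at total_steps (no warmup assumed)."""
    progress = min(step, total_steps) / total_steps  # hold at lr_min after the last step
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
```

In a PyTorch loop, the value would typically be written into each `optimizer.param_groups[i]["lr"]` before `optimizer.step()`. Halfway through (step 20,000), this gives 1.65e-4.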

Limitations

This is a tiny model trained on ~13M tokens. It answers in fluent, period-flavoured German and captures each author's register, but it does not understand questions. Most of the fine-tuning data is drama dialogue, so it free-associates in the author's style rather than answering on topic. It has essentially no factual knowledge and works best on short exchanges phrased in a classical style. Treat it as literary style transfer, not a knowledgeable assistant.

Citation

```bibtex
@software{dichter_denker_2026,
  author = {Moritz (mtkl6)},
  title  = {Dichter \& Denker: a from-scratch German classics persona chat model},
  year   = {2026},
  url    = {https://github.com/mtkl6/goethe-schiller-kant}
}
```

License

Code & weights: MIT. Source texts: public domain, via Projekt Gutenberg-DE.
