Instructions to use Thorstin/gpt2-dutch-instruct with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Thorstin/gpt2-dutch-instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Thorstin/gpt2-dutch-instruct")

# Load model directly (GPT-2 is a causal language model)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Thorstin/gpt2-dutch-instruct")
model = AutoModelForCausalLM.from_pretrained("Thorstin/gpt2-dutch-instruct")

llama-cpp-python

How to use Thorstin/gpt2-dutch-instruct with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Thorstin/gpt2-dutch-instruct",
	filename="dutch-gpt2-f16.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Local Apps

llama.cpp

How to use Thorstin/gpt2-dutch-instruct with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Thorstin/gpt2-dutch-instruct:F16
# Run inference directly in the terminal:
llama-cli -hf Thorstin/gpt2-dutch-instruct:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Thorstin/gpt2-dutch-instruct:F16
# Run inference directly in the terminal:
llama-cli -hf Thorstin/gpt2-dutch-instruct:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Thorstin/gpt2-dutch-instruct:F16
# Run inference directly in the terminal:
./llama-cli -hf Thorstin/gpt2-dutch-instruct:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Thorstin/gpt2-dutch-instruct:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Thorstin/gpt2-dutch-instruct:F16

Use Docker

docker model run hf.co/Thorstin/gpt2-dutch-instruct:F16

vLLM

How to use Thorstin/gpt2-dutch-instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Thorstin/gpt2-dutch-instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Thorstin/gpt2-dutch-instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
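
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on port 8000. The helper names `build_request` and `complete` are hypothetical, and the prompt is wrapped in the `### Instructie:` / `### Antwoord:` template that this model card describes for the fine-tuned model. The `stop` sequence and the `max_tokens` value of 200 are choices made for this sketch, not settings from the card.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1/completions"  # port used by `vllm serve` above
MODEL = "Thorstin/gpt2-dutch-instruct"


def build_request(instruction: str, max_tokens: int = 200) -> urllib.request.Request:
    """Build an OpenAI-style completion request (hypothetical helper)."""
    payload = {
        "model": MODEL,
        # Instruction template from the model card.
        "prompt": f"### Instructie:\n{instruction}\n### Antwoord:\n",
        "max_tokens": max_tokens,
        "temperature": 0.7,
        # Assumption: stop if the model starts writing a new instruction block.
        "stop": ["### Instructie:"],
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(instruction: str) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(instruction)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"].strip()
```

With the server running, `complete("Wat is de hoofdstad van Nederland?")` returns the generated answer as a string.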

SGLang

How to use Thorstin/gpt2-dutch-instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Thorstin/gpt2-dutch-instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Thorstin/gpt2-dutch-instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Thorstin/gpt2-dutch-instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Thorstin/gpt2-dutch-instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Ollama

How to use Thorstin/gpt2-dutch-instruct with Ollama:
```
ollama run hf.co/Thorstin/gpt2-dutch-instruct:F16
```

Unsloth Studio

How to use Thorstin/gpt2-dutch-instruct with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Thorstin/gpt2-dutch-instruct to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Thorstin/gpt2-dutch-instruct to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Thorstin/gpt2-dutch-instruct to start chatting

Docker Model Runner

How to use Thorstin/gpt2-dutch-instruct with Docker Model Runner:
```
docker model run hf.co/Thorstin/gpt2-dutch-instruct:F16
```

Lemonade

How to use Thorstin/gpt2-dutch-instruct with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Thorstin/gpt2-dutch-instruct:F16

Run and chat with the model

lemonade run user.gpt2-dutch-instruct-F16

List all available models

lemonade list

gpt2-dutch-instruct

A GPT-2 small (124M parameter) language model trained from scratch on Dutch text, then fine-tuned for instruction following using supervised fine-tuning (SFT). This model understands and generates Dutch.

Model details

| Property | Value |
|---|---|
| Architecture | GPT-2 small |
| Parameters | 123.8M |
| Layers | 12 |
| Attention heads | 12 |
| Hidden dimension | 768 |
| Context length | 512 tokens |
| Vocabulary size | 50,000 (Dutch BPE) |
| Weights | fp16 / safetensors (473 MB) |
| Inference speed (CPU) | 0.9 tok/s |
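
The parameter count can be cross-checked against the other values in the table. The sketch below assumes the standard GPT-2 layout, which the card does not spell out: learned position embeddings, tied input and output embeddings, a 4x MLP expansion, and biases on every linear layer. The function name `gpt2_param_count` is hypothetical.

```python
def gpt2_param_count(n_layer: int, d_model: int, vocab_size: int, n_ctx: int) -> int:
    """Count parameters for a GPT-2 style decoder (assumed standard layout)."""
    # Token embeddings plus learned position embeddings; the output head is
    # assumed to share the token embedding matrix, so it adds nothing.
    embeddings = vocab_size * d_model + n_ctx * d_model
    # Fused Q/K/V projection and attention output projection, with biases.
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # MLP up-projection to 4 * d_model and down-projection back, with biases.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    # Two LayerNorms per block, each with a scale and a shift vector.
    layer_norms = 2 * 2 * d_model
    final_norm = 2 * d_model
    return embeddings + n_layer * (attention + mlp + layer_norms) + final_norm


total = gpt2_param_count(n_layer=12, d_model=768, vocab_size=50_000, n_ctx=512)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the count comes to 123,849,216, which agrees with the 123.8M listed above.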

Files

| File | Format | Size |
|---|---|---|
| `model.safetensors` | fp16 | 473 MB |
| `dutch-gpt2-f16.gguf` | GGUF F16 | 249 MB |
| `dutch-gpt2-q8_0.gguf` | GGUF Q8_0 | 132 MB |
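
As a rough sanity check on the two GGUF sizes, the estimate below multiplies the parameter count by the nominal storage width of each format. It is only a sketch: it assumes every weight is stored at that width and ignores file metadata, the embedded tokenizer, and any tensors kept at higher precision. The Q8_0 figure of 34 bytes per 32 weights reflects the ggml block layout (32 int8 values plus one 16-bit scale).

```python
N_PARAMS = 123.8e6  # from the "Model details" table

# Nominal bytes per weight for each GGUF format (assumption: applied uniformly).
BYTES_PER_WEIGHT = {
    "F16": 2.0,        # one 16-bit float per weight
    "Q8_0": 34 / 32,   # block of 32 int8 weights plus one 16-bit scale
}


def estimated_size_mb(n_params: float, fmt: str) -> float:
    """Estimated tensor payload in megabytes (10^6 bytes)."""
    return n_params * BYTES_PER_WEIGHT[fmt] / 1e6


for fmt in BYTES_PER_WEIGHT:
    print(f"{fmt}: ~{estimated_size_mb(N_PARAMS, fmt):.0f} MB")
```

Both estimates land within a couple of megabytes of the GGUF sizes listed in the table.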

Use with llama.cpp

# Download
wget https://huggingface.co/Thorstin/gpt2-dutch-instruct/resolve/main/dutch-gpt2-q8_0.gguf

# Run
llama-cli -m dutch-gpt2-q8_0.gguf \
  -p "### Instructie:\nWat is de hoofdstad van Nederland?\n### Antwoord:\n" \
  -n 200

Use with Ollama

# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./dutch-gpt2-q8_0.gguf
TEMPLATE """### Instructie:
{{ .Prompt }}
### Antwoord:
"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.3
PARAMETER num_ctx 512
EOF

ollama create dutch-gpt2 -f Modelfile
ollama run dutch-gpt2

Training

Phase 1 — Pretraining from scratch

Dataset: CC-100 Dutch (~37 GB raw, ~6.6B tokens), streamed
Tokenizer: ByteLevel BPE trained on first 500K CC-100 Dutch documents
Hardware: NVIDIA Tesla T4 (16 GB VRAM)
Tokens trained: ~5B
Steps: 154,000
Final loss: 3.54
Duration: ~70 GPU hours
Key settings: fp16=True, gradient_checkpointing=True, batch_size=32, lr=5e-4, cosine scheduler
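
The figures above imply a per-step token budget and an average throughput, and the cosine scheduler can be written out in a few lines. The sketch below derives both from the listed numbers. It assumes sequences are filled to the full 512-token context and that the schedule decays to zero without warmup, since the card mentions neither detail. The function name `cosine_lr` is hypothetical.

```python
import math

TOTAL_TOKENS = 5e9   # "Tokens trained: ~5B"
STEPS = 154_000
GPU_HOURS = 70
CONTEXT = 512        # context length from the "Model details" table
PEAK_LR = 5e-4

tokens_per_step = TOTAL_TOKENS / STEPS
sequences_per_step = tokens_per_step / CONTEXT      # assumes full-length sequences
tokens_per_second = TOTAL_TOKENS / (GPU_HOURS * 3600)


def cosine_lr(step: int, total_steps: int = STEPS, peak: float = PEAK_LR) -> float:
    """Cosine decay from `peak` to 0 (assumption: no warmup, final LR of 0)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


print(f"tokens per optimizer step: ~{tokens_per_step:,.0f}")
print(f"sequences per step:        ~{sequences_per_step:.1f}")
print(f"average throughput:        ~{tokens_per_second:,.0f} tokens/s")
print(f"LR at start / midpoint:    {cosine_lr(0):.1e} / {cosine_lr(STEPS // 2):.1e}")
```

The result of about 63 sequences per step is roughly twice the listed `batch_size=32`. That would be consistent with two gradient accumulation steps, although the card does not state this.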

Phase 2 — Instruction fine-tuning (SFT)

Dataset: BramVanroy/alpaca-cleaned-dutch — 46,163 Dutch instruction/response pairs
Framework: TRL 1.6.0 SFTTrainer
Epochs: 3
Steps: 4,329
Loss: 3.31 → 1.14
Duration: ~1.25 hours
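
The step count and dataset size together hint at the effective batch size, which the card does not list. The sketch below searches for batch sizes that reproduce the reported 4,329 steps. It assumes one training sequence per instruction/response pair (no packing) and that the final partial batch of each epoch is kept.

```python
import math

N_EXAMPLES = 46_163
EPOCHS = 3
REPORTED_STEPS = 4_329


def total_steps(n_examples: int, effective_batch: int, epochs: int) -> int:
    """Optimizer steps when the last partial batch of each epoch is kept."""
    return math.ceil(n_examples / effective_batch) * epochs


# Try every effective batch size up to 256 and keep those matching the card.
matches = [
    batch
    for batch in range(1, 257)
    if total_steps(N_EXAMPLES, batch, EPOCHS) == REPORTED_STEPS
]
print(matches)
```

Under these assumptions only an effective batch size of 32 examples per step matches: 1,443 steps per epoch over 3 epochs.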

Instruction format

### Instructie:
<vraag of instructie>
### Antwoord:
<antwoord>
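
The template can be wrapped in small helpers for building prompts and parsing generations. This is a minimal sketch: the function names are hypothetical, and the exact whitespace (a single newline after each header) is an assumption taken from the prompt strings used elsewhere in this card.

```python
INSTRUCTION_HEADER = "### Instructie:"
ANSWER_HEADER = "### Antwoord:"


def build_prompt(instruction: str) -> str:
    """Prompt that ends right where the model should start answering."""
    return f"{INSTRUCTION_HEADER}\n{instruction.strip()}\n{ANSWER_HEADER}\n"


def format_example(instruction: str, answer: str) -> str:
    """Full instruction/answer text in the same template."""
    return build_prompt(instruction) + answer.strip()


def extract_answer(generated: str) -> str:
    """Return the text after the last answer header.

    Anything after a new instruction header is dropped, in case the model
    keeps generating another instruction block.
    """
    answer = generated.split(ANSWER_HEADER)[-1]
    return answer.split(INSTRUCTION_HEADER)[0].strip()


prompt = build_prompt("Wat is de hoofdstad van Nederland?")
print(prompt)
```

`extract_answer` accepts either the bare completion or the full decoded sequence including the prompt, since it splits on the last answer header.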

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Thorstin/gpt2-dutch-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in half precision
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.eval()  # inference mode: disables dropout

def chat(instruction: str, max_new_tokens: int = 200) -> str:
    # Wrap the instruction in the template used during fine-tuning
    prompt = f"### Instructie:\n{instruction}\n### Antwoord:\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():  # no gradients needed for generation
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            repetition_penalty=1.3,
            # Use EOS as the padding token during generation
            pad_token_id=tokenizer.eos_token_id,
        )
    # The decoded text includes the prompt, so keep only what follows the answer header
    response = tokenizer.decode(output[0], skip_special_tokens=True)
    return response.split("### Antwoord:")[-1].strip()

print(chat("Wat is de hoofdstad van Nederland?"))

Benchmark results (lm-evaluation-harness, limit=200)

| Task | Accuracy | Accuracy (norm) |
|---|---|---|
| hellaswag_nl | 24.50% | 28.50% |
| arc_nl | 19.00% | 29.00% |
| blimp_nl | 80.67% | 79.51% |

Random baseline: 50% for BLiMP-NL (binary), 25% for HellaSwag/ARC (4-way).
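
With a limit of 200 examples, sampling noise is worth quantifying when comparing a score against its random baseline. The sketch below computes a binomial standard error for the two 4-way tasks using the plain accuracy column. It assumes `limit=200` means 200 scored examples per task and uses a normal approximation. blimp_nl is left out because the card does not state how many items went into its aggregate score.

```python
import math

N = 200  # assumption: 200 scored examples per task


def standard_error(accuracy: float, n: int = N) -> float:
    """Binomial standard error of an accuracy measured on n examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


# Plain accuracy values from the table above.
results = {"hellaswag_nl": 0.2450, "arc_nl": 0.1900}

for task, acc in results.items():
    half_width = 1.96 * standard_error(acc)  # 95% interval, normal approximation
    print(f"{task}: {acc:.1%} ± {half_width:.1%}")
```

At this sample size the intervals span several percentage points, so small differences from the 25% baseline should be read with caution.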

Sample outputs

| Prompt | Response |
|---|---|
| Wat is de hoofdstad van Nederland? | De hoofdstad van Nederland is Amsterdam.... |
| Leg uit wat fotosynthese is. | Fotosynthese is het proces waarbij planten lichtenergie van de zon omzetten in chemische energie die ze gebruiken om koo... |
| Schrijf een kort gedicht over de zee. | De golven slaan tegen het raam, Een kalmerende bries draagt de geur van zout en vers gezette koffie. Het geluid van gebr... |

Limitations

With only 124M parameters, the model has limited capacity: expect occasional repetition, factual errors, and shorter coherent responses than larger models produce.
The context window is limited to 512 tokens, shared between the prompt and the generated response.
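
Because the prompt and the generated tokens share the 512-token window, it can help to cap the generation length based on the prompt length. The helper below is a hypothetical sketch of that bookkeeping. It works on token counts only, so the prompt length would come from whichever tokenizer is in use.

```python
CONTEXT_LENGTH = 512  # from the "Model details" table


def clamp_new_tokens(prompt_len: int, requested: int,
                     context_length: int = CONTEXT_LENGTH) -> int:
    """Largest generation length that still fits next to the prompt."""
    remaining = context_length - prompt_len
    if remaining <= 0:
        raise ValueError(
            f"prompt uses {prompt_len} tokens; nothing left of {context_length}"
        )
    return min(requested, remaining)


print(clamp_new_tokens(prompt_len=40, requested=200))   # short prompt: request fits
print(clamp_new_tokens(prompt_len=400, requested=200))  # long prompt: request is capped
```

The returned value can then be passed as the generation limit, for example as `max_new_tokens` with Transformers or `-n` with llama.cpp.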

Framework versions

| Package | Version |
|---|---|
| TRL | 1.6.0 |
| Transformers | 4.48 |
| PyTorch | 2.9.1+cu128 |
| Datasets | 2.16 |
| Tokenizers | 0.21 |
