distilgpt2 (MLX)

Unquantized (bfloat16) MLX conversion of distilbert/distilgpt2, produced with mlx-lm.

Built for Apple Silicon; runs in mlx-lm, oMLX, or any other MLX-based app.

This is a base language model (text continuation), not instruction-tuned. Prompt it with the start of a passage and sample with a non-zero temperature; greedy decoding on a question-style prompt tends to collapse into whitespace.
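To illustrate why a non-zero temperature matters, here is a toy, pure-Python sketch of greedy decoding versus temperature sampling for a single step. It is not mlx-lm's `make_sampler` implementation, and the vocabulary and logits are hypothetical. When one token (here a newline) narrowly outscores the rest, greedy picks it every time, while temperature sampling still gives the alternatives a chance.

```python
import math
import random

def softmax_with_temperature(logits, temp):
    """Turn raw logits into probabilities, scaled by temperature (temp > 0)."""
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    """Greedy decoding: always pick the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample(logits, temp, rng):
    """Temperature sampling: draw a token index from the scaled distribution."""
    probs = softmax_with_temperature(logits, temp)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical next-token scores in which a whitespace token narrowly wins.
vocab = ["\n", " the", " Rome", " in"]
logits = [2.0, 1.8, 1.5, 1.2]

rng = random.Random(0)
# Greedy returns the newline on every call.
print("greedy  :", [vocab[greedy(logits)] for _ in range(5)])
# Sampling at temp=0.7 mixes in the other tokens.
print("temp 0.7:", [vocab[sample(logits, 0.7, rng)] for _ in range(5)])
```

Lowering the temperature sharpens the distribution toward the greedy choice; raising it flattens it.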

Usage

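```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Downloads (on first use) and loads the bf16 weights and the tokenizer
model, tokenizer = load("mlx-community/distilgpt2")
# temp > 0 samples from the distribution; temp=0.0 would fall back to greedy argmax
sampler = make_sampler(temp=0.7)
# Base model: give it the start of a passage to continue, not a question
print(generate(model, tokenizer, prompt="The history of the Roman Empire began when",
               max_tokens=80, sampler=sampler))
```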

Or from the command line:

```shell
mlx_lm.generate --model mlx-community/distilgpt2 \
  --prompt "The history of the Roman Empire began when" --max-tokens 80 --temp 0.7
```

Refer to the original distilbert/distilgpt2 model card for architecture, training data, and intended use.

Conversion check

Smoke-tested after conversion with a continuation prompt: the output was coherent, generation ran at ~1700 tok/s, and peak memory was 0.18 GB on a MacBook Pro (M5 Max, 128 GB, 40-core GPU).
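As a rough sanity check on the 0.18 GB peak: at 2 bytes per bfloat16 value, the 81.9M parameters listed for this checkpoint come to about 0.164 GB of weights, and the remainder presumably covers the KV cache, activations, and runtime overhead. The same back-of-the-envelope arithmetic puts the 80-token usage example at roughly 47 ms of generation at ~1700 tok/s, ignoring model loading and prompt processing. These are estimates, not measurements.

```python
# Rough weight-memory estimate for an unquantized bf16 checkpoint.
PARAMS = 81.9e6          # parameter count listed for this checkpoint
BYTES_PER_PARAM = 2      # bfloat16 stores each value in 16 bits = 2 bytes

weight_bytes = PARAMS * BYTES_PER_PARAM
weight_gb = weight_bytes / 1e9   # decimal gigabytes
print(f"weights ~ {weight_gb:.3f} GB")  # → weights ~ 0.164 GB

# Generation time for the 80 tokens in the usage example at ~1700 tok/s
# (excludes model loading and prompt processing).
TOKENS = 80
TOK_PER_S = 1700
print(f"~{TOKENS / TOK_PER_S * 1000:.0f} ms for {TOKENS} tokens")  # → ~47 ms for 80 tokens
```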

Safetensors · Model size: 81.9M params · Tensor type: BF16 · MLX