distilgpt2 (MLX)

Full-precision (bfloat16) MLX conversion of distilbert/distilgpt2, produced with mlx-lm.

Intended for Apple Silicon. It runs in mlx-lm, oMLX, or any other MLX-compatible app.

This is a base language model (text continuation), not instruction-tuned. Prompt it with the start of a passage and sample with a non-zero temperature; greedy decoding on a question-style prompt tends to collapse into whitespace.
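To illustrate why a non-zero temperature changes the output, here is a toy sketch of temperature sampling in plain Python. It is not mlx-lm's implementation: the function names and the three made-up scores are assumptions for illustration only. With a temperature of 0 the highest-scoring token is always chosen, so a model that favours whitespace keeps emitting it; with a temperature such as 0.7 lower-scoring tokens are sometimes picked.

```python
import math
import random

def softmax_with_temperature(logits, temp):
    """Turn raw scores into probabilities; lower temp sharpens the distribution."""
    scaled = [x / temp for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_token(logits, temp, rng):
    """Greedy argmax when temp == 0, otherwise sample from the softmax."""
    if temp == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temp)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Made-up scores for three candidate tokens (hypothetical values).
toy_logits = [2.0, 1.5, 0.1]
rng = random.Random(0)

greedy_picks = [pick_token(toy_logits, 0, rng) for _ in range(5)]
sampled_picks = [pick_token(toy_logits, 0.7, rng) for _ in range(5)]

print("greedy: ", greedy_picks)   # always the top-scoring index
print("sampled:", sampled_picks)  # may include lower-scoring indices
```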

Usage

pip install mlx-lm

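from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the converted weights and tokenizer.
model, tokenizer = load("mlx-community/distilgpt2")
# Sample with a non-zero temperature rather than decoding greedily.
sampler = make_sampler(temp=0.7)
# Prompt with the start of a passage; the model continues it.
print(generate(model, tokenizer, prompt="The history of the Roman Empire began when",
               max_tokens=80, sampler=sampler))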

Or from the command line:

mlx_lm.generate --model mlx-community/distilgpt2 \
  --prompt "The history of the Roman Empire began when" --max-tokens 80 --temp 0.7

Refer to the original distilbert/distilgpt2 model card for architecture, training data, and intended use.

Conversion check

Smoke-tested after conversion with a continuation prompt: output was coherent, generation ran at ~1700 tok/s, and peak memory was 0.18 GB on a MacBook Pro (M5 Max, 128 GB, 40-core GPU).
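The card does not say how the tok/s figure was measured. As one possible approach, the sketch below times any iterable of tokens and reports tokens per second. `measure_throughput` and `fake_stream` are hypothetical names, and the fake stream is a stand-in for a real model so the example runs without MLX installed.

```python
import time

def measure_throughput(token_stream):
    """Consume an iterable of tokens and return (count, tokens_per_second)."""
    start = time.perf_counter()
    count = 0
    for _ in token_stream:
        count += 1
    elapsed = time.perf_counter() - start
    return count, (count / elapsed if elapsed > 0 else float("inf"))

def fake_stream(n, delay_s=0.001):
    """Hypothetical stand-in for a model's token stream (about 1 ms per token)."""
    for i in range(n):
        time.sleep(delay_s)
        yield i

count, tps = measure_throughput(fake_stream(20))
print(f"{count} tokens at {tps:.0f} tok/s")
```

With mlx-lm, the generator returned by `stream_generate` could be passed in place of `fake_stream`. Recent mlx-lm versions also appear to report generation speed and peak memory themselves, so check the installed version before relying on a hand-rolled timer.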

Model size: 81.9M params, stored as BF16 Safetensors.
