AudioGen Medium (MLX)

This is the MLX-native port of facebook/audiogen-medium, a 1.5B-parameter autoregressive transformer for text-to-audio generation.

Model Details

Architecture: Autoregressive Transformer LM over EnCodec discrete tokens
Parameters: ~1.5B (LM) + EnCodec compression model
Sampling rate: 16 kHz
Frame rate: 50 Hz (4 codebooks, delayed pattern)
Text encoder: T5-small (loaded separately)
Max duration: 10 seconds (configurable)
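At 50 Hz, the default 10-second maximum corresponds to 500 frames per codebook. To make the frame rate and the delayed pattern concrete, here is a minimal Swift sketch. It assumes that codebook k is offset by k steps (delays 0, 1, 2, 3), which is how AudioCraft's delay pattern works. The helpers frameCount and delayedLayout are hypothetical and are not part of MLXAudioGen.

```swift
// Hypothetical sketch: frame budget and delayed-pattern layout for AudioGen-style token streams.
// Assumption: codebook k is shifted right by k steps (delays 0, 1, 2, 3). This illustrates the
// "delayed pattern" described above; it is not taken from the MLXAudioGen source.

let frameRate = 50      // EnCodec frames per second
let numCodebooks = 4    // parallel codebooks per frame

/// Number of EnCodec frames needed for `seconds` of audio.
func frameCount(seconds: Double) -> Int {
    Int((seconds * Double(frameRate)).rounded())
}

/// Builds the delayed layout: entry [k][s] is the frame index that codebook k emits at step s,
/// or nil where that codebook has not started yet (leading) or has already finished (trailing).
func delayedLayout(frames: Int, codebooks: Int) -> [[Int?]] {
    let steps = frames + codebooks - 1  // the last codebook finishes codebooks - 1 steps late
    return (0..<codebooks).map { k in
        (0..<steps).map { s in
            let frame = s - k
            return (frame >= 0 && frame < frames) ? frame : nil
        }
    }
}

let frames = frameCount(seconds: 5.0)
print("frames for 5 s:", frames)

// Print a tiny 3-frame layout so the diagonal shift is visible.
let layout = delayedLayout(frames: 3, codebooks: numCodebooks)
for (k, row) in layout.enumerated() {
    let cells = row.map { cell -> String in
        if let f = cell { return String(f) }
        return "-"
    }
    print("codebook \(k):", cells.joined(separator: " "))
}
```

The delay costs K - 1 extra steps per sequence. In exchange, when codebook k emits frame f at step f + k, the lower codebooks for that same frame were already produced at earlier steps, so the model can condition on them.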

Files

config.json — Model configuration
model.safetensors — LM + EnCodec weights
model.safetensors.index.json — Weight index (for sharded variants)
tokenizer.json / tokenizer_config.json — T5 tokenizer files
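For sharded variants, a loader has to know which shard files the index refers to. The following minimal sketch assumes the usual Hugging Face index layout, in which a top-level "weight_map" object maps each tensor name to a file name. The type SafetensorsIndex, the function shardFiles, and the tensor names in the sample JSON are all hypothetical.

```swift
import Foundation

// Sketch: list the shard files referenced by a safetensors index.
// Assumption: the index looks like {"metadata": {...}, "weight_map": {tensorName: fileName}}.
struct SafetensorsIndex: Decodable {
    let weightMap: [String: String]
    enum CodingKeys: String, CodingKey { case weightMap = "weight_map" }
}

/// Returns the distinct shard file names, sorted, that must sit next to the index file.
func shardFiles(indexJSON: Data) throws -> [String] {
    // Unknown keys such as "metadata" are ignored by JSONDecoder.
    let index = try JSONDecoder().decode(SafetensorsIndex.self, from: indexJSON)
    return Array(Set(index.weightMap.values)).sorted()
}

// Hypothetical index for a two-shard variant (tensor names are illustrative only).
let example = """
{
  "metadata": {},
  "weight_map": {
    "lm.layers.0.weight": "model-00001-of-00002.safetensors",
    "lm.layers.1.weight": "model-00001-of-00002.safetensors",
    "encodec.decoder.weight": "model-00002-of-00002.safetensors"
  }
}
""".data(using: .utf8)!

let shards = try shardFiles(indexJSON: example)
print(shards)
```

A Set is used to deduplicate the file names because many tensors typically share the same shard.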

Usage (Swift/MLX)

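import Foundation
import MLXAudioGen

// Placeholder locations (assumed to be file URLs): point these at the downloaded
// AudioGen MLX folder and at a separate T5-small folder.
let modelURL = URL(fileURLWithPath: "/path/to/audiogen-medium-mlx")
let t5URL = URL(fileURLWithPath: "/path/to/t5-small")

// Load the LM + EnCodec weights; the T5 text encoder is loaded from its own folder.
let model = try await AudioGenModel.fromPretrained(
    modelFolder: modelURL,
    t5Folder: t5URL
)
// Generate 5 s of audio. cfgCoef sets the classifier-free guidance strength,
// and topK restricts sampling to the 250 most likely tokens at each step.
let audio = try await model.generateAudio(
    description: "dog barking",
    duration: 5.0,
    cfgCoef: 3.0,
    temperature: 1.0,
    topK: 250
)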
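The cfgCoef, temperature, and topK arguments control how each token is sampled. The sketch below shows one plausible sampling step over plain Swift arrays. It assumes AudioCraft's convention for mixing conditional and unconditional logits, which is uncond + (cond - uncond) * cfgCoef. The helpers guidedLogits and sampleTopK are hypothetical and are not MLXAudioGen API.

```swift
import Foundation

// Sketch of one sampling step: classifier-free guidance, then temperature + top-k sampling.
// Illustrative reimplementation over plain arrays; not the MLXAudioGen code.

/// Mixes conditional and unconditional logits (AudioCraft-style CFG).
func guidedLogits(cond: [Float], uncond: [Float], cfgCoef: Float) -> [Float] {
    zip(cond, uncond).map { c, u in u + (c - u) * cfgCoef }
}

/// Samples a token index from the `topK` highest logits after temperature scaling.
func sampleTopK<G: RandomNumberGenerator>(
    logits: [Float], temperature: Float, topK: Int, using rng: inout G
) -> Int {
    precondition(!logits.isEmpty && temperature > 0 && topK > 0)
    // Indices of the topK largest logits, highest first.
    let kept = logits.indices.sorted { logits[$0] > logits[$1] }.prefix(topK)
    // Softmax over the kept logits only, subtracting the max for numerical stability.
    let scaled = kept.map { logits[$0] / temperature }
    let maxScaled = scaled.max()!
    let weights = scaled.map { exp($0 - maxScaled) }
    let total = weights.reduce(0, +)
    // Inverse-CDF sampling over the unnormalized weights.
    var r = Float.random(in: 0..<total, using: &rng)
    for (idx, w) in zip(kept, weights) {
        if r < w { return idx }
        r -= w
    }
    return kept.last!  // guards against floating-point round-off
}

var rng = SystemRandomNumberGenerator()
let mixed = guidedLogits(cond: [2.0, 0.5, -1.0], uncond: [1.0, 0.5, 0.0], cfgCoef: 3.0)
print(mixed)  // [4.0, 0.5, -3.0]
let token = sampleTopK(logits: mixed, temperature: 1.0, topK: 2, using: &rng)
print(token)
```

A cfgCoef of 1.0 reduces to the purely conditional logits. Larger values push the distribution further toward tokens that the text prompt favors. Taking the top k first and then applying softmax is equivalent to applying softmax, truncating to the top k, and renormalizing.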

License

This model is published under the CC-BY-NC 4.0 license (non-commercial use only), following the original AudioGen license.

Hub Metadata

Format: Safetensors
Model size: 2B params
Tensor types: F32, F16
Library: MLX
Repository: mlx-community/audiogen-medium-mlx
Base model: facebook/audiogen-medium