AudioGen Medium (MLX)

This is an MLX-native port of facebook/audiogen-medium, a 1.5B-parameter autoregressive transformer for text-to-audio generation.

Model Details

  • Architecture: Autoregressive Transformer LM over EnCodec discrete tokens
  • Parameters: ~1.5B (LM) + EnCodec compression model
  • Sampling rate: 16 kHz
  • Frame rate: 50 Hz (4 codebooks per frame, interleaved with a delay pattern)
  • Text encoder: T5-small (loaded separately)
  • Max duration: 10 seconds (configurable)
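
The figures above determine how much work a given duration implies. The following is a minimal sketch of that arithmetic, not code from MLXAudioGen: the GenerationBudget type and budget(forSeconds:) function are hypothetical names, and the step count assumes the usual delay-pattern layout in which codebook k is shifted by k steps.

```swift
// Illustrative arithmetic only; none of these names belong to MLXAudioGen.
let sampleRate = 16_000   // Hz, from the model details above
let frameRate = 50        // EnCodec frames per second
let codebooks = 4         // discrete tokens per frame

struct GenerationBudget {
    let frames: Int       // EnCodec frames to generate
    let tokens: Int       // total discrete tokens (frames x codebooks)
    let decodeSteps: Int  // autoregressive steps under the assumed delay pattern
    let samples: Int      // waveform samples after decoding
}

func budget(forSeconds seconds: Double) -> GenerationBudget {
    let frames = Int((seconds * Double(frameRate)).rounded())
    return GenerationBudget(
        frames: frames,
        tokens: frames * codebooks,
        // Assumption: codebook k lags by k steps, so the last codebook
        // finishes (codebooks - 1) steps after the first one.
        decodeSteps: frames > 0 ? frames + codebooks - 1 : 0,
        samples: Int((seconds * Double(sampleRate)).rounded())
    )
}

let b = budget(forSeconds: 5.0)
print(b.frames, b.tokens, b.decodeSteps, b.samples)
```

Under these assumptions, the 10-second maximum corresponds to 500 frames, 2,000 tokens, and 160,000 output samples.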

Files

  • config.json: Model configuration
  • model.safetensors: LM + EnCodec weights
  • model.safetensors.index.json: Weight index (for sharded variants)
  • tokenizer.json / tokenizer_config.json: T5 tokenizer files

Usage (Swift/MLX)

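import MLXAudioGen

// modelURL and t5URL are local folder URLs that you supply:
// modelURL holds the files listed above, t5URL holds the T5 text encoder.
let model = try await AudioGenModel.fromPretrained(
    modelFolder: modelURL,
    t5Folder: t5URL
)

// Generate audio from a text prompt.
let audio = try await model.generateAudio(
    description: "dog barking",
    duration: 5.0,      // seconds
    cfgCoef: 3.0,       // classifier-free guidance coefficient
    temperature: 1.0,   // sampling temperature
    topK: 250           // sample only from the 250 most likely tokens
)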
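
The cfgCoef, temperature, and topK arguments control how each token is sampled. The sketch below shows the standard formulation of classifier-free guidance followed by temperature-scaled top-k sampling, written in plain Swift on [Double] arrays. It is an illustration under assumptions, not the port's implementation: guidedLogits, topKDistribution, and sample are hypothetical helpers, and the source does not show how MLXAudioGen applies these parameters internally.

```swift
import Foundation

/// Classifier-free guidance: push the unconditional logits toward the
/// text-conditioned ones by a factor of cfgCoef.
func guidedLogits(cond: [Double], uncond: [Double], cfgCoef: Double) -> [Double] {
    precondition(cond.count == uncond.count)
    return zip(cond, uncond).map { (c, u) in u + cfgCoef * (c - u) }
}

/// Temperature-scaled softmax restricted to the topK highest logits.
/// Returns (token index, probability) pairs, highest logit first.
func topKDistribution(logits: [Double], temperature: Double, topK: Int)
    -> [(index: Int, prob: Double)]
{
    precondition(temperature > 0 && topK > 0 && !logits.isEmpty)
    let kept = logits.enumerated()
        .sorted { $0.element > $1.element }
        .prefix(topK)
    // Subtract the maximum logit before exponentiating for numerical stability.
    let maxLogit = kept.first!.element
    let weights = kept.map { exp(($0.element - maxLogit) / temperature) }
    let total = weights.reduce(0, +)
    return zip(kept, weights).map { (entry, w) in
        (index: entry.offset, prob: w / total)
    }
}

/// Draw one token index from the distribution.
func sample<G: RandomNumberGenerator>(
    _ dist: [(index: Int, prob: Double)], using rng: inout G
) -> Int {
    var r = Double.random(in: 0..<1, using: &rng)
    for entry in dist {
        r -= entry.prob
        if r < 0 { return entry.index }
    }
    return dist[dist.count - 1].index  // guard against rounding error
}

// Toy example with a 3-token vocabulary.
let logits = guidedLogits(cond: [2.0, 0.0, 1.0], uncond: [1.0, 0.0, 1.0], cfgCoef: 3.0)
let dist = topKDistribution(logits: logits, temperature: 1.0, topK: 2)
var rng = SystemRandomNumberGenerator()
let token = sample(dist, using: &rng)
```

In this formulation, a higher cfgCoef makes the output follow the text prompt more closely, a lower temperature makes sampling more deterministic, and a smaller topK restricts sampling to fewer candidate tokens.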

License

This model is published under the CC-BY-NC 4.0 license (non-commercial use only), following the original AudioGen license.
