
Apple MLX Integration

You can use Apple MLX as an optimized worker implementation in FastChat.

It runs models efficiently on Apple Silicon.

See the supported models here.

On Apple Silicon Macs with limited memory, smaller models (or quantized models) are recommended.
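Because the MLX worker targets Apple Silicon, it can help to confirm the platform before installing anything. The following is a minimal sketch, not part of FastChat: the helper name `is_apple_silicon` is hypothetical, and the check assumes that `uname -s` prints `Darwin` and `uname -m` prints `arm64` on an Apple Silicon Mac. A shell running under Rosetta reports `x86_64` and would be rejected by this check.

```shell
#!/bin/sh
# Hypothetical helper (not part of FastChat or MLX):
# succeeds only when the kernel name is Darwin and the machine type is arm64.
is_apple_silicon() {
    [ "$1" = "Darwin" ] && [ "$2" = "arm64" ]
}

if is_apple_silicon "$(uname -s)" "$(uname -m)"; then
    echo "Apple Silicon detected: the MLX worker is an option."
else
    echo "Not Apple Silicon: use fastchat.serve.model_worker instead."
fi
```

Passing the `uname` values as arguments keeps the check easy to try with other inputs, for example `is_apple_silicon Linux x86_64`.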

Instructions

  1. Install MLX.

    pip install "mlx-lm>=0.0.6"
    
  2. When you launch a model worker, replace the normal worker (fastchat.serve.model_worker) with the MLX worker (fastchat.serve.mlx_worker). Launch the worker only after the controller is running (instructions).

    python3 -m fastchat.serve.mlx_worker --model-path TinyLlama/TinyLlama-1.1B-Chat-v1.0
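The launch order from step 2 can be sketched as a small script. This is an illustration under stated assumptions, not a FastChat tool: the `step` helper and the `RUN` and `MODEL_PATH` variables are hypothetical, and the 5-second pause is an assumed startup delay rather than a documented requirement. By default the script only prints the two commands in order; setting `RUN=1` starts them in the background.

```shell
#!/bin/sh
# Model to serve; defaults to the example model used above.
MODEL_PATH="${MODEL_PATH:-TinyLlama/TinyLlama-1.1B-Chat-v1.0}"
# 0 = only print the commands (default), 1 = actually start them.
RUN="${RUN:-0}"

# Hypothetical helper: print a command, and start it in the background when RUN=1.
step() {
    echo "+ $*"
    if [ "$RUN" = "1" ]; then
        "$@" &
    fi
}

# 1. The controller must be up before the worker is launched.
step python3 -m fastchat.serve.controller
if [ "$RUN" = "1" ]; then sleep 5; fi   # assumed delay, not a FastChat requirement

# 2. The MLX worker takes the place of fastchat.serve.model_worker.
step python3 -m fastchat.serve.mlx_worker --model-path "$MODEL_PATH"
if [ "$RUN" = "1" ]; then wait; fi      # keep the script alive while both run
```

Running the two commands in separate terminals is equivalent and makes the logs easier to read; the script form is mainly useful for checking the order before launching.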