🔥 MoE-Mixtral-7B-8Expert

mixtral-8x7b is a Mixture-of-Experts (MoE) model. LLaMA2-Accessory now supports both inference and finetuning for it.

🚀 Features

With LLaMA2-Accessory, mixtral-8x7b enjoys the following features:

  1. Distributed MoE (i.e., experts are instantiated across multiple processes/GPUs)
  2. Load balancing loss
  3. Tensor parallelism and FSDP for efficient training
  4. Distributed and/or quantized inference
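
To make the load balancing loss in item 2 more concrete, here is a minimal sketch of one common formulation: an auxiliary loss in the style of Switch Transformer, adapted to top-k routing. It is an illustration only and not necessarily what LLaMA2-Accessory implements. The names `load_balancing_loss`, `router_logits`, and `top_k` are hypothetical, and the sketch uses plain Python rather than tensors so that it runs without dependencies.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits, top_k=2):
    """Auxiliary loss that is smallest when expert load is balanced.

    router_logits: one list of per-expert logits for each token
                   (hypothetical input layout, assumed for this sketch).
    top_k:         number of experts each token is routed to.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    dispatch_count = [0] * num_experts  # routing slots assigned to each expert
    prob_sum = [0.0] * num_experts      # summed router probability per expert

    for logits in router_logits:
        probs = softmax(logits)
        # Pick the top_k experts with the highest router probability.
        ranked = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)
        for e in ranked[:top_k]:
            dispatch_count[e] += 1
        for e in range(num_experts):
            prob_sum[e] += probs[e]

    # f[e]: fraction of routing slots given to expert e.
    # p[e]: mean router probability assigned to expert e.
    f = [c / (num_tokens * top_k) for c in dispatch_count]
    p = [s / num_tokens for s in prob_sum]
    return num_experts * sum(fe * pe for fe, pe in zip(f, p))


# Tokens spread evenly over four experts versus tokens that all prefer two.
balanced = load_balancing_loss([[5.0, 5.0, 0.0, 0.0], [0.0, 0.0, 5.0, 5.0]])
skewed = load_balancing_loss([[5.0, 5.0, 0.0, 0.0], [5.0, 5.0, 0.0, 0.0]])
```

In this formulation the loss equals 1 when both the routing fractions and the mean router probabilities are uniform, and it grows as tokens concentrate on a few experts, so `skewed` comes out larger than `balanced`. In training, such a term is typically scaled by a small coefficient and added to the language modeling loss.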

🔥 Online Demo

We host a web demo 💻here, which showcases a mixtral-8x7b model finetuned on evol-codealpaca-v1 and ultrachat_200k with LoRA and bias tuning.

💡 Tutorial

A detailed tutorial is available in our documentation.

