# QLoRA adapters for Mixtral-8x7B-v0.1-hf-4bit-mlx

QLoRA adapters fine-tuned on the Guanaco dataset.

## Inference via mlx-lm
```python
from mlx_lm import load, generate

# Load the 4-bit base model together with the fine-tuned QLoRA adapters
model, tokenizer = load(
    "mlx-community/Mixtral-8x7B-v0.1-hf-4bit-mlx",
    adapter_file="adapters.npz",
)

# Generate with the Guanaco-style "### Human:/### Assistant:" prompt format
generate(
    model=model,
    tokenizer=tokenizer,
    prompt="### Human: write a quick sort in python.\n### Assistant: ",
    max_tokens=500,
    temp=0.3,
    verbose=True,
)
```
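Since the adapters were trained on Guanaco-style conversations, prompts should follow the `### Human:`/`### Assistant:` template shown above. A minimal helper for building such prompts (the function name is illustrative, not part of mlx-lm):

```python
def guanaco_prompt(instruction: str) -> str:
    # Wrap a user instruction in the Guanaco chat template the adapters
    # were fine-tuned on; generation continues after "### Assistant: ".
    return f"### Human: {instruction}\n### Assistant: "

prompt = guanaco_prompt("write a quick sort in python.")
# prompt == "### Human: write a quick sort in python.\n### Assistant: "
```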
## Serve as an API service

```shell
pip install mlx-llm-server
mlx-llm-server --model-path mlx-community/Mixtral-8x7B-v0.1-hf-4bit-mlx --adapter-file adapters.npz
```
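Once the server is up, it can be queried over HTTP. The sketch below assumes an OpenAI-compatible chat-completions endpoint at `http://localhost:8080/v1/chat/completions` (both the path and port are assumptions about the server's defaults; adjust to your setup):

```python
import json
import urllib.request

# Build a chat-completions request; the URL/port are assumed defaults,
# not confirmed from the mlx-llm-server docs.
url = "http://localhost:8080/v1/chat/completions"
payload = {
    "messages": [{"role": "user", "content": "write a quick sort in python."}],
    "max_tokens": 500,
    "temperature": 0.3,
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```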