---
license: other
tags:
  - mlx
extra_gated_heading: You need to share contact information with Databricks to access this model
extra_gated_prompt: >-

  ### DBRX Terms of Use

  Use of DBRX is governed by the [Databricks Open Model
  License](https://www.databricks.com/legal/open-model-license) and the
  [Databricks Open Model Acceptable Use
  Policy](https://www.databricks.com/legal/acceptable-use-policy-open-model).
extra_gated_fields:
  First Name: text
  Last Name: text
  Organization: text
  By clicking 'Submit' below, I accept the terms of the license and acknowledge that the information I provide will be collected, stored, processed, and shared in accordance with Databricks' Privacy Notice and I understand I can update my preferences at any time: checkbox
extra_gated_description: >-
  The information you provide will be collected, stored, processed, and shared
  in accordance with Databricks [Privacy
  Notice](https://www.databricks.com/legal/privacynotice).
extra_gated_button_content: Submit
inference: false
license_name: databricks-open-model-license
license_link: https://www.databricks.com/legal/open-model-license
---

# mlx-community/dbrx-instruct-4bit

This model was converted to MLX format from [`databricks/dbrx-instruct`](https://huggingface.co/databricks/dbrx-instruct) using mlx-lm revision `b80adbc`, after DBRX support was added by Awni Hannun.

Refer to the [original model card](https://huggingface.co/databricks/dbrx-instruct) for more details on the model.

## Conversion

Conversion was done with:

```bash
python -m mlx_lm.convert --hf-path databricks/dbrx-instruct -q --upload-repo mlx-community/dbrx-instruct-4bit
```
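If you want to reproduce the conversion locally without pushing to the Hub, a minimal variant is sketched below; `--mlx-path` (the output directory, here an arbitrary name) replaces `--upload-repo`, and `-q` quantizes the weights, 4-bit by default:

```bash
# Sketch: convert and quantize locally instead of uploading.
python -m mlx_lm.convert --hf-path databricks/dbrx-instruct -q --mlx-path ./dbrx-instruct-4bit
```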

## Use with mlx

Make sure you first upgrade mlx-lm and mlx to the latest versions:

```bash
pip install mlx --upgrade
pip install mlx-lm --upgrade
```
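To confirm the upgrade took, check the installed versions (`pip show` is one way):

```bash
pip show mlx mlx-lm | grep -E '^(Name|Version)'
```

Then you can generate straight from the command line: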

```bash
python -m mlx_lm.generate --model mlx-community/dbrx-instruct-4bit --prompt "Hello" --trust-remote-code --use-default-chat-template --max-tokens 500
```

Remember, this is an Instruct model, so you will need to apply the instruct prompt template by appending `--use-default-chat-template`.
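To see exactly what that template wraps around a prompt, you can render it without loading the full weights. This is a sketch that assumes the converted repo ships the tokenizer files (mlx-lm conversions normally copy them) and a transformers version with `apply_chat_template`:

```python
from transformers import AutoTokenizer

# trust_remote_code mirrors the CLI flag above; DBRX ships a custom tokenizer.
tokenizer = AutoTokenizer.from_pretrained(
    "mlx-community/dbrx-instruct-4bit", trust_remote_code=True
)
print(tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    add_generation_prompt=True,
    tokenize=False,
))
```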

Example:

```bash
python -m mlx_lm.generate --model mlx-community/dbrx-instruct-4bit --prompt "What's the difference between PCA vs UMAP vs t-SNE?" --trust-remote-code --use-default-chat-template --max-tokens 1000
```

Output:

*(screenshot of the model's response)*

On my MacBook Pro M2 with 96 GB of unified memory, DBRX Instruct in 4-bit uses 70.2 GB of RAM for the prompt above.
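As a rough sanity check: DBRX has 132B total parameters (per the original model card), and at 4 bits per weight that is about 132 × 10⁹ × 0.5 bytes ≈ 66 GB, with the KV cache and runtime overhead accounting for the rest.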

If the mlx-lm package on PyPI has been updated with DBRX support, it can also be installed directly from pip:

```bash
pip install mlx-lm
```

To use it from Python, you can do the following:

```python
from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/dbrx-instruct-4bit",
    tokenizer_config={"trust_remote_code": True},
)

chat = [
    {"role": "user", "content": "What's the difference between PCA vs UMAP vs t-SNE?"},
    # We need to add the assistant role as well, otherwise mlx_lm will error on generation.
    {"role": "assistant", "content": "The "},
]

prompt = tokenizer.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)

response = generate(model, tokenizer, prompt=prompt, verbose=True, temp=0.6, max_tokens=1500)
```
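With `verbose=True` the tokens are streamed to stdout as they are generated; `generate` also returns the finished text, so you can keep using it afterwards (a trivial follow-up, assuming the string return value of this mlx-lm version):

```python
# 'response' holds the generated text as a plain string.
print(response)
```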

Converted and uploaded by eek