running model hangs on macOS M1 Sonoma

#6
by engiai - opened

model generation never completes on M1 mac running Sonoma OS

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

print('Loading model...')
tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("databricks/dbrx-instruct", device_map="cpu", torch_dtype=torch.bfloat16, trust_remote_code=True)

print('Generating input prompts...')
input_text = "What does it take to build a great LLM?"
messages = [{"role": "user", "content": input_text}]
input_ids = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt")

print('Running model...') # <- gets here
outputs = model.generate(**input_ids, max_new_tokens=200, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]))
engiai changed discussion title from running model hands on macOS M1 Sonoma to running model hangs on macOS M1 Sonoma
Databricks org
•
edited Mar 27

You're running on CPU. It's going to take a very long time without a GPU, so it's likely just still running. This would be the case for any big LLM, and this is quite big.
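
If you want to confirm it is still making progress rather than hung, one option is to stream tokens to the terminal as they are generated. A minimal sketch using transformers' TextStreamer, reusing the tokenizer, model, and input_ids from your snippet above:

from transformers import TextStreamer

# Print tokens as they are produced, so slow CPU generation is visibly
# progressing instead of appearing to hang.
streamer = TextStreamer(tokenizer, skip_prompt=True)
outputs = model.generate(**input_ids, max_new_tokens=200,
                         pad_token_id=tokenizer.eos_token_id,
                         streamer=streamer)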

You can check out other 4-bit quantizations, which might work a lot better on Macs.
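
For example, on Apple Silicon a 4-bit quantization can be run with MLX via the mlx-lm package. A rough sketch; the repo id below is a placeholder, so check the Hub for an actual 4-bit dbrx-instruct conversion (e.g. under mlx-community):

# pip install mlx-lm
from mlx_lm import load, generate

# Placeholder repo id: substitute a real 4-bit dbrx-instruct conversion from the Hub.
model, tokenizer = load("mlx-community/dbrx-instruct-4bit")
response = generate(model, tokenizer,
                    prompt="What does it take to build a great LLM?",
                    max_tokens=200)
print(response)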

Affirmative, thank you!

engiai changed discussion status to closed

Can I use bitsandbytes to load an 8-bit version of dbrx?
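
For context, 8-bit loading with bitsandbytes through transformers is usually written like the sketch below, but bitsandbytes requires a CUDA GPU, so it would not help on an M1 Mac, and whether it works with dbrx's custom modeling code is something to verify:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Standard transformers + bitsandbytes 8-bit loading (needs a CUDA GPU).
quant_config = BitsAndBytesConfig(load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "databricks/dbrx-instruct",
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)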
