Does this model work? It might be broken.

#1
by ghogan42 - opened

I haven't used mlx-lm before, and this was the first model I downloaded. I couldn't get any intelligible response from this model, just garbage, though I may be doing something wrong.

However, switching the model to Meta-Llama-3-8B-Instruct-4bit works fine; only this 70B model has a problem.

I've tried various prompt templates and so on without good results, so I'm asking here to see if this model is working for anyone else.

Here is an example command that works properly with Meta-Llama-3-8B-Instruct-4bit but fails to generate properly with Meta-Llama-3-70B-Instruct-4bit. It should not be a problem with my machine, since I have 128 GB of RAM to run these models.

```
python -m mlx_lm.generate --model mlx-community/Meta-Llama-3-8B-Instruct-4bit --prompt "<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant that performs tasks and answers questions.<|eot_id|><|start_header_id|>user<|end_header_id|>

Write a short story about a poodle named Rex.<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --temp 1.0 --ignore-chat-template
```
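For reference, the same prompt can be built from the model's chat template instead of hand-writing the special tokens. A minimal Python sketch, assuming a recent mlx-lm whose `load` returns a Hugging Face-compatible tokenizer:

```
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")
messages = [
    {"role": "system", "content": "You are a helpful assistant that performs tasks and answers questions."},
    {"role": "user", "content": "Write a short story about a poodle named Rex."},
]
# apply_chat_template inserts <|begin_of_text|>, <|start_header_id|>, etc. for us
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, verbose=True)
```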

EDIT: It looks like there is some kind of bug; there is a GitHub issue about it now: https://github.com/ml-explore/mlx-examples/issues/692

MLX Community org

Same issue here

MLX Community org

Hey @ghogan42 and @ivanfioravanti

Thanks a lot for flagging this issue!

I will update the model weights as soon as this issue is fixed :)

Same issue here too

MLX Community org

Please provide a reproducible example

MLX Community org

Make sure you re-download the new model weights, because Hugging Face usually caches them locally.
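For example, here is a minimal sketch of forcing a fresh download with huggingface_hub (`force_download=True` bypasses any cached copy):

```
from huggingface_hub import snapshot_download

# re-fetch the repo files even if a cached copy exists
path = snapshot_download(
    repo_id="mlx-community/Meta-Llama-3-70B-Instruct-4bit",
    force_download=True,
)
print(path)  # local directory that now holds the fresh weights
```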

Code (demo.py):

```
from mlx_lm import load, generate

# my env: mlx-lm-310
# https://huggingface.co/mlx-community/Meta-Llama-3-70B-Instruct-4bit
# downloaded 2024-04-22

model_path = "/Users/janewu/llm-model/llama/mlx/Meta-Llama-3-70B-Instruct-4bit"
model, tokenizer = load(model_path)
prompt = "hello"
response = generate(model, tokenizer, prompt=prompt, verbose=True, max_tokens=512)  # temp=0.0
```

Output:
```
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

Prompt: hello
Question
assistant<|end_header_id|>

Here is a list of popular Python libraries for data science:

  1. TensorFlow: TensorFlow is an open-source software library for machine learning and data analysis other tasks. as<|eot_id|><|start_header_id|>assistant other tasks.<|eot_id|><|start_header_id|><|eot_id|> an other other other other tasks.

<|eot_id|><|start_header_id|><|start_header_id|>

other tasks.

<|eot_id|><|eot_id|><|eot_id|> other other other tasks<|eot_id|><|start_header_id|><|eot_id|><|eot_id|> other other other other other other other other other other other tasks<|eot_id|><|start_header_id|><|eot_id|> other
other tasks<|eot_id|><|eot_id|> other other other other other other other other, (<|start_header_id|>assistant<|end_header_id|><|start_header_id|>assistant<|end_header_id|><|start_header_id|>assistant<|end_header_id|> [... the same "<|start_header_id|>assistant<|end_header_id|>" pair repeats for the rest of the output ...]

Prompt: 0.207 tokens-per-sec
Generation: 5.560 tokens-per-sec

Process finished with exit code 0
```


Today (2024/05/05) the model was updated, and it works great!

Updated files:
special_tokens_map.json
tokenizer_config.json

Upgrade mlx and mlx-lm:
```
pip install --upgrade mlx
pip install --upgrade mlx-lm
```

Installed versions: mlx 0.12.2, mlx-lm 0.12.1
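To double-check which versions are actually installed, a quick standard-library sketch:

```
from importlib.metadata import version

print("mlx:", version("mlx"))        # 0.12.2 in this working setup
print("mlx-lm:", version("mlx-lm"))  # 0.12.1 in this working setup
```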

Now it works great!

Code:

```
from mlx_lm import load, generate

model_path = "/Users/janewu/llm-model/llama/mlx/Meta-Llama-3-70B-Instruct-4bit"
model, tokenizer = load(model_path)
prompt = "hello"
prompt_template = f'''<|begin_of_text|><|start_header_id|>system<|end_header_id|>
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. <|eot_id|><|start_header_id|>user<|end_header_id|>
{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
'''

response = generate(model, tokenizer, prompt=prompt_template, verbose=True)  # optionally: max_tokens=512, temp=0.0
```

Output:
```

Prompt: <|begin_of_text|><|start_header_id|>system<|end_header_id|>
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. <|eot_id|><|start_header_id|>user<|end_header_id|>
hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hello! I'm happy to assist you with any questions or topics you'd like to discuss. Please feel free to ask me anything, and I'll do my best to provide you with helpful and accurate information. What's on your mind today?

Prompt: 8.162 tokens-per-sec
Generation: 6.289 tokens-per-sec

Process finished with exit code 0
```


MLX Community org

Confirmed here too!

MLX Community org

Awesome, I updated a long time ago :)

prince-canuma changed discussion status to closed
