MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF

Apr 19, 2024

only me ?

Owner Apr 19, 2024

Which library/software are you using? If it's LM Studio, you need to update to the latest version. If it's Llama.cpp, you should follow the template correctly: https://huggingface.co/MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF/discussions/5

MaziyarPanahi

Owner Apr 19, 2024

I finished the download of 70B Q2, it stopped correctly. The very latest version of LM Studio they released last night:

ishanparihar

Apr 19, 2024

edited Apr 19, 2024

Your models don't stop generating. I tested 8B Q6_K, Q8, and 70B Q2k_M. NousResearch also fixed their tokenizer later on to fix this issue with the Llama 3 architecture. Maybe something from the llama.cpp commits may help.
Most quantized models by other authors do not stop their output, except for the recently quantized ones. LM Studio's models work fine. I didn't test the NousResearch quants, as they only had 4-bit+ models.

MaziyarPanahi

Owner Apr 19, 2024

I cannot reproduce this in LM Studio unfortunately. Also, others don't have any problem with it: https://huggingface.co/MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF/discussions/5

The stop strings might help you

ishanparihar

Apr 19, 2024

Thanks for letting me know this.

I already am using the llama 3 template from LM Studio. The same as you have shared.
I will download the model again and let you know if the issue persists.

ishanparihar

Apr 19, 2024

I downloaded the model again to experience the same issue. I don't know why this is happening but the assistant at the end and the consequent non-stop chatter is making this model unusable. I hope that the open source community finds a fix for this novel architecture of llama 3.

P.S. Can't wait for its finetunes.

carlosbdw

Apr 20, 2024

Yes, maybe LM Studio is OK. I am using llamafile.

MaziyarPanahi

Owner Apr 20, 2024

> I downloaded the model again to experience the same issue. I don't know why this is happening but the assistant at the end and the consequent non-stop chatter is making this model unusable. I hope that the open source community finds a fix for this novel architecture of llama 3.
>
> P.S. Can't wait for its finetunes.

This reminds me of the older version of LM Studio with a bad Llama-3 prompt. Could you please show me your Stop Strings and LM Studio version? I would change the GGUF metadata to include that fix, it's just I cannot reproduce this on my side to be 100% sure I fixed anything.

ishanparihar

Apr 20, 2024

edited Apr 20, 2024

> > I downloaded the model again to experience the same issue. I don't know why this is happening but the assistant at the end and the consequent non-stop chatter is making this model unusable. I hope that the open source community finds a fix for this novel architecture of llama 3.
> >
> > P.S. Can't wait for its finetunes.
>
> This reminds me of the older version of LM Studio with a bad Llama-3 prompt. Could you please show me your Stop Strings and LM Studio version? I would change the GGUF metadata to include that fix, it's just I cannot reproduce this on my side to be 100% sure I fixed anything.

It is the latest version, and my stop strings are the same as the ones you posted earlier. I am attaching a screenshot of them. It is understandable that reconfiguring a big model is a lot of work and that being sure is important. I am also looking forward to making this work.

MaziyarPanahi

Owner Apr 20, 2024

You can see the version in the About section. I just made this quick demo to show that everything works properly once the appropriate prompt template is set:

https://colab.research.google.com/drive/1HD-_evvGo1l1B-imVfQP7BKfDe-7BbE-?usp=sharing

maxpayne07

Apr 20, 2024

gives me error:

```
{
  "cause": "(Exit code: 42). Unknown error. Try a different model and/or config.",
  "suggestion": "",
  "data": {
    "memory": {
      "ram_capacity": "47.80 GB",
      "ram_unused": "40.07 GB"
    },
    "gpu": {
      "type": "AmdOpenCL",
      "vram_recommended_capacity": "33.74 GB",
      "vram_unused": "33.74 GB"
    },
    "os": {
      "platform": "win32",
      "version": "10.0.22631",
      "supports_avx2": true
    },
    "app": {
      "version": "0.2.20",
      "downloadsDir": "E:\\AI"
    },
    "model": {}
  },
  "title": "Error loading model."
}
```

maxpayne07

Apr 20, 2024

Reinstalled LM Studio, problem solved.

tsalvoch

Apr 22, 2024

I also faced issues with this version that keeps generating text without stopping.
After some research, it seems that other versions have encountered the same problem, and it was resolved by the creator of the GGUF by modifying the end token.
With the model from the link below, the text generation stops as expected.

Link: https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF

zerosystem

Apr 22, 2024

Hi all, adding the string "assistant" (without the quotation marks) to Stop Strings in LM Studio works for me using the Llama 3 preset. It prevents the model from going on and on in my case.

tsalvoch

Apr 22, 2024

@zerosystem Yes but now you can't have the word "assistant" in your messages. It's a temporary solution but it's not sustainable. The best thing is to be able to use special tokens.
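To make the trade-off above concrete, here is a minimal sketch of plain substring stop matching. It is an illustration only: the thread does not say how LM Studio implements Stop Strings, so `truncate_at_stop` is a hypothetical helper, and the sample text is made up.

```python
def truncate_at_stop(text: str, stop_strings: list[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop string.

    Hypothetical helper: assumes stop strings are matched as plain
    substrings, which may differ from what a given app actually does.
    """
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Made-up model output: a legitimate answer that contains the word
# "assistant", followed by the runaway continuation described in this thread.
raw = (
    "Sure! A virtual assistant can help you book flights."
    "<|eot_id|>assistant\n\nMore chatter"
)

# Stopping on the plain word cuts the real answer short.
with_word = truncate_at_stop(raw, ["assistant"])

# Stopping on the special token keeps the whole answer.
with_token = truncate_at_stop(raw, ["<|eot_id|>"])

print(repr(with_word))
print(repr(with_token))
```

The first call loses most of the answer because the word appears in ordinary text, while the special token does not occur in normal prose.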

MaziyarPanahi

Owner Apr 22, 2024

Yes, please be careful adding tokens to the stop strings that are not meant to be used to stop the streaming.

Here you can see, using the latest Llama.cpp and any of the GGUF here you can have a stream that stops. https://colab.research.google.com/drive/1HD-_evvGo1l1B-imVfQP7BKfDe-7BbE-?usp=sharing

However, I can change the metadata for one or two models and see if that fixes it for those who couldn't find a way to fix it. Who can help me test this so I can safely do it for all?

MaziyarPanahi

Owner Apr 22, 2024

I went ahead and made the change and I am uploading them again. I realize that many users work around this at the application level, and most don't even notice it. But there are some who just cannot get around it. I am re-uploading it for them. Hope it helps.

this works now perfectly:

../apps/fine-tuning/quantize/gguf/llama.cpp/main -m Meta-Llama-3-8B-Instruct.Q2_K.gguf -p "<|start_header_id|>user<|end_header_id|>\n\nBuilding a website can be done in 10 simple steps:\nStep 1:<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -n 1024


<|start_header_id|>user<|end_header_id|>\n\nBuilding a website can be done in 10 simple steps:\nStep 1:<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYou're talking about building a website! That's a great topic. Here's the rest of the text:

Building a website can be done in 10 simple steps:

Step 1: Define Your Purpose

* Determine the main purpose of your website
* Identify your target audience
* Decide what you want to achieve with your website

Step 2: Choose a Domain Name

* Research domain name options
* Check if the domain name is available
* Register your domain name

Step 3: Design Your Website

* Choose a website builder or CMS
* Create a wireframe of your website's structure
* Design your website's layout

Step 4: Develop Your Content

* Create high-quality content for your website
* Plan out your content in advance
* Create content calendar

Step 5: Create a Logo

* Design a logo that represents your brand
* Make sure your logo is simple and recognizable

Step 6: Build Your Website

* Choose a website builder or CMS
* Develop your website using the chosen platform
* Make sure your website is mobile-friendly

Step 7: Add Features and Functionality

* Add features and functionality to your website
* Make sure your website is user-friendly

Step 8: Test and Refine

* Test your website's functionality and features
* Make sure your website is bug-free
* Refine your website's design and features

Step 9: Launch Your Website

* Launch your website to the public
* Promote your website to your target audience
* Monitor your website's traffic and analytics

Step 10: Maintain and Update Your Website

* Keep your website up to date
* Make sure your website is secure and updated with the latest features
* Keep your website relevant to your target audience

That's it! Building a website can be a simple and straightforward process if you follow these steps.<|eot_id|> [end of text]

tsalvoch

Apr 22, 2024

@MaziyarPanahi Thank you for your work :-).

Dampfinchen

Apr 22, 2024

edited Apr 22, 2024

With the latest llama.cpp build, the team added official support for Llama 3 in convert.py, so no hacks should be needed anymore.

Dampfinchen

Apr 22, 2024

> Yes, please be careful adding tokens to the stop strings that are not meant to be used to stop the streaming.
>
> Here you can see, using the latest Llama.cpp and any of the GGUF here you can have a stream that stops. https://colab.research.google.com/drive/1HD-_evvGo1l1B-imVfQP7BKfDe-7BbE-?usp=sharing
>
> However, I can change the metadata for one or two models and see if it fixes for those who couldn't find a way to fix it? Who can help me test this so I can safely do it for all?

The configuration outlined in the Google Colab is, sadly, not correct.

EOS and BOS should be

llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'

instead of

llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'

I think simply by converting with the most recent convert.py this should be fixed.

With that configuration there are no issues stopping.
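As a rough way to check which case a given file falls into, the `llm_load_print_meta` lines quoted above can be parsed from the load log. This is only a sketch: it assumes the log lines have the shape shown in this post, and `parse_special_tokens` and `eos_is_eot` are hypothetical helper names, not part of llama.cpp.

```python
import re

# Expected EOS id for Llama 3 Instruct, taken from the post above.
EXPECTED_EOS_ID = 128009  # '<|eot_id|>'

# Assumes lines shaped like the ones quoted in this thread;
# \s+ tolerates extra column padding.
_META_LINE = re.compile(
    r"llm_load_print_meta:\s+(BOS|EOS) token\s+=\s+(\d+)\s+'([^']*)'"
)


def parse_special_tokens(log_text: str) -> dict[str, tuple[int, str]]:
    """Return {'BOS': (id, text), 'EOS': (id, text)} for lines found in the log."""
    tokens: dict[str, tuple[int, str]] = {}
    for match in _META_LINE.finditer(log_text):
        kind, token_id, token_text = match.groups()
        tokens[kind] = (int(token_id), token_text)
    return tokens


def eos_is_eot(log_text: str) -> bool:
    """True if the log reports EOS as token 128009 ('<|eot_id|>')."""
    eos = parse_special_tokens(log_text).get("EOS")
    return eos is not None and eos[0] == EXPECTED_EOS_ID


good_log = """llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'"""

bad_log = """llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'"""

print(eos_is_eot(good_log))
print(eos_is_eot(bad_log))
```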

MaziyarPanahi

Owner Apr 23, 2024

@Dampfinchen Thanks. I have checked: all these new GGUFs are identical to what you get by converting with the latest Llama.cpp.

At this point, the issue some users experienced shouldn't occur when using them anywhere.

zerosystem

Apr 23, 2024

> @zerosystem Yes but now you can't have the word "assistant" in your messages. It's a temporary solution but it's not sustainable. The best thing is to be able to use special tokens.

Apologies! I am quite new to this stuff and I did not think it through!

zerosystem

Apr 23, 2024

I am still having problems with the Fp16 version not being able to stop in LM Studio. I am using the Llama 3 preset and I have not changed anything.

MaziyarPanahi

Owner Apr 23, 2024

Hi @zerosystem

If that's the only one that doesn't stop, I can fix it now and upload?

MaziyarPanahi

Owner Apr 23, 2024

> I am still having problems with the Fp16 version not being able to stop in LM Studio. I am using the Llama 3 preset and I have not changed anything.

I am uploading the new Fp16. It was missed in last night's upload :) In 20 minutes you should be able to re-download it and use it.

zerosystem

Apr 23, 2024

Hi @MaziyarPanahi yes please thank you! I have tried your q8_0 gguf and it works.

zerosystem

Apr 23, 2024

@MaziyarPanahi Thanks so much, that is good to hear!

zerosystem

Apr 23, 2024

@MaziyarPanahi It's working now :)

MaziyarPanahi

Owner Apr 23, 2024

Fantastic! many thanks for confirming :)

MaziyarPanahi changed discussion status to closed Apr 23, 2024

MLAbhishekTI

Apr 28, 2024

I have deployed my model to an AWS SageMaker endpoint and I am facing this issue (the model not stopping). Can you please help me out here?
Thanks

MaziyarPanahi

Owner Apr 28, 2024

Please make sure:

* everything you are using is up to date, especially Llama.cpp
* you follow the template exactly

Example:

<|start_header_id|>user<|end_header_id|>\n\nBuilding a website can be done in 10 simple steps:\nStep 1:<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
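
For anyone building the prompt in code, the template in the example above could be assembled roughly like this. It is a sketch under two assumptions that are not stated in this thread: the `\n` in the example stand for real newline characters, and the runtime prepends `<|begin_of_text|>` itself, so it is left out here. `format_llama3_prompt` is a hypothetical helper name.

```python
def format_llama3_prompt(messages: list[dict[str, str]]) -> str:
    """Build a Llama 3 Instruct prompt from a list of chat messages.

    Each message is {'role': ..., 'content': ...}. Every turn ends with
    <|eot_id|>, and the prompt ends with an open assistant header so the
    model writes the next turn.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    # Open assistant turn: generation should end when the model emits <|eot_id|>.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = format_llama3_prompt(
    [
        {
            "role": "user",
            "content": "Building a website can be done in 10 simple steps:\nStep 1:",
        }
    ]
)
print(prompt)
```

If the serving stack does not add the begin-of-text token on its own, it would need to be prepended to the returned string.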


this model can not stop