qwp4w3hyb/Meta-Llama-3-8B-Instruct-iMat-GGUF

qwp4w3hyb committed on Apr 19

Commit 8c22cb0 • 1 Parent(s): ac9074f

Update README.md

Files changed (1)

README.md +1 -1

@@ -19,7 +19,7 @@ license_link: LICENSE
 ## Note about eos token
 It seems llama 3 uses different eos tokens depending if it is in instruct mode.
-The initial upload has some issues with this as it uses the "default" eos token of 128001, when in instruct mode llama uses 128009 as eos token which causes it to ramble on and on without stopping.
+The initial upload has some issues with this as it uses the "default" eos token of 128001, but when in instruct mode llama only outputs 128009 as eos token which causes it to ramble on and on without stopping.
 I am currently uploading fixed quants with the eos token id manually set to 128009.
 This fixes the issue for me, but you have to make sure to use the correct chat template, I recommend using [this](https://github.com/ggerganov/llama.cpp/pull/6751) PR and then launching llama.cpp with `--chat-template llama3`.
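
For readers of this change, the stop-token mismatch the README describes can be illustrated with a small, self-contained sketch. It is not llama.cpp code: `generate` and `simulated_output` are hypothetical names, the token ids other than 128001 and 128009 are made up, and the loop only assumes that a runtime stops decoding when it sees its configured eos id.

```python
from typing import Iterable, List

EOS_DEFAULT = 128001   # "default" eos id named in the README
EOS_INSTRUCT = 128009  # eos id the instruct model emits, per the README


def generate(token_stream: Iterable[int], eos_token_id: int, max_tokens: int = 8) -> List[int]:
    """Collect tokens until eos_token_id is seen or max_tokens is reached."""
    out: List[int] = []
    for tok in token_stream:
        if tok == eos_token_id or len(out) >= max_tokens:
            break
        out.append(tok)
    return out


# Hypothetical token ids standing in for model output: a short reply,
# the instruct eos, then text the model would keep producing if not stopped.
simulated_output = [10, 11, 12, EOS_INSTRUCT, 20, 21, 22, EOS_INSTRUCT, 30, 31]

# Configured with 128001, the loop never sees its stop id and runs to the limit.
rambling = generate(simulated_output, EOS_DEFAULT)

# Configured with 128009, the loop stops right after the reply.
stopped = generate(simulated_output, EOS_INSTRUCT)

print(len(rambling), len(stopped))
```

With the eos id set to 128001 the sketch only stops at `max_tokens`, which mirrors the rambling behaviour described above; setting it to 128009 ends generation at the end of the reply.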