qwp4w3hyb committed
Commit ac9074f
1 parent: eb9f626

Update README.md

Files changed (1): README.md +13 -0
README.md CHANGED
@@ -17,8 +17,21 @@ license_name: llama3
  license_link: LICENSE
  ---
 
+ ## Note about eos token
+ It seems llama 3 uses different eos tokens depending on whether it is running in instruct mode.
+ The initial upload has an issue here: it uses the "default" eos token id 128001, but in instruct mode llama 3 uses 128009 as its eos token, which causes generation to ramble on and on without stopping.
+
+ I am currently uploading fixed quants with the eos token id manually set to 128009.
+ This fixes the issue for me, but you have to make sure to use the correct chat template; I recommend applying [this](https://github.com/ggerganov/llama.cpp/pull/6751) PR and then launching llama.cpp with `--chat-template llama3`.
+
+ If you do not want to redownload, you can fix your local gguf file with this command:
+ ```
+ python3 ../../tools/llama.cpp/gguf-py/scripts/gguf-set-metadata.py $file tokenizer.ggml.eos_token_id 128009 --force
+ ```
+
  # Quant Infos
 
+
  Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp) commit [0d56246f4b9764158525d894b96606f6163c53a8](https://github.com/ggerganov/llama.cpp/commit/0d56246f4b9764158525d894b96606f6163c53a8) (master from 2024-04-18)
  with tokenizer fixes from [this](https://github.com/ggerganov/llama.cpp/pull/6745) branch cherry-picked
  Imatrix dataset was used from [here](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
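
Editor's note on the eos fix in the diff above: the "rambles on without stopping" symptom happens because the stop check compares against the wrong token id, so it never fires. A minimal sketch of that behavior (the ids 128001/128009 are from the note; the token stream and `take_until_eos` loop are illustrative, not llama.cpp code):

```python
# Why a wrong eos id makes generation "ramble on": the stop check never fires.
# 128001 / 128009 are the real llama 3 ids; everything else is illustrative.
EOS_BASE = 128001      # "default" eos token id (base model)
EOS_INSTRUCT = 128009  # eos token id emitted in instruct mode

def take_until_eos(stream, eos_id):
    """Collect tokens until the given eos id is seen."""
    out = []
    for tok in stream:
        if tok == eos_id:
            break
        out.append(tok)
    return out

stream = [15339, 1917, EOS_INSTRUCT, 42, 7]
print(take_until_eos(stream, EOS_INSTRUCT))  # stops at the instruct eos: [15339, 1917]
print(take_until_eos(stream, EOS_BASE))      # never stops: consumes the whole stream
```

Setting `tokenizer.ggml.eos_token_id` to 128009 (as the command in the diff does) makes the stop check match the token the instruct model actually emits.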
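
For context on why 128009 is the right stop id: in the llama 3 instruct format, each turn ends with the `<|eot_id|>` special token (id 128009), while `<|end_of_text|>` (id 128001) is the base model's end-of-text token. A simplified sketch of the turn format (not the full chat template; see the PR linked in the note for the real one):

```python
# Simplified sketch of the llama 3 instruct turn format. Special-token names
# come from the llama 3 tokenizer; the helper itself is illustrative only.
def render_turn(role: str, content: str) -> str:
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

prompt = "<|begin_of_text|>" + render_turn("user", "Hello!")
# Each instruct turn terminates with <|eot_id|> (id 128009),
# not <|end_of_text|> (id 128001) -- hence the eos metadata fix.
print(prompt)
```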