Update README.md
README.md
CHANGED
@@ -53,6 +53,15 @@ datasets:
- Built with Meta Llama 3
- Quantized by [Astronomer](https://astronomer.io)

+# Important Note About Serving with vLLM & oobabooga/text-generation-webui
+- For loading this model onto vLLM, make sure all requests have `"stop_token_ids":[128001, 128009]` to temporarily address the non-stop generation issue (a request sketch follows the diff).
+  - vLLM does not yet respect `generation_config.json`.
+  - The vLLM team is working on a fix for this: https://github.com/vllm-project/vllm/issues/4180
+- For oobabooga/text-generation-webui:
+  - Load the model via AutoGPTQ with `no_inject_fused_attention` enabled. This works around a bug in the AutoGPTQ library.
+  - Under `Parameters` -> `Generation` -> `Skip special tokens`: turn this off (deselect).
+  - Under `Parameters` -> `Generation` -> `Custom stopping strings`: add `"<|end_of_text|>","<|eot_id|>"` to the field.
+
<!-- description start -->
## Description
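
For reference, a minimal sketch of the vLLM workaround described in the diff above, sent to vLLM's OpenAI-compatible completions endpoint. The server URL, model path, and prompt are placeholders, not part of the original commit:

```python
import requests

# Placeholder: adjust the URL and model path for your deployment.
VLLM_URL = "http://localhost:8000/v1/completions"

response = requests.post(
    VLLM_URL,
    json={
        "model": "<path-to-this-quantized-model>",
        "prompt": "What is a black hole?",
        "max_tokens": 256,
        # Workaround from the README: stop on Llama 3's end-of-text
        # (128001) and end-of-turn (128009) token ids, since vLLM does
        # not yet read generation_config.json.
        "stop_token_ids": [128001, 128009],
    },
)
print(response.json()["choices"][0]["text"])
```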
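
Similarly, a hedged sketch of what the AutoGPTQ loading step from the oobabooga instructions looks like when calling the library directly: the webui's `no_inject_fused_attention` checkbox corresponds to passing `inject_fused_attention=False`. The model path is a placeholder, and `from_quantized` keyword arguments vary somewhat across auto-gptq versions:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_path = "<path-to-this-quantized-model>"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_path)
# inject_fused_attention=False mirrors the webui's
# `no_inject_fused_attention` option and sidesteps the AutoGPTQ bug
# mentioned in the diff above.
model = AutoGPTQForCausalLM.from_quantized(
    model_path,
    device="cuda:0",
    use_safetensors=True,
    inject_fused_attention=False,
)
```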