hzhwcmhf committed on
Commit
7b5d46f
1 Parent(s): 54e5483

Update README.md

Files changed (1): README.md +1 -1
README.md CHANGED
@@ -74,7 +74,7 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 
 To handle extensive inputs exceeding 32,768 tokens, we utilize [YARN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.
 
- For deployment, we recommend using vLLM. You can enable long-context capabilities, follow these steps:
+ For deployment, we recommend using vLLM. You can enable the long-context capabilities by following these steps:
 
 1. **Install vLLM**: Ensure you have the latest version from the main branch of [vLLM](https://github.com/vllm-project/vllm).