renillhuang committed
Commit bf7de3c • 1 Parent(s): 4ccf566

Update README.md

Files changed (1)
  1. README.md +31 -4
README.md CHANGED
@@ -46,7 +46,7 @@ tags:
46
  - [📖 Model Introduction](#model-introduction)
47
  - [🔗 Model Download](#model-download)
48
  - [🔖 Model Benchmark](#model-benchmark)
49
- - [📊 Model Inference](#model-inference)
50
  - [📜 Declarations & License](#declarations-license)
51
  - [🥇 Company Introduction](#company-introduction)
52
 
@@ -278,10 +278,37 @@ CUDA_VISIBLE_DEVICES=0 python demo/text_generation_base.py --model OrionStarAI/O
278
  CUDA_VISIBLE_DEVICES=0 python demo/text_generation.py --model OrionStarAI/Orion-14B-Chat --tokenizer OrionStarAI/Orion-14B-Chat --prompt hi
279
 
280
  ```
 
281
 
282
- ## 4.4 Example Output
 
283
 
284
- ### 4.4.1. Casual Chat
285
 
286
  `````
287
  User: Hello
@@ -303,7 +330,7 @@ User: Tell me a joke.
303
  Orion-14B: Sure, here's a classic one-liner: Why don't scientists trust atoms? Because they make up everything.
304
  `````
305
 
306
- ### 4.4.2. Japanese & Korean Chat
307
 
308
  `````
309
  User:自己を紹介してください
 
46
  - [📖 Model Introduction](#model-introduction)
47
  - [🔗 Model Download](#model-download)
48
  - [🔖 Model Benchmark](#model-benchmark)
49
+ - [📊 Model Inference](#model-inference)[<img src="./assets/imgs/vllm.png" alt="vllm" height="20"/>](#vllm) [<img src="./assets/imgs/llama_cpp.png" alt="llamacpp" height="20"/>](#llama-cpp)
50
  - [📜 Declarations & License](#declarations-license)
51
  - [🥇 Company Introduction](#company-introduction)
52
 
 
278
  CUDA_VISIBLE_DEVICES=0 python demo/text_generation.py --model OrionStarAI/Orion-14B-Chat --tokenizer OrionStarAI/Orion-14B-Chat --prompt hi
279
 
280
  ```
281
+ ## 4.4. Inference by vllm
282
 
283
+ - Project URL<br>
284
+ https://github.com/vllm-project/vllm
285
 
286
+ - Pull Request<br>
287
+ https://github.com/vllm-project/vllm/pull/2539
288
+
289
+ <a name="llama-cpp"></a><br>
290
+ ## 4.5. Inference by llama.cpp
291
+
292
+ - Project URL<br>
293
+ https://github.com/ggerganov/llama.cpp
294
+
295
+ - Pull Request<br>
296
+ https://github.com/ggerganov/llama.cpp/pull/5118
297
+
298
+ - How to convert to GGUF model
299
+
300
+ ```shell
301
+ python convert-hf-to-gguf.py path/to/Orion-14B-Chat --outfile chat.gguf
302
+ ```
303
+
304
+ - How to run generation
305
+
306
+ ```shell
307
+ ./main --frequency-penalty 0.5 --top-k 5 --top-p 0.9 -m chat.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e
308
+ ```
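
The conversion and generation steps above can be chained in one script. The following is a minimal sketch, not taken from the README: it assumes it is run from a llama.cpp checkout that contains `convert-hf-to-gguf.py` and a built `./main` binary. The names `MODEL_DIR`, `GGUF_FILE`, `DRY_RUN`, and the `run` helper are illustrative and are not llama.cpp options.

```shell
#!/bin/sh
# Sketch only: chains the two llama.cpp steps shown above.
# Assumption: executed from a llama.cpp checkout with convert-hf-to-gguf.py
# and a built ./main binary. MODEL_DIR, GGUF_FILE and DRY_RUN are
# illustrative variable names, not llama.cpp options.
set -eu

MODEL_DIR="${MODEL_DIR:-path/to/Orion-14B-Chat}"  # placeholder path, as in the README
GGUF_FILE="${GGUF_FILE:-chat.gguf}"
DRY_RUN="${DRY_RUN:-1}"                           # 1 = print commands only, 0 = execute them

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "+ $*"
  else
    "$@"
  fi
}

# Step 1: convert the Hugging Face checkpoint to a GGUF file.
run python convert-hf-to-gguf.py "$MODEL_DIR" --outfile "$GGUF_FILE"

# Step 2: generate from the converted model with the sampling flags used above.
run ./main --frequency-penalty 0.5 --top-k 5 --top-p 0.9 \
  -m "$GGUF_FILE" \
  -p 'Building a website can be done in 10 simple steps:\nStep 1:' \
  -n 400 -e
```

By default the script only prints the two commands. Setting `DRY_RUN=0` executes them, which requires the model files and the llama.cpp binaries to be present.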
309
+ ## 4.6. Example Output
310
+
311
+ ### 4.6.1. Casual Chat
312
 
313
  `````
314
  User: Hello
 
330
  Orion-14B: Sure, here's a classic one-liner: Why don't scientists trust atoms? Because they make up everything.
331
  `````
332
 
333
+ ### 4.6.2. Japanese & Korean Chat
334
 
335
  `````
336
  User:自己を紹介してください