yejingfu committed on
Commit
99f57f0
1 Parent(s): 15a81b6

Update README.md

Files changed (1): README.md (+49 −49)
README.md CHANGED
@@ -1,49 +1,49 @@
- ---
- language:
- - en
- - de
- - fr
- - it
- - pt
- - hi
- - es
- - th
- license: llama3.1
- pipeline_tag: text-generation
- tags:
- - facebook
- - meta
- - pytorch
- - llama
- - llama-3
- ---
-
- # Meta-Llama3.1-8B-FP8-128K
-
- ## Model Overview
- - Model Architecture: Meta-Llama-3.1
- - Input: Text
- - Output: Text
- - Model Optimizations:
- - Weight quantization: FP8
- - Activation quantization: FP8
- - KV Cache quantization: FP8
- - Intended Use Cases: Commercial and research use in multiple languages. Like Meta-Llama-3.1-8B-Instruct, this model is intended for assistant-like chat.
- - Out-of-scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
- - Release Date: 8/27/2024
- - Version: 1.0
- - License(s): llama3.1
- - Quantized version of Meta-Llama-3.1-8B-Instruct.
-
-
- ## Serve with vLLM engine
- ```bash
- python3 -m vllm.entrypoints.openai.api_server \
- --port <port> --model yejingfu/Meta-Llama-3.1-8B-FP8-128K \
- --tensor-parallel-size 1 --swap-space 16 --gpu-memory-utilization 0.96 --dtype auto \
- --max-num-seqs 32 --max-model-len 131072 --kv-cache-dtype fp8 --enable-chunked-prefill
- ```
-
- ---
- license: llama3.1
- ---
 
+ ---
+ language:
+ - en
+ - de
+ - fr
+ - it
+ - pt
+ - hi
+ - es
+ - th
+ license: llama3.1
+ pipeline_tag: text-generation
+ tags:
+ - facebook
+ - meta
+ - pytorch
+ - llama
+ - llama-3
+ ---
+
+ # Meta-Llama3.1-8B-FP8-128K
+
+ ## Model Overview
+ - Model Architecture: Meta-Llama-3.1
+ - Input: Text
+ - Output: Text
+ - Model Optimizations:
+ - Weight quantization: FP8
+ - Activation quantization: FP8
+ - KV Cache quantization: FP8
+ - Intended Use Cases: Commercial and research use in multiple languages. Like Meta-Llama-3.1-8B-Instruct, this model is intended for assistant-like chat.
+ - Out-of-scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
+ - Release Date: 8/27/2024
+ - Version: 1.0
+ - License(s): llama3.1
+ - Quantized version of Meta-Llama-3.1-8B-Instruct.
+
+
+ ## Serve with vLLM engine
+ ```bash
+ python3 -m vllm.entrypoints.openai.api_server \
+ --port <port> --model yejingfu/Meta-Llama-3.1-8B-Instruct-FP8-128K \
+ --tensor-parallel-size 1 --swap-space 16 --gpu-memory-utilization 0.96 \
+ --max-num-seqs 32 --max-model-len 131072 --kv-cache-dtype fp8 --enable-chunked-prefill
+ ```
+
+
+ ## license: llama3.1
+
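The serve command in the updated README starts vLLM's OpenAI-compatible API server. A minimal sketch of querying it, assuming the server is reachable on `localhost:8000` (the port and prompt here are illustrative, not from the commit); the payload follows the standard `/v1/chat/completions` schema:

```python
import json
import urllib.request

# Model name as passed to --model in the serve command above.
MODEL = "yejingfu/Meta-Llama-3.1-8B-Instruct-FP8-128K"


def build_chat_request(prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send_chat_request(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to the vLLM server (requires a running server)."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


payload = build_chat_request("What is FP8 quantization?")
print(json.dumps(payload, indent=2))
# send_chat_request(payload)  # uncomment once the server is running
```

The same request can equally be sent with `curl` or the `openai` Python client, since vLLM exposes the standard OpenAI REST interface.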