Update README.md
README.md
---
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
license: llama3.1
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
---

# Meta-Llama3.1-8B-FP8-128K

## Model Overview
- Model Architecture: Meta-Llama-3.1
  - Input: Text
  - Output: Text
- Model Optimizations:
  - Weight quantization: FP8
  - Activation quantization: FP8
  - KV Cache quantization: FP8
- Intended Use Cases: Intended for commercial and research use in multiple languages. Like Meta-Llama-3.1-8B-Instruct, this model is intended for assistant-like chat.
- Out-of-scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
- Release Date: 8/27/2024
- Version: 1.0
- License(s): llama3.1
- Quantized version of Meta-Llama-3.1-8B-Instruct (see the download sketch below).
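
The quantized checkpoint appears to be hosted on the Hugging Face Hub under the same repo id used in the serve command below; assuming that repo id is correct, here is a minimal sketch for pre-fetching the weights with the Hugging Face CLI (the local directory name is arbitrary):

```bash
# Download the quantized checkpoint ahead of time (optional; vLLM can also pull it on first launch)
huggingface-cli download yejingfu/Meta-Llama-3.1-8B-Instruct-FP8-128K \
  --local-dir Meta-Llama-3.1-8B-Instruct-FP8-128K
```
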
## Serve with vLLM engine

```bash
python3 -m vllm.entrypoints.openai.api_server \
  --port <port> --model yejingfu/Meta-Llama-3.1-8B-Instruct-FP8-128K \
  --tensor-parallel-size 1 --swap-space 16 --gpu-memory-utilization 0.96 \
  --max-num-seqs 32 --max-model-len 131072 --kv-cache-dtype fp8 --enable-chunked-prefill
```
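
Once the server is up, it exposes vLLM's OpenAI-compatible REST API. Below is a minimal sketch of a chat request with curl, assuming `<port>` is the same value passed to `--port` above and that the default served model name (the `--model` path) is used; the prompt text is only an illustration:

```bash
# Send a single chat-completion request to the OpenAI-compatible endpoint
curl http://localhost:<port>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "yejingfu/Meta-Llama-3.1-8B-Instruct-FP8-128K",
        "messages": [
          {"role": "user", "content": "Give a one-sentence summary of FP8 KV-cache quantization."}
        ],
        "max_tokens": 128
      }'
```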

## license: llama3.1