FriendliAI/Llama-2-70b-chat-hf-fp8

@@ -49,7 +49,6 @@ This model is compatible with **[Friendli Container](https://friendli.ai/product
 - Before you begin, make sure you have signed up for [Friendli Suite](https://suite.friendli.ai/). **You can use Friendli Containers free of charge for four weeks.**
 - Prepare a Personal Access Token following [this guide](#preparing-personal-access-token).
 - Prepare a Friendli Container Secret following [this guide](#preparing-container-secret).
-- Install Hugging Face CLI with `pip install -U "huggingface_hub[cli]"`
 ### Preparing Personal Access Token
@@ -88,25 +87,16 @@ You should pass the container secret as an environment variable to run the conta
 Once you've prepared the image of Friendli Container, you can launch it to create a serving endpoint.
 ```sh
-export MODEL_DIR=$PWD/FriendliAI--Llama-2-70b-chat-hf-fp8
-export FRIENDLI_CONTAINER_SECRET="YOUR CONTAINER SECRET"
-export FRIENDLI_CONTAINER_IMAGE="registry.friendli.ai/trial"
-export GPU_ENUMERATION='"device=0,1"'
-huggingface-cli download FriendliAI/Llama-2-70b-chat-hf-fp8 \
-  --local-dir $MODEL_DIR \
-  --local-dir-use-symlinks False
 docker run \
-  --gpus $GPU_ENUMERATION --network=host --ipc=host \
-  -v $MODEL_DIR:/model \
-  -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \
-  $FRIENDLI_CONTAINER_IMAGE /bin/bash -c \
-  "/root/launcher \
-    --web-server-port 6000 \
-    --num-devices 2 \
-    --ckpt-path /model \
-    --ckpt-type hf_safetensors"
+  --gpus '"device=0,1"' \
+  -p 8000:8000 \
+  -v ~/.cache/huggingface:/root/.cache/huggingface \
+  -e FRIENDLI_CONTAINER_SECRET="YOUR CONTAINER SECRET" \
+  -e HF_TOKEN="YOUR HUGGING FACE TOKEN" \
+  registry.friendli.ai/trial \
+    --web-server-port 8000 \
+    --hf-model-name FriendliAI/Llama-2-70b-chat-hf-fp8 \
+    --num-devices 2  # Use tensor parallelism degree 2
 ```
 ---
@@ -146,7 +136,7 @@ Meta developed and publicly released the Llama 2 family of large language models
 **License** A custom commercial license is available at: [https://ai.meta.com/resources/models-and-libraries/llama-downloads/](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)
-**Research Paper** ["Llama-2: Open Foundation and Fine-tuned Chat Models"](arxiv.org/abs/2307.09288)
+**Research Paper** ["Llama-2: Open Foundation and Fine-tuned Chat Models"](https://arxiv.org/abs/2307.09288)
 ## Intended Use
 **Intended Use Cases** Llama 2 is intended for commercial and research use in English. Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks.
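
Once the container from the updated `docker run` command is up, the endpoint published on port 8000 can be smoke-tested from the host. The sketch below is a minimal example under stated assumptions, not something taken from this change: it assumes the container serves an OpenAI-style `/v1/completions` route accepting `prompt` and `max_tokens` fields, and the names `ENDPOINT`, `build_payload`, and `send_completion` are hypothetical helpers introduced only for illustration.

```sh
#!/bin/sh
# Assumption: the container listens on the port mapped with `-p 8000:8000`.
ENDPOINT="${ENDPOINT:-http://localhost:8000}"

# Build a JSON request body.
# $1 = prompt (this simple sketch does not escape quotes or backslashes)
# $2 = maximum number of tokens to generate
build_payload() {
  printf '{"prompt": "%s", "max_tokens": %d}' "$1" "$2"
}

# Send the request with curl.
# Assumption: /v1/completions is the route exposed by the container.
send_completion() {
  curl -sS -X POST "$ENDPOINT/v1/completions" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1" "$2")"
}

# Print the request body without contacting the server.
build_payload "Hello, Llama" 32
```

To issue a real request once the model has finished loading, call `send_completion "Hello, Llama" 32`. If the route or field names differ in your Friendli Container version, adjust them to match its documentation.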