JRosenkranz committed on
Commit
924f16b
1 Parent(s): 0699e13

updated readme with samples

  1. README.md +70 -1
README.md CHANGED
@@ -1,3 +1,72 @@
  ---
  license: llama2
- ---
+ ---
+
+ To try this out in a production-like environment, use the pre-built Docker image:
+
+ ```bash
+ docker pull docker-eu-public.artifactory.swg-devops.com/res-zrl-snap-docker-local/tgis-os:spec.7
+ docker run -d --rm --gpus all \
+ --name my-tgis-server \
+ -v /path/to/all/models:/models \
+ -e MODEL_NAME=/models/model_weights/llama/13B-F \
+ -e SPECULATOR_PATH=/models/speculator_weights/llama/13B-F \
+ -e FLASH_ATTENTION=true \
+ -e PAGED_ATTENTION=true \
+ -e DTYPE_STR=float16 \
+ docker-eu-public.artifactory.swg-devops.com/res-zrl-snap-docker-local/tgis-os:spec.7
+
+ docker logs my-tgis-server -f
+ docker exec -it my-tgis-server python /path-to-example-code/sample_client.py
+ ```
+
+ To try this out with the fms-native compiled model, run one of the following:
+
+ #### batch_size=1 (compile + cudagraphs)
+
+ ```bash
+ git clone https://github.com/foundation-model-stack/fms-extras
+ (cd fms-extras && pip install -e .)
+ pip install transformers==4.35.0 sentencepiece numpy
+ python fms-extras/scripts/paged_speculative_inference.py \
+ --variant=13b \
+ --model_path=/path/to/model_weights/llama/13B-F \
+ --model_source=hf \
+ --tokenizer=/path/to/llama/13B-F \
+ --speculator_path=/path/to/speculator_weights/llama/13B-F \
+ --speculator_source=hf \
+ --compile \
+ --compile_mode=reduce-overhead
+ ```
+
+ #### batch_size=1 (compile)
+
+ ```bash
+ git clone https://github.com/foundation-model-stack/fms-extras
+ (cd fms-extras && pip install -e .)
+ pip install transformers==4.35.0 sentencepiece numpy
+ python fms-extras/scripts/paged_speculative_inference.py \
+ --variant=13b \
+ --model_path=/path/to/model_weights/llama/13B-F \
+ --model_source=hf \
+ --tokenizer=/path/to/llama/13B-F \
+ --speculator_path=/path/to/speculator_weights/llama/13B-F \
+ --speculator_source=hf \
+ --compile
+ ```
+
+ #### batch_size=4 (compile)
+
+ ```bash
+ git clone https://github.com/foundation-model-stack/fms-extras
+ (cd fms-extras && pip install -e .)
+ pip install transformers==4.35.0 sentencepiece numpy
+ python fms-extras/scripts/paged_speculative_inference.py \
+ --variant=13b \
+ --model_path=/path/to/model_weights/llama/13B-F \
+ --model_source=hf \
+ --tokenizer=/path/to/llama/13B-F \
+ --speculator_path=/path/to/speculator_weights/llama/13B-F \
+ --speculator_source=hf \
+ --batch_input \
+ --compile
+ ```
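
The three fms-native invocations above share every argument except the trailing compile and batching flags. A minimal sketch of factoring that out follows. It is not part of this commit: the `build_cmd` helper and the mode names `cudagraphs`, `compile`, and `batch4` are hypothetical, as are the `MODEL_PATH`, `TOKENIZER_PATH`, and `SPEC_PATH` variables. Only flags that appear in the README samples are used, and the defaults are the README's placeholder paths.

```bash
#!/bin/sh
# Placeholder paths copied from the README samples; override via the environment.
# (Variable names are assumptions made for this sketch.)
MODEL_PATH="${MODEL_PATH:-/path/to/model_weights/llama/13B-F}"
TOKENIZER_PATH="${TOKENIZER_PATH:-/path/to/llama/13B-F}"
SPEC_PATH="${SPEC_PATH:-/path/to/speculator_weights/llama/13B-F}"

# build_cmd MODE prints the full command line for one README sample.
# MODE is one of: cudagraphs, compile, batch4 (hypothetical names).
build_cmd() {
  mode="$1"
  # Arguments common to all three samples.
  base="python fms-extras/scripts/paged_speculative_inference.py"
  base="$base --variant=13b"
  base="$base --model_path=${MODEL_PATH} --model_source=hf"
  base="$base --tokenizer=${TOKENIZER_PATH}"
  base="$base --speculator_path=${SPEC_PATH} --speculator_source=hf"

  # Flags that differ between the samples.
  case "$mode" in
    cudagraphs) extra="--compile --compile_mode=reduce-overhead" ;;
    compile)    extra="--compile" ;;
    batch4)     extra="--batch_input --compile" ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac

  printf '%s %s\n' "$base" "$extra"
}

# Dry run: print the command for the plain compile sample.
build_cmd compile
```

The helper only prints the command, so it can be inspected before anything is launched. After the clone and install steps from the samples, something like `eval "$(build_cmd batch4)"` would run it, provided the paths contain no spaces or shell metacharacters.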