jayr014 committed on
Commit
2076db5
1 Parent(s): befc61c

updating readme

Files changed (1): README.md (+14 -0)
README.md CHANGED
@@ -96,6 +96,8 @@ NOTE: Things that we had to modify in order for BLOOMChat to work:
 - Change the model name from `bigscience/bloom` to `sambanovasystems/BLOOMChat-176B-v1`
 - Modifying `inference_server/models/hf_accelerate.py`
   - This is because for our testing of this repo we used 4 80GB A100 GPUs and would run into memory issues
+- Modifying `inference_server/cli.py`
+  - This is because the model was trained using specific `<human>` and `<bot>` tags
 
 Modifications for `inference_server/models/hf_accelerate.py`:
 
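For context on the memory issue mentioned above: the usual fix is to cap how much of each GPU `accelerate` may fill when sharding the checkpoint. The sketch below is illustrative only; `build_reduced_max_memory` and the headroom figure are our assumptions, not the repo's actual `reduce_max_memory_dict`:

```python
# Illustrative sketch only; the repo's actual reduce_max_memory_dict may differ.
# accelerate accepts a max_memory dict mapping device id -> budget, e.g. {0: "65GiB"}.
import torch

def build_reduced_max_memory(headroom_gib: int = 15) -> dict:
    """Leave headroom on each 80GB A100 so activations and CUDA overhead
    still fit after the sharded weights are loaded."""
    budget = f"{80 - headroom_gib}GiB"  # hypothetical headroom value
    return {gpu: budget for gpu in range(torch.cuda.device_count())}

# Passed through to accelerate, as in the diff's hf_accelerate.py:
# kwargs["max_memory"] = build_reduced_max_memory()
```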
@@ -112,6 +114,18 @@ class HFAccelerateModel(Model):
         kwargs["max_memory"] = reduce_max_memory_dict
 ```
 
+Modifications for `inference_server/cli.py`:
+
+```python
+def main() -> None:
+    ...
+    while True:
+        input_text = input("Input text: ")
+
+        input_text = input_text.strip()
+        modified_input_text = f"<human>: {input_text}\n<bot>:"
+```
+
 Running command for int8 (suboptimal performance, but fast inference time):
 ```
 python -m inference_server.cli --model_name sambanovasystems/BLOOMChat-176B-v1 --model_class AutoModelForCausalLM --dtype int8 --deployment_framework hf_accelerate --generate_kwargs '{"do_sample": false, "temperature": 0.8, "repetition_penalty": 1.2, "top_p": 0.9, "max_new_tokens": 512}'
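As a quick illustration of the prompt format that the modified `cli.py` produces (the helper name and sample input below are ours; the tag strings are taken verbatim from the diff):

```python
def format_prompt(user_input: str) -> str:
    """Wrap a raw user message in the human/bot tags BLOOMChat was trained on."""
    return f"<human>: {user_input.strip()}\n<bot>:"

print(format_prompt("What is the capital of France?"))
# Output:
# <human>: What is the capital of France?
# <bot>:
```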
 
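If you would rather call the model directly instead of going through `inference_server`, a rough plain-`transformers` equivalent of the command's `--generate_kwargs` might look like the sketch below. The loading code is an assumption on our part, and note that with `"do_sample": false` Hugging Face generation is greedy, so the `temperature` and `top_p` values have no effect:

```python
# Rough sketch, not the inference_server code path. Loading a 176B model this
# way still needs the multi-GPU setup (and memory cap) described above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sambanovasystems/BLOOMChat-176B-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("<human>: Hello!\n<bot>:", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=False,        # greedy decoding; temperature/top_p below are ignored
    temperature=0.8,
    top_p=0.9,
    repetition_penalty=1.2,
    max_new_tokens=512,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```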