Text Generation
Transformers
Safetensors
mixtral
conversational
Inference Endpoints
text-generation-inference
nic-mc committed on
Commit 5febd91
1 Parent(s): 614649c

Include Massed Compute VM with Steps


* Include link to the correct VM (4xA6000) for this model card
* Repeat instructions that are found within the VM
* Provide exact docker commands for this model card
* Example curl requests for use inside and outside the VM
* Link to Massed Compute Discord if people have issues.

Files changed (1)
  1. README.md +31 -0
README.md CHANGED
@@ -101,6 +101,37 @@ Hardware kindly provided by [Massed Compute](https://massedcompute.com/?utm_sour
 
 Only the train splits were used (if a split was provided), and an additional pass of decontamination is performed using approximate nearest neighbor search (via faiss).
 
+ ## How to easily download and use this model
+ [Massed Compute](https://massedcompute.com/?utm_source=huggingface&utm_creative_format=model_card&utm_content=creator_jon) has created a Virtual Machine (VM) pre-loaded with TGI and Text Generation WebUI.
+
+ 1) For this model, rent the [Jon Durbin 4xA6000](https://shop.massedcompute.com/products/jon-durbin-4x-a6000?utm_source=huggingface&utm_creative_format=model_card&utm_content=creator_jon) Virtual Machine
+ 2) After you start your rental, you will receive an email with instructions on how to log in to the VM
+ 3) Once inside the VM, open the terminal and run `conda activate text-generation-inference`
+ 4) Then `cd Desktop/text-generation-inference/`
+ 5) Run `volume=$PWD/data`
+ 6) Run `model=jondurbin/bagel-8x7b-v0.2`
+ 7) Run `sudo docker run --gpus '"device=0,1,2,3"' --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.3 --model-id $model`
+ 8) The model will take some time to load...
+ 9) Once loaded, the model will be available on port 8080 (see the health check sketched below)
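+
+ If you want to confirm loading has finished before sending requests, you can poll the server from another terminal. A minimal sketch, assuming the TGI container above exposes its standard `/health` route on the mapped port:
+ ```
+ # Poll TGI until it responds; /health returns 200 once the weights are loaded
+ # (assumes the port mapping from step 7)
+ until curl -sf 0.0.0.0:8080/health > /dev/null; do
+     echo "model still loading..."
+     sleep 10
+ done
+ echo "TGI is ready"
+ ```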
+
+ Sample command within the VM:
+ ```
+ curl 0.0.0.0:8080/generate \
+     -X POST \
+     -d '{"inputs":"<|system|>You are a friendly chatbot.\n<|user|>What type of model are you?\n<|assistant|>","parameters":{"do_sample": true, "max_new_tokens": 100, "repetition_penalty": 1.15, "temperature": 0.7, "top_k": 20, "top_p": 0.9, "best_of": 1}}' \
+     -H 'Content-Type: application/json'
+ ```
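+
+ TGI can also stream tokens back as they are generated. A sketch of the same request against the streaming route, assuming `/generate_stream` is available in the TGI version shipped with the VM:
+ ```
+ # Tokens arrive as server-sent events instead of a single JSON response
+ curl 0.0.0.0:8080/generate_stream \
+     -X POST \
+     -d '{"inputs":"<|system|>You are a friendly chatbot.\n<|user|>What type of model are you?\n<|assistant|>","parameters":{"max_new_tokens": 100}}' \
+     -H 'Content-Type: application/json'
+ ```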
+
+ You can also access the model from outside the VM:
+ ```
+ curl IP_ADDRESS_PROVIDED_BY_MASSED_COMPUTE_VM:8080/generate \
+     -X POST \
+     -d '{"inputs":"<|system|>You are a friendly chatbot.\n<|user|>What type of model are you?\n<|assistant|>","parameters":{"do_sample": true, "max_new_tokens": 100, "repetition_penalty": 1.15, "temperature": 0.7, "top_k": 20, "top_p": 0.9, "best_of": 1}}' \
+     -H 'Content-Type: application/json'
+ ```
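+
+ The response comes back as a single JSON object. If you only want the completion text, here is a sketch that pipes it through `jq`, assuming `jq` is installed on your machine and that the field is TGI's `generated_text`:
+ ```
+ # Print only the generated text from the JSON response
+ curl -s IP_ADDRESS_PROVIDED_BY_MASSED_COMPUTE_VM:8080/generate \
+     -X POST \
+     -d '{"inputs":"<|system|>You are a friendly chatbot.\n<|user|>What type of model are you?\n<|assistant|>","parameters":{"max_new_tokens": 100}}' \
+     -H 'Content-Type: application/json' | jq -r '.generated_text'
+ ```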
+
+ For assistance with the VM, join the [Massed Compute Discord Server](https://discord.gg/Mj4YMQY3DA)
+
 ## Prompt formatting
 
 In sticking with the theme of the bagel, I didn't want to use a single prompt format, so I used 4 - vicuna, llama-2, alpaca, and chat-ml (sorta).