jondurbin committed
Commit f30cd62
1 Parent(s): 49f8a70

Update README.md

Files changed (1):
1. README.md +32 -0

README.md CHANGED
@@ -44,6 +44,38 @@ An experimental fine-tune of yi-34b-200k using [bagel](https://github.com/jondur

This is the model after the SFT phase, before DPO has been applied. DPO performs better on benchmarks, but this version is likely better for creative writing, roleplay, etc.

## How to easily download and use this model

[Massed Compute](https://massedcompute.com/?utm_source=huggingface&utm_creative_format=model_card&utm_content=creator_jon) has created a Virtual Machine (VM) pre-loaded with TGI (Text Generation Inference) and Text Generation WebUI.

1) For this model, rent the [Jon Durbin 2xA6000](https://shop.massedcompute.com/products/jon-durbin-2x-a6000?utm_source=huggingface&utm_creative_format=model_card&utm_content=creator_jon) Virtual Machine
2) After you start your rental, you will receive an email with instructions on how to log in to the VM
3) Once inside the VM, open the terminal and run `conda activate text-generation-inference`
4) Then run `cd Desktop/text-generation-inference/`
5) Run `volume=$PWD/data`
6) Run `model=jondurbin/bagel-34b-v0.2`
7) Run `sudo docker run --gpus '"device=0,1"' --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.3 --model-id $model`
8) The model will take some time to load...
9) Once loaded, the model will be available on port 8080
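Steps 5–7 can be sketched as one copy-pasteable session. This is illustrative only: the image tag and paths come from the steps above, and `cmd` is a hypothetical variable used here just to display the assembled launch command.

```shell
# Sketch of steps 5-7 as a single session (run on the Massed Compute VM).
volume=$PWD/data
model=jondurbin/bagel-34b-v0.2
# "cmd" is only for display; on the VM, run the printed command directly.
cmd="sudo docker run --gpus '\"device=0,1\"' --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.3 --model-id $model"
echo "$cmd"
```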

Sample command within the VM:
```
curl 0.0.0.0:8080/generate \
    -X POST \
    -d '{"inputs":"[INST] <<SYS>>\nYou are a friendly chatbot.\n<</SYS>>\n\nWhat type of model are you? [/INST]","parameters":{"do_sample": true, "max_new_tokens": 100, "repetition_penalty": 1.15, "temperature": 0.7, "top_k": 20, "top_p": 0.9, "best_of": 1}}' \
    -H 'Content-Type: application/json'
```
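The same request can be made from Python. The sketch below uses only the standard library; the `build_payload` and `generate` helpers are illustrative (not part of the VM image), and the prompt uses the standard llama-2 chat layout, with `<<SYS>>` opening the system block.

```python
import json
from urllib.request import Request, urlopen

def build_payload(system_prompt, user_message):
    # Llama-2 style chat layout, matching the curl example's "inputs" field.
    inputs = f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"
    return {
        "inputs": inputs,
        "parameters": {
            "do_sample": True,
            "max_new_tokens": 100,
            "repetition_penalty": 1.15,
            "temperature": 0.7,
            "top_k": 20,
            "top_p": 0.9,
            "best_of": 1,
        },
    }

def generate(host, system_prompt, user_message):
    # POST to TGI's /generate endpoint and return the generated text.
    req = Request(
        f"http://{host}/generate",
        data=json.dumps(build_payload(system_prompt, user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]

# Usage (against a running server):
#   print(generate("0.0.0.0:8080", "You are a friendly chatbot.",
#                  "What type of model are you?"))
```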

You can also access the model from outside the VM:
```
curl IP_ADDRESS_PROVIDED_BY_MASSED_COMPUTE_VM:8080/generate \
    -X POST \
    -d '{"inputs":"[INST] <<SYS>>\nYou are a friendly chatbot.\n<</SYS>>\n\nWhat type of model are you? [/INST]","parameters":{"do_sample": true, "max_new_tokens": 100, "repetition_penalty": 1.15, "temperature": 0.7, "top_k": 20, "top_p": 0.9, "best_of": 1}}' \
    -H 'Content-Type: application/json'
```

For assistance with the VM, join the [Massed Compute Discord Server](https://discord.gg/Mj4YMQY3DA)

### Data sources

*Yes, you will see benchmark names in the list, but this only uses the train splits, and a decontamination by cosine similarity is performed at the end as a sanity check*