jartine committed
Commit 77852d6
1 Parent(s): 25216ea

Update README.md

Files changed (1)
  1. README.md +22 -16
README.md CHANGED
@@ -37,9 +37,9 @@ software. Both of them are included in a single file, which can be
 downloaded and run as follows:
 
 ```
-wget https://huggingface.co/Mozilla/gemma-2-9b-it-llamafile/resolve/main/gemma-2-9b-it.Q6_K.llamafile
-chmod +x gemma-2-9b-it.Q6_K.llamafile
-./gemma-2-9b-it.Q6_K.llamafile
+wget https://huggingface.co/Mozilla/gemma-2-27b-it-llamafile/resolve/main/gemma-2-27b-it.Q6_K.llamafile
+chmod +x gemma-2-27b-it.Q6_K.llamafile
+./gemma-2-27b-it.Q6_K.llamafile
 ```
 
 The default mode of operation for these llamafiles is our new command
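One practical note on the download steps above: some shells refuse to exec the polyglot llamafile binary directly. Per the general upstream llamafile documentation (an assumption here, since this diff doesn't cover it), launching through `sh` is the usual workaround:

```
# Fallback launch if direct execution fails in your shell (assumed
# from upstream llamafile docs, not from this README).
sh -c ./gemma-2-27b-it.Q6_K.llamafile
```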
@@ -63,13 +63,13 @@ To instruct Gemma to do role playing, you can customize the system
 prompt as follows:
 
 ```
-./gemma-2-9b-it.Q6_K.llamafile --chat -p "you are mosaic's godzilla"
+./gemma-2-27b-it.Q6_K.llamafile --chat -p "you are mosaic's godzilla"
 ```
 
 To view the man page, run:
 
 ```
-./gemma-2-9b-it.Q6_K.llamafile --help
+./gemma-2-27b-it.Q6_K.llamafile --help
 ```
 
 To send a request to the OpenAI API compatible llamafile server, try:
@@ -78,7 +78,7 @@ To send a request to the OpenAI API compatible llamafile server, try:
 curl http://localhost:8080/v1/chat/completions \
 -H "Content-Type: application/json" \
 -d '{
-"model": "gemma-9b-it",
+"model": "gemma-27b-it",
 "messages": [{"role": "user", "content": "Say this is a test!"}],
 "temperature": 0.0
 }'
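Because the server is OpenAI API compatible, the reply to the request above arrives as a chat completion object with the generated text at `choices[0].message.content`. A minimal sketch for printing only that field, assuming `jq` is installed:

```
# Same request as the hunk above, printing only the assistant's reply.
# Assumes jq is available; the response schema follows the OpenAI chat
# completions format the server advertises.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gemma-27b-it",
        "messages": [{"role": "user", "content": "Say this is a test!"}],
        "temperature": 0.0
      }' | jq -r '.choices[0].message.content'
```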
@@ -87,7 +87,7 @@ curl http://localhost:8080/v1/chat/completions \
 If you don't want the chatbot and you only want to run the server:
 
 ```
-./gemma-2-9b-it.Q6_K.llamafile --server --nobrowser --host 0.0.0.0
+./gemma-2-27b-it.Q6_K.llamafile --server --nobrowser --host 0.0.0.0
 ```
 
 An advanced CLI mode is provided that's useful for shell scripting. You
@@ -95,7 +95,7 @@ can use it by passing the `--cli` flag. For additional help on how it
 may be used, pass the `--help` flag.
 
 ```
-./gemma-2-9b-it.Q6_K.llamafile --cli -p 'four score and seven' --log-disable
+./gemma-2-27b-it.Q6_K.llamafile --cli -p 'four score and seven' --log-disable
 ```
 
 You then need to fill out the prompt / history template (see below).
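The prompt / history template referenced in that last context line isn't shown in this diff. As a hedged illustration only: Gemma 2's published chat format wraps turns in `<start_of_turn>`/`<end_of_turn>` markers, so a filled-out CLI prompt might look like the sketch below; defer to the template section in the repo's README.

```
# Illustrative only: the turn markers follow Gemma 2's published
# chat format; the README's own template section is authoritative.
./gemma-2-27b-it.Q6_K.llamafile --cli --log-disable -p '<start_of_turn>user
Why is the sky blue?<end_of_turn>
<start_of_turn>model
'
```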
@@ -126,7 +126,7 @@ instead downloading the official llamafile release binary from
 have the .exe file extension, and then saying:
 
 ```
-.\llamafile-0.8.15.exe -m gemma-2-9b-it.Q6_K.llamafile
+.\llamafile-0.8.15.exe -m gemma-2-27b-it.Q6_K.llamafile
 ```
 
 That will overcome the Windows 4GB file size limit, allowing you to
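To make the Windows step concrete: a sketch of fetching the 0.8.15 release binary and saving it with the `.exe` extension the text calls for. The asset URL is an assumption patterned on the project's GitHub releases page; verify it there before use.

```
# Assumed release URL; check https://github.com/Mozilla-Ocho/llamafile/releases
curl -L -o llamafile-0.8.15.exe https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.15/llamafile-0.8.15
.\llamafile-0.8.15.exe -m gemma-2-27b-it.Q6_K.llamafile
```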
@@ -172,13 +172,19 @@ AMD64.
 ## About Quantization Formats
 
 This model works well with any quantization format. Q6\_K is the best
-choice overall here. We tested that, with [our 27b Gemma2
-llamafiles](https://huggingface.co/Mozilla/gemma-2-27b-it-llamafile),
-that the llamafile implementation of Gemma2 is able to to produce
-identical responses to the Gemma2 model that's hosted by Google on
-aistudio.google.com. Therefore we'd assume these 9b llamafiles are also
-faithful to Google's intentions. If you encounter any divergences, then
-try using the BF16 weights, which have the original fidelity.
+choice overall here.
+
+## Testing
+
+We tested that the gemma2 27b q6\_k llamafile produces nearly identical
+responses to the Gemma2 model hosted by Google on aistudio.google.com
+when temperature is set to zero.
+
+![screenshot of llamafile producing same output as google's hosted gemma service](gemma-proof.png)
+
+Therefore, it is our belief that the llamafile software faithfully
+implements the gemma model. If you should encounter any divergences,
+then try using the BF16 weights, which have the original fidelity.
 
 ## See Also
 
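A note on the new Testing section above: temperature 0.0 means greedy decoding, which is what makes a side-by-side comparison with aistudio.google.com meaningful, since each run should reproduce the same text. A sketch of checking that reproducibility against a locally running server (assumes `jq` is installed):

```
# Fire the same greedy-decoded request twice; the two printed
# replies should match byte for byte.
for i in 1 2; do
  curl -s http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "gemma-27b-it",
         "messages": [{"role": "user", "content": "Say this is a test!"}],
         "temperature": 0.0}' \
    | jq -r '.choices[0].message.content'
done
```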