jartine committed
Commit a5c33a2
1 Parent(s): bdb7390

Update README.md

Files changed (1)
README.md +10 -7
README.md CHANGED
@@ -24,7 +24,7 @@ Gemma v2 is a large language model released by Google on Jun 27th 2024.
  - Original model: [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it)
 
  The model is packaged into executable weights, which we call
- [llamafiles](https://github.com/Mozilla-Ocho/llamafile)). This makes it
+ [llamafiles](https://github.com/Mozilla-Ocho/llamafile). This makes it
  easy to use the model on Linux, MacOS, Windows, FreeBSD, OpenBSD, and
  NetBSD for AMD64 and ARM64.
 
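As context for the hunk above: a llamafile needs no install step. A minimal sketch of running one (the filename here is illustrative; use whichever quant you actually downloaded):

```sh
# Mark the downloaded weights as executable (one time).
chmod +x gemma-2-9b-it.Q6_K.llamafile

# Run it; by default llamafile starts a local web server with a chat GUI.
./gemma-2-9b-it.Q6_K.llamafile
```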
@@ -75,11 +75,9 @@ of the README.
 
  When using the browser GUI, you need to fill out the following fields.
 
- Prompt template:
+ Prompt template (note: this is for chat; Gemma doesn't have a system role):
 
  ```
- <start_of_turn>system
- {{prompt}}<end_of_turn>
  {{history}}
  <start_of_turn>{{char}}
  ```
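To make the template concrete, here is a hypothetical rendering, assuming {{history}} holds prior turns in Gemma's <start_of_turn>/<end_of_turn> chat format and {{char}} is the model's role name (model):

```
<start_of_turn>user
Why is the sky blue?<end_of_turn>
<start_of_turn>model
```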
@@ -109,9 +107,14 @@ AMD64.
 
  ## About Quantization Formats
 
- This model works should work well with any quantization format. Q6\_K is
- the best choice overall here. But since this is a Google model, the
- Google Brain floating point format (BF16) provides maximum quality.
+ This model works well with any quantization format. Q6\_K is the best
+ choice overall here. Using [our 27b Gemma2
+ llamafiles](https://huggingface.co/jartine/gemma-2-27b-it-llamafile),
+ we verified that the llamafile implementation of Gemma2 produces
+ responses identical to those of the Gemma2 model that Google hosts on
+ aistudio.google.com. We therefore assume these 9b llamafiles are also
+ faithful to Google's intentions. If you encounter any divergences,
+ try the BF16 weights, which preserve the original fidelity.
 
  ---
 
 
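The determinism claim in the new paragraph is easy to spot-check yourself. A sketch, assuming llama.cpp-style CLI flags (-p, -n, --temp) and illustrative filenames for the Q6\_K and BF16 downloads:

```sh
PROMPT='<start_of_turn>user
Why is the sky blue?<end_of_turn>
<start_of_turn>model
'

# Greedy decoding (--temp 0) makes each run reproducible, so two quant
# levels can be compared token-for-token on the same prompt.
./gemma-2-9b-it.Q6_K.llamafile --temp 0 -n 64 -p "$PROMPT" > q6_k.out
./gemma-2-9b-it.BF16.llamafile --temp 0 -n 64 -p "$PROMPT" > bf16.out

diff q6_k.out bf16.out  # any output here is a divergence between quants
```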