jartine committed
Commit a5c33a2
1 Parent(s): bdb7390

Update README.md

Files changed (1)
README.md +10 -7
README.md CHANGED
@@ -24,7 +24,7 @@ Gemma v2 is a large language model released by Google on Jun 27th 2024.
  - Original model: [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it)
 
  The model is packaged into executable weights, which we call
- [llamafiles](https://github.com/Mozilla-Ocho/llamafile)). This makes it
+ [llamafiles](https://github.com/Mozilla-Ocho/llamafile). This makes it
  easy to use the model on Linux, MacOS, Windows, FreeBSD, OpenBSD, and
  NetBSD for AMD64 and ARM64.
 
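As context for the hunk above: a llamafile needs no install step. A minimal sketch of running one (the filename here is illustrative; use whichever quant you actually downloaded):

```sh
# Mark the downloaded weights as executable (one time).
chmod +x gemma-2-9b-it.Q6_K.llamafile

# Run it; by default llamafile starts a local web server with a chat GUI.
./gemma-2-9b-it.Q6_K.llamafile
```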
@@ -75,11 +75,9 @@ of the README.
 
  When using the browser GUI, you need to fill out the following fields.
 
- Prompt template:
+ Prompt template (note: this is for chat; Gemma doesn't have a system role):
 
  ```
- <start_of_turn>system
- {{prompt}}<end_of_turn>
  {{history}}
  <start_of_turn>{{char}}
  ```
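To make the template concrete, here is a hypothetical rendering, assuming {{history}} holds prior turns in Gemma's <start_of_turn>/<end_of_turn> chat format and {{char}} is the model's role name (model):

```
<start_of_turn>user
Why is the sky blue?<end_of_turn>
<start_of_turn>model
```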
@@ -109,9 +107,14 @@ AMD64.
 
  ## About Quantization Formats
 
- This model works should work well with any quantization format. Q6\_K is
- the best choice overall here. But since this is a Google model, the
- Google Brain floating point format (BF16) provides maximum quality.
+ This model works well with any quantization format. Q6\_K is the best
+ choice overall here. Using [our 27b Gemma2
+ llamafiles](https://huggingface.co/jartine/gemma-2-27b-it-llamafile),
+ we verified that the llamafile implementation of Gemma2 produces
+ responses identical to those of the Gemma2 model that Google hosts on
+ aistudio.google.com. We therefore assume these 9b llamafiles are also
+ faithful to Google's intentions. If you encounter any divergences,
+ try the BF16 weights, which preserve the original fidelity.
 
  ---
 
 
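The determinism claim in the new paragraph is easy to spot-check yourself. A sketch, assuming llama.cpp-style CLI flags (-p, -n, --temp) and illustrative filenames for the Q6\_K and BF16 downloads:

```sh
PROMPT='<start_of_turn>user
Why is the sky blue?<end_of_turn>
<start_of_turn>model
'

# Greedy decoding (--temp 0) makes each run reproducible, so two quant
# levels can be compared token-for-token on the same prompt.
./gemma-2-9b-it.Q6_K.llamafile --temp 0 -n 64 -p "$PROMPT" > q6_k.out
./gemma-2-9b-it.BF16.llamafile --temp 0 -n 64 -p "$PROMPT" > bf16.out

diff q6_k.out bf16.out  # any output here is a divergence between quants
```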