jartine committed · verified
Commit 330a572 · Parent(s): 37d86bc

Update README.md

Files changed (1)
  1. README.md +85 -43
README.md CHANGED
@@ -28,23 +28,13 @@ The model is packaged into executable weights, which we call
 easy to use the model on Linux, MacOS, Windows, FreeBSD, OpenBSD, and
 NetBSD for AMD64 and ARM64.
 
-## License
-
-The llamafile software is open source and permissively licensed. However,
-the weights embedded inside the llamafiles are governed by Google's
-Gemma License and Gemma Prohibited Use Policy. This is not an open
-source license. It's about as restrictive as it gets. There are a great
-many things you're not allowed to do with Gemma. The terms of the
-license and its list of unacceptable uses can be changed by Google at
-any time. Therefore we wouldn't recommend using these llamafiles for
-anything other than evaluating the quality of Google's engineering.
-
-See the [LICENSE](LICENSE) file for further details.
+*Software Last Updated: 2024-10-30*
 
 ## Quickstart
 
-Running the following on a desktop OS will launch a tab in your web
-browser with a chatbot interface.
+To get started, you need both the Gemma weights and the llamafile
+software. Both of them are included in a single file, which can be
+downloaded and run as follows:
 
 ```
 wget https://huggingface.co/jartine/gemma-2-2b-it-llamafile/resolve/main/gemma-2-2b-it.Q6_K.llamafile
@@ -52,56 +42,95 @@ chmod +x gemma-2-2b-it.Q6_K.llamafile
 ./gemma-2-2b-it.Q6_K.llamafile
 ```
 
-You then need to fill out the prompt / history template (see below).
-
-This model has a max context window size of 8k tokens. By default, a
-context window size of 512 tokens is used. You may increase this to the
-maximum by passing the `-c 0` flag.
-
-On GPUs with sufficient RAM, the `-ngl 999` flag may be passed to use
-the system's NVIDIA or AMD GPU(s). On Windows, only the graphics card
-driver needs to be installed. If the prebuilt DSOs should fail, the CUDA
-or ROCm SDKs may need to be installed, in which case llamafile builds a
-native module just for your system.
-
-For further information, please see the [llamafile
-README](https://github.com/mozilla-ocho/llamafile/).
+The default mode of operation for these llamafiles is our new command
+line chatbot interface. It looks like this:
+
+![Screenshot of Gemma 2b llamafile on MacOS](llamafile-gemma.png)
 
 Having **trouble?** See the ["Gotchas"
-section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas)
+section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas-and-troubleshooting)
 of the README.
 
-## Prompting
-
-When using the browser GUI, you need to fill out the following fields.
-
-Prompt template (note: this is for chat; Gemma doesn't have a system role):
-
+## Usage
+
+By default, llamafile launches a chatbot in the terminal, and a server
+in the background. The chatbot is mostly self-explanatory. You can type
+`/help` for further details. See the [llamafile v0.8.15 release
+notes](https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.8.15)
+for documentation on our newest chatbot features.
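+
+To launch the chatbot explicitly, you can pass the `--chat` flag (the
+same flag used in the role-playing example below). This bare invocation
+is an illustration of ours, not part of the upstream README:
+
+```
+./gemma-2-2b-it.Q6_K.llamafile --chat
+```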
 
+To instruct Gemma to do role playing, you can customize the system
+prompt of the chatbot as follows:
+
 ```
-{{history}}
-<start_of_turn>{{char}}
+./gemma-2-2b-it.Q6_K.llamafile --chat -p "you are the ghost of edgar allan poe"
 ```
 
-History template:
-
+To view the man page, run:
+
 ```
-<start_of_turn>{{name}}
-{{message}}<end_of_turn>
+./gemma-2-2b-it.Q6_K.llamafile --help
 ```
 
-Here's an example of how to prompt Gemma v2 on the command line:
-
+To send a request to the OpenAI API compatible llamafile server, try:
+
 ```
-./gemma-2-2b-it.Q6_K.llamafile --special -p '<start_of_turn>user
-The Belobog Academy has discovered a new, invasive species of algae that can double itself in one day, and in 30 days fills a whole reservoir - contaminating the water supply. How many days would it take for the algae to fill half of the reservoir?<end_of_turn>
-<start_of_turn>model
-'
+curl http://localhost:8080/v1/chat/completions \
+-H "Content-Type: application/json" \
+-d '{
+"model": "gemma-2b-it",
+"messages": [{"role": "user", "content": "Say this is a test!"}],
+"temperature": 0.0
+}'
+```
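+
+If you have `jq` installed, you can extract just the reply text from
+the server's JSON response. This is an illustrative sketch of ours: the
+`.choices[0].message.content` path is the standard OpenAI
+chat-completions response shape that the server emulates:
+
+```
+curl -s http://localhost:8080/v1/chat/completions \
+-H "Content-Type: application/json" \
+-d '{
+"model": "gemma-2b-it",
+"messages": [{"role": "user", "content": "Say this is a test!"}]
+}' | jq -r '.choices[0].message.content'
+```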
+
+If you don't want the chatbot and you only want to run the server:
+
+```
+./gemma-2-2b-it.Q6_K.llamafile --server --nobrowser --host 0.0.0.0
+```
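+
+Once the server is running, you can check on it from another terminal.
+This is an illustrative sketch: the `/health` endpoint comes from the
+llama.cpp server that llamafile embeds, so verify it on your version:
+
+```
+curl http://localhost:8080/health
+```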
+
+An advanced CLI mode is provided that's useful for shell scripting. You
+can use it by passing the `--cli` flag. For additional help on how it
+may be used, pass the `--help` flag.
+
+```
+./gemma-2-2b-it.Q6_K.llamafile --cli -p 'four score and seven' --log-disable
 ```
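+
+As a sketch of how this might be used from a shell script (the `--cli`
+and `--log-disable` flags are from the example above; the variable name
+is ours):
+
+```
+#!/bin/sh
+# capture the model's completion in a shell variable
+completion=$(./gemma-2-2b-it.Q6_K.llamafile --cli -p 'four score and seven' --log-disable)
+echo "$completion"
+```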
 
+Note that in `--cli` mode, you need to fill out Gemma's prompt template
+yourself.
+
+For further information, please see the [llamafile
+README](https://github.com/mozilla-ocho/llamafile/).
+
+## Context Window
+
+This model has a max context window size of 8192 tokens, which is also
+the size used by default. You may limit the context window size by
+passing the `-c N` flag.
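+
+For example, to cap the context window at 2048 tokens (the `-c` flag is
+described above; the value is only an illustration):
+
+```
+./gemma-2-2b-it.Q6_K.llamafile -c 2048
+```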
+
+## GPU Acceleration
+
+On GPUs with sufficient RAM, the `-ngl 999` flag may be passed to use
+the system's NVIDIA or AMD GPU(s). On Windows, NVIDIA GPU owners only
+need to install the graphics card driver, while AMD GPU owners should
+install the ROCm SDK v6.1 and then pass `--recompile --gpu amd` the
+first time they run their llamafile.
+
+On NVIDIA GPUs, by default, the prebuilt tinyBLAS library is used to
+perform matrix multiplications. This is open source software, but it
+doesn't go as fast as closed source cuBLAS. If you have the CUDA SDK
+installed on your system, then you can pass the `--recompile` flag to
+build a GGML CUDA library just for your system that uses cuBLAS. This
+ensures you get maximum performance.
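+
+For example (illustrative invocations of ours, combining the flags
+described above):
+
+```
+# offload all model layers to the GPU
+./gemma-2-2b-it.Q6_K.llamafile -ngl 999
+
+# first run on Windows with an AMD GPU and the ROCm SDK installed
+./gemma-2-2b-it.Q6_K.llamafile -ngl 999 --recompile --gpu amd
+```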
+
+For further information, please see the [llamafile
+README](https://github.com/mozilla-ocho/llamafile/).
+
 ## About llamafile
 
-llamafile is a new format introduced by Mozilla Ocho on Nov 20th 2023.
-It uses Cosmopolitan Libc to turn LLM weights into runnable llama.cpp
+llamafile is a new format introduced by Mozilla on Nov 20th 2023. It
+uses Cosmopolitan Libc to turn LLM weights into runnable llama.cpp
 binaries that run on the stock installs of six OSes for both ARM64 and
 AMD64.
 
@@ -127,6 +156,19 @@ which require more memory.
 The 9B and 27B models were released a month earlier than 2B, so they're
 packaged with a slightly older version of the llamafile software.
 
+## License
+
+The llamafile software is open source and permissively licensed. However,
+the weights embedded inside the llamafiles are governed by Google's
+Gemma License and Gemma Prohibited Use Policy. This is not an open
+source license. It's about as restrictive as it gets. There are a great
+many things you're not allowed to do with Gemma. The terms of the
+license and its list of unacceptable uses can be changed by Google at
+any time. Therefore we wouldn't recommend using these llamafiles for
+anything other than evaluating the quality of Google's engineering.
+
+See the [LICENSE](LICENSE) file for further details.
+
 ---
 
 # Gemma 2 model card