antonio committed
Commit f360c2c · 1 Parent(s): 1804441

Add comprehensive OLLAMA_README for Ollama users


- Step-by-step setup guide with Modelfile requirement
- Explains bilingual behavior configuration
- Recommended settings for Raspberry Pi
- Complete examples and use cases

Files changed (1)
1. OLLAMA_README.md +197 -0
OLLAMA_README.md ADDED

# Gemma3 Smart Q4 - Bilingual Offline AI for Raspberry Pi

**Quantized Gemma 3 1B optimized for edge devices. Fully offline, bilingual (Italian/English), privacy-first.**

---

## 🚀 Quick Start

**IMPORTANT**: To enable bilingual behavior, you must create a Modelfile with the bilingual SYSTEM prompt.

### Step 1: Pull the base model

```bash
# Pull Q4_0 (recommended - faster, smaller)
ollama pull antconsales/antonio-gemma3-smart-q4

# Or pull the Q4_K_M variant (better quality for long conversations)
ollama pull antconsales/antonio-gemma3-smart-q4:q4_k_m
```
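
If you want to double-check that the download completed, `ollama list` shows the models installed locally along with their size:

```bash
# Confirm the pulled model appears among the local models
ollama list
```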

### Step 2: Create Modelfile with bilingual configuration

```bash
cat > Modelfile <<'EOF'
FROM antconsales/antonio-gemma3-smart-q4

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 1024
PARAMETER num_thread 4
PARAMETER num_batch 32
PARAMETER repeat_penalty 1.05
PARAMETER stop "<end_of_turn>"
PARAMETER stop "</s>"

SYSTEM """You are an offline AI assistant running on a Raspberry Pi. You MUST detect the user's language and respond in the SAME language:

- If the user writes in Italian, respond ONLY in Italian
- If the user writes in English, respond ONLY in English

Sei un assistente AI offline su Raspberry Pi. DEVI rilevare la lingua dell'utente e rispondere nella STESSA lingua:

- Se l'utente scrive in italiano, rispondi SOLO in italiano
- Se l'utente scrive in inglese, rispondi SOLO in inglese

Always match the user's language choice."""
EOF
```

### Step 3: Create the configured model

```bash
ollama create gemma3-bilingual -f Modelfile
```

### Step 4: Run it!

```bash
ollama run gemma3-bilingual

# Test in Italian
>>> ciao! come va?

# Test in English
>>> hello! how are you?
```

**Why this is needed**: The base model is instruction-tuned but doesn't automatically switch languages. The SYSTEM prompt explicitly tells it to match the user's language.
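
To double-check that the SYSTEM prompt was actually baked into the new model, you can ask Ollama to print the stored Modelfile (or just the system prompt) for `gemma3-bilingual`:

```bash
# Show the full Modelfile that was saved with the model
ollama show gemma3-bilingual --modelfile

# Or print only the system prompt
ollama show gemma3-bilingual --system
```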

## ✨ Features

- 🔒 **100% Offline** - No cloud, no tracking, no internet required
- 🗣️ **Bilingual** - Automatically detects and responds in Italian or English
- ⚡ **Fast** - 3.67 tokens/s on Raspberry Pi 4 (Q4_0)
- 🎯 **Optimized** - Tuned parameters for Pi 4/5 hardware
- 🔐 **Privacy-First** - All inference on-device

## 📊 Benchmarks (Raspberry Pi 4, 4 GB RAM)

| Model | Speed | Size | Use Case |
|-------|-------|------|----------|
| **Q4_0** ⭐ | **3.67 t/s** | 720 MB | Default choice (faster, smaller) |
| **Q4_K_M** | 3.56 t/s | 806 MB | Better coherence in long conversations |

**Tested on**: Raspberry Pi OS (Debian Bookworm), Ollama runtime
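
To reproduce these numbers on your own board, one simple approach is to run a one-shot prompt with the `--verbose` flag, which makes Ollama print timing statistics (including the eval rate in tokens per second) after the answer:

```bash
# Prints generation statistics such as "eval rate" (tokens/s) after the response
ollama run gemma3-bilingual --verbose "Explain what a GPIO pin is in two sentences."
```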

## 💬 Example Interactions

Once you've created the model with the Modelfile (see Quick Start above):

### Italian
```bash
ollama run gemma3-bilingual "Ciao! Spiegami cos'è un sensore di prossimità."
```

### English
```bash
ollama run gemma3-bilingual "What is a Raspberry Pi and what can I do with it?"
```

### Code-switching (IT/EN mixed)
```bash
ollama run gemma3-bilingual "Explain GPIO in English, poi dimmi come usarlo in italiano"
```

The model automatically detects the language and responds appropriately **when using the Modelfile configuration**!

## 🎯 Use Cases

- **Privacy-first personal assistants** - All inference on-device
- **Offline home automation** - Control IoT devices without cloud dependencies (see the API sketch below)
- **Voice assistants** - Fast enough for real-time speech (3.67 t/s)
- **Educational Pi projects** - Learn AI/ML on affordable hardware
- **Bilingual chatbots** - IT/EN customer support, documentation
- **Embedded systems** - Industrial applications requiring offline inference
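
Most of these scenarios call the model from a script or service rather than from an interactive shell. Ollama exposes a local HTTP API on port 11434, so a minimal sketch (assuming the `gemma3-bilingual` model created in the Quick Start and the default Ollama port) looks like this:

```bash
# Request a single non-streamed completion from the local Ollama server
curl -s http://localhost:11434/api/generate -d '{
  "model": "gemma3-bilingual",
  "prompt": "Which GPIO pins on a Raspberry Pi 4 can I use to switch a relay?",
  "stream": false
}'
```

The generated text comes back in the `response` field of the JSON reply, which makes it straightforward to wire into home-automation or chatbot scripts.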

## ⚙️ Recommended Settings (Raspberry Pi 4/5)

For **optimal performance**, use these parameters in your Modelfile:

```dockerfile
FROM antconsales/antonio-gemma3-smart-q4

PARAMETER num_ctx 1024          # Context length (512 for faster response, 1024 for longer conversations)
PARAMETER num_thread 4          # Utilize all 4 cores on Raspberry Pi 4
PARAMETER num_batch 32          # Optimized for throughput on Pi
PARAMETER temperature 0.7       # Balanced creativity vs consistency
PARAMETER top_p 0.9             # Nucleus sampling for diverse responses
PARAMETER repeat_penalty 1.05   # Reduces repetitive outputs
PARAMETER stop "<end_of_turn>"
PARAMETER stop "</s>"

SYSTEM """
You are an offline AI assistant running on a Raspberry Pi. Automatically detect the user's language (Italian or English) and respond in the same language. Be concise, practical, and helpful.

Sei un assistente AI offline che opera su Raspberry Pi. Rileva automaticamente la lingua dell'utente (italiano o inglese) e rispondi nella stessa lingua. Sii conciso, pratico e utile.
"""
```

**For voice assistants** or **real-time chat**, reduce `num_ctx` to `512` for faster responses.
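
For example, a lower-latency variant can live alongside the main model. The sketch below uses a hypothetical name (`gemma3-bilingual-fast`) and simply overrides the context size; all other settings follow the recommendations above:

```bash
cat > Modelfile.fast <<'EOF'
FROM antconsales/antonio-gemma3-smart-q4

# Smaller context window = less prompt processing, quicker replies
PARAMETER num_ctx 512
PARAMETER num_thread 4
PARAMETER temperature 0.7
PARAMETER stop "<end_of_turn>"
PARAMETER stop "</s>"

SYSTEM """You are an offline AI assistant on a Raspberry Pi. Detect the user's language (Italian or English) and reply in that language. Be concise."""
EOF

ollama create gemma3-bilingual-fast -f Modelfile.fast
```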

## 🛠️ Technical Details

- **Base Model**: [Google Gemma 3 1B IT](https://huggingface.co/google/gemma-3-1b-it)
- **Quantization**: Q4_0 and Q4_K_M (llama.cpp)
- **Context Length**: 1024 tokens (configurable down to 512)
- **Vocabulary Size**: 262,144 tokens
- **Architecture**: Gemma3ForCausalLM
- **Supported Platforms**: Raspberry Pi 4/5, Mac M1/M2, Linux ARM64, x86-64

## 🔒 Model Verification

Verify downloaded models using SHA256 checksums:

| File | SHA256 Checksum |
|------|----------------|
| `gemma3-1b-q4_0.gguf` | `d1d037446a2836db7666aa6ced3ce460b0f7f2ba61c816494a098bb816f2ad55` |
| `gemma3-1b-q4_k_m.gguf` | `c02d2e6f68fd34e9e66dff6a31d3f95fccb6db51f2be0b51f26136a85f7ec1f0` |

```bash
# Verify checksums (on Linux/macOS with Ollama)
# Models are stored in ~/.ollama/models/blobs/
sha256sum ~/.ollama/models/blobs/sha256-*
# Compare the output against the checksums in the table above
```

## 🔗 Links

- **Ollama**: https://ollama.com/antconsales/antonio-gemma3-smart-q4
- **HuggingFace**: https://huggingface.co/chill123/antonio-gemma3-smart-q4
- **GitHub** (demos, benchmarks, code): https://github.com/antconsales/gemma3-smart-q4

## 📜 License

This model is a **derivative work** of [Google's Gemma 3 1B](https://huggingface.co/google/gemma-3-1b-it).

**License**: Gemma License
Please review and comply with the [Gemma License Terms](https://ai.google.dev/gemma/terms) before using this model.

**Quantization, optimization, and bilingual configuration** by Antonio ([antconsales](https://github.com/antconsales)).

For licensing questions regarding the base model, refer to Google's official Gemma documentation.

---

## 📝 Version History

### v0.1.0 (2025-10-21)
- Initial release
- Two quantizations: Q4_0 (720 MB) and Q4_K_M (806 MB)
- Bilingual IT/EN support with automatic language detection
- Optimized for Raspberry Pi 4 (3.56-3.67 tokens/s)
- Tested on Raspberry Pi OS (Debian Bookworm) with Ollama

---

**Built with ❤️ for privacy and edge computing**
*Empowering offline AI, one Raspberry Pi at a time.* 🇮🇹