Commit d7d68cb (verified) by macmacmacmac, parent 6dae0d0

Upload README.md with huggingface_hub

Files changed (1): README.md (+249 lines)
---
license: gemma
language:
- en
pipeline_tag: text-generation
tags:
- litert
- litert-lm
- gemma
- agent
- tool-calling
- multimodal
- on-device
library_name: litert-lm
---

# Agent Gemma 3n E2B (LiteRT-LM Fixed)

This is a **fixed and working version** of the Gemma 3n E2B agent model in LiteRT-LM format (`.litertlm`). The original model shipped with a corrupted tokenizer configuration that prevented it from loading. This version has been rebuilt with a working SentencePiece tokenizer while preserving all agent capabilities.

## Model Details

- **Base Model**: Gemma 3n E2B
- **Format**: LiteRT-LM v1.4.0
- **Quantization**: INT4
- **Size**: ~3.2 GB
- **Capabilities**:
  - Text generation
  - Tool/function calling (via Jinja template)
  - Multimodal (vision and audio support)
  - Optimized for on-device inference

## What Was Fixed

The original agent-gemma model (`gemma-3n-E2B-it-agent-tools.litertlm`) contained a corrupted HuggingFace tokenizer JSON configuration that caused the following error on load:

```
thread '<unnamed>' panicked at external/tokenizers_cpp/rust/src/lib.rs:26:50:
called `Result::unwrap()` on an `Err` value: Error("expected value", line: 2, column: 1)
```

### Root Cause

During manual extraction and repacking of the `.litertlm` file with C++ peek/writer tools, the HuggingFace tokenizer's JSON metadata became malformed.

### Solution

1. **Extracted all model sections** from the corrupted agent-gemma model:
   - LlmMetadata (including the Agent Gemma Jinja template)
   - 7 TFLite model components (embedder, per-layer embedder, audio encoder, vision encoder, etc.)

2. **Replaced the tokenizer**: extracted the working SentencePiece tokenizer from the standard gemma-3n-E2B model.

3. **Rebuilt the model** with LiteRT-LM's official `litertlm_builder` tool, using proper section alignment and metadata.

## Model Architecture

The model consists of 9 sections:

```
Section 0: LlmMetadata (includes the Jinja prompt template for tool calling)
Section 1: SentencePiece Tokenizer
Section 2: TFLite Embedder
Section 3: TFLite Per-Layer Embedder
Section 4: TFLite Audio Encoder (HW)
Section 5: TFLite End-of-Audio Detector
Section 6: TFLite Vision Adapter
Section 7: TFLite Vision Encoder
Section 8: TFLite Prefill/Decode
```

## Agent Capabilities

This model includes a comprehensive Jinja template for tool/function calling that supports:

- Tool declarations
- Function calls with arguments
- Function responses
- Multi-turn conversations with tool interactions
- System/developer prompts
- Image inputs (via `<start_of_image>` tokens)

Example tool call format:

```
<start_function_call>call:function_name{arg1:value1,arg2:value2}<end_function_call>
```

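On the consuming side, this markup has to be parsed back into a structured call. Below is a minimal sketch; the regex and the flat `key:value` argument handling are assumptions read off the single example above, and real argument values may need quoting or nested parsing:

```python
import re

# Matches the tool-call markup shown above (assumes flat key:value arguments).
CALL_RE = re.compile(
    r"<start_function_call>call:(\w+)\{(.*?)\}<end_function_call>"
)

def parse_function_call(text):
    """Return (name, args) for the first tool call in text, or None."""
    m = CALL_RE.search(text)
    if m is None:
        return None
    name, body = m.group(1), m.group(2)
    args = {}
    for pair in filter(None, (p.strip() for p in body.split(","))):
        key, _, value = pair.partition(":")
        args[key.strip()] = value.strip()
    return name, args

call = "<start_function_call>call:get_weather{city:Paris,unit:celsius}<end_function_call>"
print(parse_function_call(call))
# ('get_weather', {'city': 'Paris', 'unit': 'celsius'})
```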
## Performance

Tested on CPU (no GPU acceleration):

- **Prefill Speed**: 21.20 tokens/sec
- **Decode Speed**: 11.44 tokens/sec
- **Time to First Token**: ~1.6 s
- **Initialization**: ~4.7 s

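The two throughput figures combine into a rough end-to-end latency estimate. This is a back-of-the-envelope sketch that ignores initialization and assumes throughput stays flat with sequence length:

```python
# Measured CPU throughput from the list above.
PREFILL_TPS = 21.20  # prompt processing, tokens/sec
DECODE_TPS = 11.44   # token generation, tokens/sec

def estimate_latency(prompt_tokens, output_tokens):
    """Rough generation time in seconds, excluding engine initialization."""
    return prompt_tokens / PREFILL_TPS + output_tokens / DECODE_TPS

# A 32-token prompt with a 100-token reply takes about 10 seconds on CPU:
print(f"{estimate_latency(32, 100):.1f}s")  # 10.3s
```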
## Usage

### Requirements

1. **LiteRT-LM runtime**: build from source:

   ```bash
   git clone https://github.com/google-ai-edge/LiteRT.git
   cd LiteRT/LiteRT-LM
   bazel build -c opt //runtime/engine:litert_lm_main
   ```

2. **Supported platforms**: Linux (clang), macOS, Android

### Running the Model

```bash
# Basic inference
./bazel-bin/runtime/engine/litert_lm_main \
  --model_path=gemma-3n-E2B-it-agent-fixed.litertlm \
  --backend=cpu \
  --input_prompt="Hello, how are you?"

# With GPU acceleration (if available)
./bazel-bin/runtime/engine/litert_lm_main \
  --model_path=gemma-3n-E2B-it-agent-fixed.litertlm \
  --backend=gpu \
  --input_prompt="Write a function to calculate fibonacci numbers"
```

### Example Output

```
input_prompt: Hello, how are you today?
I am doing well, thank you for asking! As a large language model, I don't
experience emotions like humans do, but I'm functioning optimally and ready
to assist you. How can I help you today?<end_of_turn>
```

## Building the Fixed Model (Technical Details)

If you need to rebuild or modify the model, here is the process:

### 1. Extract Sections

```python
#!/usr/bin/env python3

def extract_section(input_file, start, end, output_file):
    """Copy the byte range [start, end) of input_file into output_file."""
    with open(input_file, 'rb') as f:
        f.seek(start)
        data = f.read(end - start)
    with open(output_file, 'wb') as f:
        f.write(data)

# Extract from the agent model (all sections except the tokenizer)
agent_model = "gemma-3n-E2B-it-agent-tools.litertlm"
extract_section(agent_model, 16384, 23334, "metadata.pb")
extract_section(agent_model, 2293760, 273878864, "embedder.tflite")
# ... (extract remaining TFLite sections)

# Extract the working tokenizer from the standard gemma model
working_model = "gemma-3n-E2B-it-int4.litertlm"
extract_section(working_model, 32768, 4716087, "tokenizer.model")
```

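Because the byte offsets above are hardcoded, it is easy to slice at the wrong boundary. A cheap sanity check on each extracted TFLite section (assuming the sections are plain TFLite flatbuffers, which carry the FlatBuffers file identifier `TFL3` at byte offset 4):

```python
def looks_like_tflite(path):
    """True if the file starts with a TFLite flatbuffer header."""
    with open(path, "rb") as f:
        f.seek(4)  # the 4-byte "TFL3" identifier follows the root offset
        return f.read(4) == b"TFL3"

# e.g. after running the extraction script:
# assert looks_like_tflite("embedder.tflite"), "bad offsets?"
```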
### 2. Create TOML Configuration

```toml
[system_metadata]
entries = [
    { key = "author", value_type = "String", value = "The ODML Authors" }
]

[[section]]
section_type = "LlmMetadata"
data_path = "metadata.pb"

[[section]]
section_type = "SP_Tokenizer"
data_path = "tokenizer.model"

[[section]]
section_type = "TFLiteModel"
model_type = "EMBEDDER"
data_path = "embedder.tflite"

# ... (add remaining sections)
```

### 3. Build with litertlm_builder

```bash
bazel run //schema/py:litertlm_builder_cli -- \
  toml --path config.toml \
  output --path gemma-3n-E2B-it-agent-fixed.litertlm
```

## Verification

Check the model structure:

```bash
bazel run //schema/cc:litertlm_peek -- \
  --litertlm_file=gemma-3n-E2B-it-agent-fixed.litertlm
```

The expected output shows:

- Version: 1.4.0
- Section 1: `AnySectionDataType_SP_Tokenizer` (not HF_Tokenizer)
- 9 total sections with proper alignment

## Known Issues & Limitations

1. **Tokenizer Change**: This model uses SentencePiece instead of the original HuggingFace tokenizer. While functionally equivalent for Gemma models, there may be minor differences in special-token handling.

2. **No Agent Template Customization**: The Jinja template from the original model is preserved as-is. To modify the tool-calling behavior, you need to:
   - Extract the metadata.pb
   - Modify the `jinja_prompt_template` field
   - Rebuild the model

3. **Hardware Requirements**:
   - Minimum 4 GB RAM recommended
   - GPU acceleration requires OpenGL ES 3.1+ or Metal support
   - Audio/vision features require additional hardware support

## License

This model inherits the Gemma license from the original model. The fixing/rebuilding process does not change the model weights or training data.

## Citation

If you use this model, please cite:

```bibtex
@misc{gemma3n-agent-fixed,
  title={Agent Gemma 3n E2B (LiteRT-LM Fixed)},
  author={kontextdev},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/kontextdev/agent-gemma}}
}
```

## Related Links

- [LiteRT-LM GitHub](https://github.com/google-ai-edge/LiteRT/tree/main/LiteRT-LM)
- [Original Gemma Model](https://ai.google.dev/gemma)
- [LiteRT Documentation](https://ai.google.dev/edge/litert)

## Changelog

- **v1.0 (2025-01-14)**: Initial release with the fixed SentencePiece tokenizer