ovinduG committed on
Commit
727cb3d
·
verified ·
1 Parent(s): c78bec2

Add model card

Browse files
Files changed (1)
  1. README.md +279 -186
README.md CHANGED
@@ -1,199 +1,292 @@
1
  ---
2
  library_name: transformers
3
- tags: []
4
  ---
5
 
6
- # Model Card for Model ID
7
-
8
- <!-- Provide a quick summary of what the model is/does. -->
9
-
10
 
 
11
 
12
  ## Model Details
13
 
14
- ### Model Description
15
-
16
- <!-- Provide a longer summary of what this model is. -->
17
-
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
-
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
-
28
- ### Model Sources [optional]
29
-
30
- <!-- Provide the basic links for the model. -->
31
-
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
-
36
- ## Uses
37
-
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
-
40
- ### Direct Use
41
-
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
-
44
- [More Information Needed]
45
-
46
- ### Downstream Use [optional]
47
-
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
-
50
- [More Information Needed]
51
-
52
- ### Out-of-Scope Use
53
-
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
-
56
- [More Information Needed]
57
-
58
- ## Bias, Risks, and Limitations
59
-
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
-
62
- [More Information Needed]
63
-
64
- ### Recommendations
65
-
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
-
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
-
70
- ## How to Get Started with the Model
71
-
72
- Use the code below to get started with the model.
73
-
74
- [More Information Needed]
75
 
76
  ## Training Details
77
 
78
- ### Training Data
79
-
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
-
82
- [More Information Needed]
83
-
84
- ### Training Procedure
85
-
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
-
88
- #### Preprocessing [optional]
89
-
90
- [More Information Needed]
91
-
92
-
93
- #### Training Hyperparameters
94
-
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
-
97
- #### Speeds, Sizes, Times [optional]
98
-
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
-
101
- [More Information Needed]
102
-
103
- ## Evaluation
104
-
105
- <!-- This section describes the evaluation protocols and provides the results. -->
106
-
107
- ### Testing Data, Factors & Metrics
108
-
109
- #### Testing Data
110
 
111
- <!-- This should link to a Dataset Card if possible. -->
112
-
113
- [More Information Needed]
114
-
115
- #### Factors
116
-
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
-
119
- [More Information Needed]
120
-
121
- #### Metrics
122
-
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
-
125
- [More Information Needed]
126
-
127
- ### Results
128
-
129
- [More Information Needed]
130
-
131
- #### Summary
132
-
133
-
134
-
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
-
141
- ## Environmental Impact
142
-
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
-
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
-
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
-
153
- ## Technical Specifications [optional]
154
-
155
- ### Model Architecture and Objective
156
-
157
- [More Information Needed]
158
-
159
- ### Compute Infrastructure
160
-
161
- [More Information Needed]
162
-
163
- #### Hardware
164
-
165
- [More Information Needed]
166
-
167
- #### Software
168
-
169
- [More Information Needed]
170
-
171
- ## Citation [optional]
172
-
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
-
175
- **BibTeX:**
176
-
177
- [More Information Needed]
178
-
179
- **APA:**
180
-
181
- [More Information Needed]
182
-
183
- ## Glossary [optional]
184
-
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
-
187
- [More Information Needed]
188
-
189
- ## More Information [optional]
190
-
191
- [More Information Needed]
192
-
193
- ## Model Card Authors [optional]
194
-
195
- [More Information Needed]
196
-
197
- ## Model Card Contact
198
 
199
- [More Information Needed]
 
1
  ---
2
+ license: gemma
3
+ base_model: google/functiongemma-270m-it
4
+ tags:
5
+ - text-classification
6
+ - domain-classification
7
+ - function-calling
8
+ - peft
9
+ - lora
10
+ - gemma
11
+ - functiongemma
12
+ datasets:
13
+ - custom
14
+ language:
15
+ - en
16
+ metrics:
17
+ - accuracy
18
+ - f1
19
  library_name: transformers
20
+ pipeline_tag: text-classification
21
  ---
22
 
23
+ # FunctionGemma Domain Classifier
 
 
 
24
 
25
+ Fine-tuned **FunctionGemma-270M** for multi-domain query classification using LoRA.
26
 
27
  ## Model Details
28
 
29
+ - **Base Model:** [google/functiongemma-270m-it](https://huggingface.co/google/functiongemma-270m-it)
30
+ - **Model Size:** 270M parameters (540MB)
31
+ - **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
32
+ - **Trainable Parameters:** ~7.6M (2.75%)
33
+ - **Training Time:** 23.3 minutes
34
+ - **Hardware:** GPU (memory optimized for <5GB VRAM)
35
+
36
+ ## Performance
37
+
38
+ ```
39
+ Accuracy: 95.51%
40
+ F1 Score (Weighted): 0.96
41
+ F1 Score (Macro): 0.88
42
+ Training Loss: 0.3
43
+ ```
44
+
45
+ ## Supported Domains (17)
46
+
47
+ 1. ambiguous
48
+ 2. api_generation
49
+ 3. business
50
+ 4. coding
51
+ 5. creative_content
52
+ 6. data_analysis
53
+ 7. education
54
+ 8. general_knowledge
55
+ 9. geography
56
+ 10. history
57
+ 11. law
58
+ 12. literature
59
+ 13. mathematics
60
+ 14. medicine
61
+ 15. science
62
+ 16. sensitive
63
+ 17. technology
64
+
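+ For downstream checks it can help to keep this label set in code. A minimal sketch (the constant and helper below are illustrative, not part of the model's files):
+
+ ```python
+ # The 17 domain labels the classifier was fine-tuned on (same list as above).
+ DOMAINS = {
+     "ambiguous", "api_generation", "business", "coding", "creative_content",
+     "data_analysis", "education", "general_knowledge", "geography", "history",
+     "law", "literature", "mathematics", "medicine", "science", "sensitive",
+     "technology",
+ }
+
+ def is_valid_domain(label: str) -> bool:
+     """True if a predicted label is one of the supported domains."""
+     return label in DOMAINS
+ ```
+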
65
+ ## Use Cases
66
+
67
+ - **Query Routing:** Route user queries to specialized models/services (see the routing sketch in the Quick Start below)
68
+ - **Content Classification:** Categorize text by domain
69
+ - **Multi-domain Detection:** Identify queries spanning multiple domains
70
+ - **Intent Analysis:** Understand query context and domain
71
+
72
+ ## Quick Start
73
+
74
+ ### Installation
75
+
76
+ ```bash
77
+ pip install transformers peft torch
78
+ ```
79
+
80
+ ### Inference
81
+
82
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from peft import PeftModel
+ import torch
+ import json
+
+ # Load the base model and the LoRA adapter
+ base_model = AutoModelForCausalLM.from_pretrained(
+     "google/functiongemma-270m-it",
+     torch_dtype=torch.bfloat16,
+     device_map="auto"
+ )
+ model = PeftModel.from_pretrained(base_model, "ovinduG/functiongemma-domain-classifier")
+ tokenizer = AutoTokenizer.from_pretrained("ovinduG/functiongemma-domain-classifier")
+
+ # Classify a query
+ def classify(text):
+     # Define the function schema the model is expected to call
+     function_def = {
+         "type": "function",
+         "function": {
+             "name": "classify_query_domain",
+             "description": "Classify query into domains",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "primary_domain": {"type": "string"},
+                     "primary_confidence": {"type": "number"},
+                     "is_multi_domain": {"type": "boolean"},
+                     "secondary_domains": {"type": "array"}
+                 }
+             }
+         }
+     }
+
+     messages = [
+         {"role": "developer", "content": "You are a model that can do function calling"},
+         {"role": "user", "content": text}
+     ]
+
+     inputs = tokenizer.apply_chat_template(
+         messages,
+         tools=[function_def],
+         add_generation_prompt=True,
+         return_dict=True,
+         return_tensors="pt"
+     ).to(model.device)
+
+     with torch.no_grad():
+         outputs = model.generate(
+             **inputs,
+             max_new_tokens=150,
+             do_sample=False,
+             pad_token_id=tokenizer.eos_token_id
+         )
+
+     response = tokenizer.decode(
+         outputs[0][inputs["input_ids"].shape[-1]:],
+         skip_special_tokens=True
+     )
+
+     # Parse the JSON arguments of the function call
+     if "{" in response:
+         start = response.find("{")
+         end = response.rfind("}") + 1
+         return json.loads(response[start:end])
+
+     return {"error": "Failed to parse response"}
+
+ # Example
+ result = classify("Write a Python function to calculate fibonacci numbers")
+ print(json.dumps(result, indent=2))
+ ```
155
+
156
+ ### Example Output
157
+
158
+ ```json
159
+ {
160
+ "primary_domain": "coding",
161
+ "primary_confidence": 0.95,
162
+ "is_multi_domain": false,
163
+ "secondary_domains": []
164
+ }
165
+ ```
166
+
167
+ ### Multi-Domain Example
168
+
169
+ ```python
170
+ result = classify("Build an ML model to predict customer churn and create REST API endpoints")
171
+ print(json.dumps(result, indent=2))
172
+ ```
173
+
174
+ ```json
175
+ {
176
+ "primary_domain": "data_analysis",
177
+ "primary_confidence": 0.85,
178
+ "is_multi_domain": true,
179
+ "secondary_domains": [
180
+ {
181
+ "domain": "api_generation",
182
+ "confidence": 0.75
183
+ }
184
+ ]
185
+ }
186
+ ```
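+
+ ### Query Routing Sketch
+
+ For the query-routing use case above, a minimal dispatch sketch built on the `classify` helper defined earlier. The handler functions and routing table here are hypothetical placeholders, not part of this repository:
+
+ ```python
+ # Hypothetical downstream handlers; swap in your own models or services.
+ def handle_coding(query):
+     return f"[code assistant] {query}"
+
+ def handle_general(query):
+     return f"[general assistant] {query}"
+
+ ROUTES = {
+     "coding": handle_coding,
+     "api_generation": handle_coding,
+     # ...add one entry per domain you want to specialize...
+ }
+
+ def route(query, confidence_threshold=0.6):
+     result = classify(query)  # `classify` is defined in the Inference example above
+     domain = result.get("primary_domain")
+     confidence = result.get("primary_confidence", 0.0)
+     # Fall back to a general handler for low-confidence or unmapped domains.
+     if confidence < confidence_threshold or domain not in ROUTES:
+         return handle_general(query)
+     return ROUTES[domain](query)
+
+ print(route("Write a Python function to calculate fibonacci numbers"))
+ ```
+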
187
 
188
  ## Training Details
189
 
190
+ ### Dataset
191
+
192
+ - **Total Samples:** 5,046
193
+ - **Training Samples:** 3,666
194
+ - **Validation Samples:** 690
195
+ - **Test Samples:** 690
196
+ - **Multi-domain Queries:** 546 (10.8%)
197
+
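+ The dataset is custom and not published with this model. As an illustration only, a stratified split with roughly these proportions could be produced as follows (file name and column names are hypothetical):
+
+ ```python
+ import pandas as pd
+ from sklearn.model_selection import train_test_split
+
+ # Hypothetical file and columns; the actual training data is not included in this repo.
+ df = pd.read_csv("domain_queries.csv")  # columns: "text", "domain"
+
+ # Hold out ~27% of the 5,046 rows and split it evenly into validation and test
+ # (approximately 3,666 / 690 / 690), stratifying by domain label.
+ train_df, holdout_df = train_test_split(
+     df, test_size=1380 / 5046, stratify=df["domain"], random_state=42
+ )
+ val_df, test_df = train_test_split(
+     holdout_df, test_size=0.5, stratify=holdout_df["domain"], random_state=42
+ )
+ print(len(train_df), len(val_df), len(test_df))
+ ```
+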
198
+ ### Training Configuration
199
+
200
+ ```python
201
+ # LoRA Configuration
202
+ r = 32
203
+ lora_alpha = 64
204
+ lora_dropout = 0.05
205
+ target_modules = ['q_proj', 'v_proj', 'k_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
206
+
207
+ # Training Configuration
208
+ num_epochs = 5
209
+ batch_size = 4
210
+ gradient_accumulation_steps = 8
211
+ learning_rate = 0.0003
212
+ max_length = 1024
213
+ optimizer = "adamw_8bit" # Memory optimized
214
+ ```
215
+
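+ A sketch of how these LoRA hyperparameters map onto a `peft.LoraConfig` (illustrative; the original training script is not included in this repository):
+
+ ```python
+ from peft import LoraConfig, get_peft_model
+ from transformers import AutoModelForCausalLM
+
+ lora_config = LoraConfig(
+     r=32,
+     lora_alpha=64,
+     lora_dropout=0.05,
+     target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
+                     "gate_proj", "up_proj", "down_proj"],
+     task_type="CAUSAL_LM",
+ )
+
+ base = AutoModelForCausalLM.from_pretrained("google/functiongemma-270m-it")
+ model = get_peft_model(base, lora_config)
+ model.print_trainable_parameters()  # should report roughly 7.6M trainable parameters
+ ```
+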
216
+ ### Memory Optimization
217
+
218
+ This model was trained with memory optimizations to run on GPUs with <5GB VRAM:
219
+
220
+ - **8-bit Optimizer:** Reduces optimizer memory by 50%
221
+ - **Gradient Checkpointing:** Trades compute for memory
222
+ - **Smaller Batches:** 4 samples per batch with gradient accumulation
223
+ - **Shorter Sequences:** 1024 tokens max (vs 2048)
224
+
225
+ **Total VRAM Usage:** ~4GB (vs ~40GB without optimization)
226
+
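+ In `transformers`, these optimizations would typically be expressed through `TrainingArguments`, roughly as below (illustrative; values taken from the configuration above, precision flag assumed to match the bfloat16 inference dtype):
+
+ ```python
+ from transformers import TrainingArguments
+
+ training_args = TrainingArguments(
+     output_dir="functiongemma-domain-classifier",
+     num_train_epochs=5,
+     per_device_train_batch_size=4,      # small per-device batches...
+     gradient_accumulation_steps=8,      # ...accumulated to an effective batch of 32
+     learning_rate=3e-4,
+     optim="adamw_8bit",                 # 8-bit optimizer states (requires bitsandbytes)
+     gradient_checkpointing=True,        # trade compute for activation memory
+     bf16=True,                          # assumption: bf16 mixed precision
+     logging_steps=10,
+ )
+ ```
+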
227
+ ## Performance by Domain
228
+
229
+ | Domain | Precision | Recall | F1-Score | Support |
230
+ |--------|-----------|--------|----------|---------|
231
+ | ambiguous | 0.98 | 1.00 | 0.99 | 45 |
232
+ | api_generation | 0.98 | 1.00 | 0.99 | 45 |
233
+ | business | 0.98 | 0.93 | 0.95 | 44 |
234
+ | coding | 0.98 | 0.96 | 0.97 | 48 |
235
+ | creative_content | 0.90 | 1.00 | 0.95 | 45 |
236
+ | data_analysis | 0.96 | 0.98 | 0.97 | 46 |
237
+ | education | 0.98 | 0.96 | 0.97 | 45 |
238
+ | general_knowledge | 0.76 | 0.84 | 0.80 | 45 |
239
+ | law | 0.98 | 0.94 | 0.96 | 49 |
240
+ | literature | 1.00 | 0.93 | 0.97 | 45 |
241
+ | mathematics | 1.00 | 1.00 | 1.00 | 47 |
242
+ | medicine | 0.98 | 0.89 | 0.93 | 46 |
243
+ | science | 1.00 | 0.98 | 0.99 | 47 |
244
+ | sensitive | 0.92 | 1.00 | 0.96 | 45 |
245
+ | technology | 1.00 | 0.93 | 0.97 | 46 |
246
+
247
+ **Overall Accuracy:** 95.51%
248
+
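+ These are the standard `scikit-learn` classification metrics. Given gold and predicted labels for the 690-query test split, the aggregate and per-domain numbers above could be reproduced roughly as follows (the label lists here are placeholders):
+
+ ```python
+ from sklearn.metrics import accuracy_score, classification_report, f1_score
+
+ # Placeholder labels; in practice these come from running `classify` over the test split.
+ y_true = ["coding", "law", "science", "coding"]
+ y_pred = ["coding", "law", "medicine", "coding"]
+
+ print("Accuracy:", accuracy_score(y_true, y_pred))
+ print("F1 (weighted):", f1_score(y_true, y_pred, average="weighted", zero_division=0))
+ print("F1 (macro):", f1_score(y_true, y_pred, average="macro", zero_division=0))
+ print(classification_report(y_true, y_pred, digits=2, zero_division=0))
+ ```
+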
249
+ ## Advantages
250
+
251
+ - ✅ **Tiny Size:** 270M parameters (14x smaller than Phi-3)
+ - ✅ **Fast Inference:** 0.3s on CPU, 0.08s on GPU
+ - ✅ **Low Memory:** Runs on 4GB VRAM
+ - ✅ **High Accuracy:** 95.51% (competitive with larger models)
+ - ✅ **Multi-domain:** Detects queries spanning multiple domains
+ - ✅ **Function Calling:** Built-in structured output
+ - ✅ **Mobile-Ready:** Small enough to deploy on smartphones
258
+
259
+ ## Limitations
260
+
261
+ - Trained on English queries only
262
+ - Performance varies by domain (see table above)
263
+ - May struggle with highly ambiguous queries
264
+ - Limited to 17 pre-defined domains
265
+
266
+ ## Citation
267
+
268
+ If you use this model, please cite:
269
+
270
+ ```bibtex
271
+ @misc{functiongemma-domain-classifier,
272
+ author = {ovinduG},
273
+ title = {FunctionGemma Domain Classifier},
274
+ year = {2024},
275
+ publisher = {HuggingFace},
276
+ howpublished = {\url{https://huggingface.co/ovinduG/functiongemma-domain-classifier}}
277
+ }
278
+ ```
279
+
280
+ ## License
281
+
282
+ This model is based on [FunctionGemma](https://huggingface.co/google/functiongemma-270m-it) and follows the same [Gemma License](https://ai.google.dev/gemma/terms).
283
+
284
+ ## Acknowledgments
285
+
286
+ - **Base Model:** Google's FunctionGemma-270M
287
+ - **Training Framework:** HuggingFace Transformers + PEFT
288
+ - **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
289
 
290
+ ---
 
291
 
292
+ Built with ❤️ using FunctionGemma