manikumargouni committed on
Commit ccabc1b · verified · 1 Parent(s): 41aaef9

Upload folder using huggingface_hub

Files changed (3)
  1. .gitignore +1 -2
  2. README.md +440 -156
  3. model.safetensors +3 -0
.gitignore CHANGED
@@ -6,5 +6,4 @@ decision_phase_model_output/
6
  subtype_model_output/
7
  iab_model_output/
8
  */model.safetensors
9
- iab_hierarchy_model_output/
10
- model.safetensors
 
6
  subtype_model_output/
7
  iab_model_output/
8
  */model.safetensors
9
+ iab_hierarchy_model_output/
 
README.md CHANGED
@@ -1,208 +1,492 @@
1
- ---
2
- language:
3
- - en
4
- library_name: transformers
5
- pipeline_tag: text-classification
6
- base_model: distilbert-base-uncased
7
- metrics:
8
- - accuracy
9
- - f1
10
- tags:
11
- - intent-classification
12
- - multitask
13
- - iab
14
- - conversational-ai
15
- - adtech
16
- - calibrated-confidence
17
- license: apache-2.0
18
- ---
19
-
20
- # admesh/agentic-intent-classifier
21
-
22
- Production-ready intent + IAB classifier bundle for conversational traffic.
23
-
24
- Combines multitask intent modeling, supervised IAB content classification, and per-head confidence calibration to support safe monetization decisions in real time.
25
-
26
- ## Links
27
 
28
- - Hugging Face: https://huggingface.co/admesh/agentic-intent-classifier
29
- - GitHub: https://github.com/GouniManikumar12/agentic-intent-classifier
30
-
31
- ## What It Predicts
32
 
33
- | Field | Description |
34
- |---|---|
35
- | `intent.type` | `commercial`, `informational`, `navigational`, `transactional`, … |
36
- | `intent.subtype` | `product_discovery`, `comparison`, `how_to`, … |
37
- | `intent.decision_phase` | `awareness`, `consideration`, `decision`, … |
38
- | `iab_content` | IAB Content Taxonomy 3.0 tier1 / tier2 / tier3 labels |
39
- | `component_confidence` | Per-head calibrated confidence with threshold flags |
40
- | `system_decision` | Monetization eligibility, opportunity type, policy |
41
 
42
- ---
 
 
43
 
44
- ## Deployment Options
 
 
 
 
45
 
46
- ### 1. `transformers.pipeline()` one line anywhere
47
 
48
  ```python
49
- from transformers import pipeline
50
 
51
- clf = pipeline(
52
- "admesh-intent",
53
- model="admesh/agentic-intent-classifier",
54
- trust_remote_code=True,
55
- )
56
-
57
- result = clf("Which laptop should I buy for college?")
58
  ```
59
 
60
- Batch and custom thresholds:
61
 
62
  ```python
63
- # batch
64
  results = clf([
65
  "Best running shoes under $100",
66
- "How does TCP work?",
67
  "Buy noise-cancelling headphones",
68
  ])
69
 
70
- # custom confidence thresholds
71
  result = clf(
72
- "Buy headphones",
73
  threshold_overrides={"intent_type": 0.6, "intent_subtype": 0.35},
74
  )
75
  ```
76
 
77
  ---
78
 
79
- ### 2. HF Inference Endpoints (managed, deploy to AWS / Azure / GCP)
80
 
81
- 1. Go to https://ui.endpoints.huggingface.co
82
- 2. **New Endpoint** → select `admesh/agentic-intent-classifier`
83
- 3. Framework: **PyTorch** — Task: **Text Classification**
84
- 4. Enable **"Load with trust_remote_code"**
85
- 5. Deploy
 
 
86
 
87
- The endpoint serves the same `pipeline()` interface above via REST:
88
 
89
  ```bash
90
- curl https://<your-endpoint>.endpoints.huggingface.cloud \
91
- -H "Authorization: Bearer $HF_TOKEN" \
92
- -H "Content-Type: application/json" \
93
- -d '{"inputs": "Which laptop should I buy for college?"}'
94
  ```
95
 
96
- ---
 
 
 
 
 
97
 
98
- ### 3. HF Spaces (Gradio / Streamlit demo)
99
 
100
- ```python
101
- # app.py for a Gradio Space
102
- import gradio as gr
103
- from transformers import pipeline
104
-
105
- clf = pipeline(
106
- "admesh-intent",
107
- model="admesh/agentic-intent-classifier",
108
- trust_remote_code=True,
109
- )
110
 
111
- def classify(text):
112
- return clf(text)
113
 
114
- gr.Interface(fn=classify, inputs="text", outputs="json").launch()
 
 
 
115
  ```
116
 
117
- ---
118
 
119
- ### 4. Local / notebook via `snapshot_download`
 
 
 
120
 
121
- ```python
122
- import sys
123
- from huggingface_hub import snapshot_download
124
 
125
- local_dir = snapshot_download(
126
- repo_id="admesh/agentic-intent-classifier",
127
- repo_type="model",
128
- )
129
- sys.path.insert(0, local_dir)
130
 
131
- from pipeline import AdmeshIntentPipeline
132
- clf = AdmeshIntentPipeline()
133
- result = clf("I need a CRM for a 5-person startup")
 
 
 
 
134
  ```
135
 
136
- Or the one-liner factory:
137
 
138
- ```python
139
- from pipeline import AdmeshIntentPipeline
140
- clf = AdmeshIntentPipeline.from_pretrained("admesh/agentic-intent-classifier")
141
  ```
142
 
143
- ---
144
 
145
- ## Example Output
146
-
147
- ```json
148
- {
149
- "model_output": {
150
- "classification": {
151
- "iab_content": {
152
- "taxonomy": "IAB Content Taxonomy",
153
- "taxonomy_version": "3.0",
154
- "tier1": {"id": "552", "label": "Style & Fashion"},
155
- "tier2": {"id": "579", "label": "Men's Fashion"},
156
- "mapping_mode": "exact",
157
- "mapping_confidence": 0.73
158
- },
159
- "intent": {
160
- "type": "commercial",
161
- "subtype": "product_discovery",
162
- "decision_phase": "consideration",
163
- "confidence": 0.9549,
164
- "commercial_score": 0.656
165
- }
166
- }
167
- },
168
- "system_decision": {
169
- "policy": {
170
- "monetization_eligibility": "allowed_with_caution",
171
- "eligibility_reason": "commercial_discovery_signal_present"
172
- },
173
- "opportunity": {"type": "soft_recommendation", "strength": "medium"}
174
- },
175
- "meta": {
176
- "system_version": "0.6.0-phase4",
177
- "calibration_enabled": true,
178
- "iab_mapping_is_placeholder": false
179
- }
180
- }
181
- ```
182
-
183
- ## Reproducible Revision
184
 
185
- ```python
186
- from huggingface_hub import snapshot_download
187
- local_dir = snapshot_download(
188
- repo_id="admesh/agentic-intent-classifier",
189
- repo_type="model",
190
- revision="0584798f8efee6beccd778b0afa06782ab5add60",
191
- )
192
  ```
193
 
194
- ## Included Artifacts
195
 
196
- | Path | Contents |
197
- |---|---|
198
- | `multitask_intent_model_output/` | DistilBERT multitask weights + tokenizer |
199
- | `iab_classifier_model_output/` | IAB content classifier weights + tokenizer |
200
- | `artifacts/calibration/` | Per-head temperature + threshold JSONs |
201
- | `pipeline.py` | `AdmeshIntentPipeline` (transformers.Pipeline subclass) |
202
- | `combined_inference.py` | Core inference logic |
203
 
204
- ## Notes
205
 
206
- - `trust_remote_code=True` is required because this model uses a custom multi-head architecture that does not map to a single standard `AutoModel` checkpoint.
207
- - `meta.iab_mapping_is_placeholder: true` means IAB artifacts were missing or skipped; train and calibrate IAB for full production accuracy.
208
- - For long-running servers, instantiate once and reuse — models are cached in memory after the first call.
 
 
1
+ # Agentic Intent Classifier
2
+
3
+ `agentic-intent-classifier` is a multi-head query classification stack for conversational traffic.
4
+
5
+ It currently produces:
6
+
7
+ - `intent.type`
8
+ - `intent.subtype`
9
+ - `intent.decision_phase`
10
+ - `iab_content`
11
+ - calibrated confidence per head
12
+ - combined fallback / policy / opportunity decisions
13
+
14
+ The repo is beyond the original v0.1 baseline. It now includes:
15
+
16
+ - shared config and label ownership
17
+ - reusable model runtime
18
+ - calibrated confidence and threshold gating
19
+ - combined inference with fallback/policy logic
20
+ - request/response validation in the demo API
21
+ - repeatable evaluation and regression suites
22
+ - full-TSV IAB taxonomy retrieval support through tier4
23
+ - a local embedding index for taxonomy-node retrieval over IAB content paths
24
+ - a separate synthetic full-intent-taxonomy augmentation dataset for non-IAB heads
25
+ - a dedicated intent-type difficulty dataset and held-out benchmark with `easy`, `medium`, and `hard` cases
26
+ - a dedicated decision-phase difficulty dataset and held-out benchmark with `easy`, `medium`, and `hard` cases
27
+
28
+ Generated model weights are intentionally not committed.
29
+
30
+ ## Current Taxonomy
31
+
32
+ ### `intent.type`
33
+
34
+ - `informational`
35
+ - `exploratory`
36
+ - `commercial`
37
+ - `transactional`
38
+ - `support`
39
+ - `personal_reflection`
40
+ - `creative_generation`
41
+ - `chit_chat`
42
+ - `ambiguous`
43
+ - `prohibited`
44
+
45
+ ### `intent.decision_phase`
46
+
47
+ - `awareness`
48
+ - `research`
49
+ - `consideration`
50
+ - `decision`
51
+ - `action`
52
+ - `post_purchase`
53
+ - `support`
54
+
55
+ ### `intent.subtype`
56
+
57
+ - `education`
58
+ - `product_discovery`
59
+ - `comparison`
60
+ - `evaluation`
61
+ - `deal_seeking`
62
+ - `provider_selection`
63
+ - `signup`
64
+ - `purchase`
65
+ - `booking`
66
+ - `download`
67
+ - `contact_sales`
68
+ - `task_execution`
69
+ - `onboarding_setup`
70
+ - `troubleshooting`
71
+ - `account_help`
72
+ - `billing_help`
73
+ - `follow_up`
74
+ - `emotional_reflection`
75
+
76
+ ### `iab_content`
77
+
78
+ - candidates are derived from every row in [data/iab-content/Content Taxonomy 3.0.tsv](data/iab-content/Content%20Taxonomy%203.0.tsv)
79
+ - retrieval output supports `tier1`, `tier2`, `tier3`, and optional `tier4`
80
+
81
+ ## What The System Does
82
+
83
+ - runs three classifier heads:
84
+ - `intent_type`
85
+ - `intent_subtype`
86
+ - `decision_phase`
87
+ - resolves `iab_content` through a local embedding index over taxonomy nodes plus generic label/path reranking
88
+ - applies calibration artifacts when present
89
+ - computes `commercial_score`
90
+ - applies fallback when confidence is too weak or policy-safe blocking is required
91
+ - emits a schema-validated combined envelope
92
+
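A minimal sketch of how calibrated confidence and threshold gating can fit together, assuming each head is calibrated with temperature scaling and gated by a per-head threshold. The function names, the temperature, and the threshold below are illustrative values, not the repository's actual API or artifacts.

```python
import math


def calibrated_confidence(logits, temperature=1.0):
    """Softmax over temperature-scaled logits; returns (best_index, confidence)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


def gate(labels, logits, temperature, threshold):
    """Hypothetical gating step: flag whether the calibrated confidence clears the threshold."""
    best, confidence = calibrated_confidence(logits, temperature)
    return {
        "label": labels[best],
        "confidence": confidence,
        "meets_threshold": confidence >= threshold,
    }


# Illustrative inputs only; real logits come from a classifier head.
labels = ["informational", "commercial", "transactional"]
decision = gate(labels, logits=[0.2, 2.1, 0.4], temperature=1.5, threshold=0.6)
print(decision)
```

A temperature above 1 flattens the distribution and lowers the reported confidence, so an overconfident head is less likely to clear its threshold; a prediction that fails the check is a natural trigger for the fallback path described above.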
93
+ ## What The System Does Not Do
94
+
95
+ - it is not a multi-turn memory system
96
+ - it is not a production-optimized low-latency serving path
97
+ - it is not yet trained on large real-traffic human-labeled intent data
98
+ - combined decision logic is still heuristic, even though it is materially stronger than the original baseline
99
+
100
+ ## Project Layout
101
+
102
+ - [config.py](config.py): labels, thresholds, artifact paths, model paths
103
+ - [model_runtime.py](model_runtime.py): shared calibrated inference runtime
104
+ - [combined_inference.py](combined_inference.py): composed system response
105
+ - [inference_intent_type.py](inference_intent_type.py): direct `intent_type` inference entrypoint
106
+ - [inference_iab_classifier.py](inference_iab_classifier.py): direct supervised `iab_content` inference entrypoint
107
+ - [schemas.py](schemas.py): request/response validation
108
+ - [demo_api.py](demo_api.py): local validated API
109
+ - [iab_taxonomy.py](iab_taxonomy.py): full taxonomy parser/index
110
+ - [iab_classifier.py](iab_classifier.py): supervised IAB runtime with taxonomy-aware parent fallback
111
+ - [iab_retrieval.py](iab_retrieval.py): optional shadow retrieval baseline
112
+ - [training/build_full_intent_taxonomy_dataset.py](training/build_full_intent_taxonomy_dataset.py): separate synthetic intent augmentation dataset
113
+ - [training/build_intent_type_difficulty_dataset.py](training/build_intent_type_difficulty_dataset.py): extra `intent_type` augmentation plus held-out difficulty benchmark
114
+ - [training/build_decision_phase_difficulty_dataset.py](training/build_decision_phase_difficulty_dataset.py): extra `decision_phase` augmentation plus held-out difficulty benchmark
115
+ - [training/build_subtype_difficulty_dataset.py](training/build_subtype_difficulty_dataset.py): extra `intent_subtype` augmentation plus held-out difficulty benchmark
116
+ - [training/build_subtype_dataset.py](training/build_subtype_dataset.py): subtype dataset generation from existing corpora
117
+ - [training/train_iab.py](training/train_iab.py): train the supervised IAB classifier head
118
+ - [training/build_iab_taxonomy_embeddings.py](training/build_iab_taxonomy_embeddings.py): build local IAB node embedding artifacts
119
+ - [training/run_full_training_pipeline.py](training/run_full_training_pipeline.py): full multi-head training/calibration/eval pipeline
120
+ - [evaluation/run_evaluation.py](evaluation/run_evaluation.py): repeatable benchmark runner
121
+ - [evaluation/run_regression_suite.py](evaluation/run_regression_suite.py): known-failure regression runner
122
+ - [evaluation/run_iab_mapping_suite.py](evaluation/run_iab_mapping_suite.py): IAB behavior-lock regression runner
123
+ - [evaluation/run_iab_quality_suite.py](evaluation/run_iab_quality_suite.py): curated IAB quality-target runner
124
+ - [known_limitations.md](known_limitations.md): current gaps and caveats
125
+
126
+ ## Quickstart: Run From Hugging Face
127
+
128
+ Download the trained bundle and run inference in a few lines; no local training is required.
129
 
130
+ ```python
131
+ import sys
132
+ from huggingface_hub import snapshot_download
 
133
 
134
+ # Download the full bundle (models + calibration + code)
135
+ local_dir = snapshot_download(
136
+ repo_id="admesh/agentic-intent-classifier",
137
+ repo_type="model",
138
+ )
139
+ sys.path.insert(0, local_dir)
 
 
140
 
141
+ # Import and instantiate
142
+ from pipeline import AdmeshIntentPipeline
143
+ clf = AdmeshIntentPipeline()
144
 
145
+ # Classify
146
+ import json
147
+ result = clf("Which laptop should I buy for college?")
148
+ print(json.dumps(result, indent=2))
149
+ ```
150
 
151
+ Or use the one-liner factory method:
152
 
153
  ```python
154
+ from pipeline import AdmeshIntentPipeline # after sys.path.insert above
155
 
156
+ clf = AdmeshIntentPipeline.from_pretrained("admesh/agentic-intent-classifier")
157
+ result = clf("I need a CRM for a 5-person startup")
158
  ```
159
 
160
+ Batch mode and custom thresholds are also supported:
161
 
162
  ```python
163
+ # Batch
164
  results = clf([
165
  "Best running shoes under $100",
166
+ "How does gradient descent work?",
167
  "Buy noise-cancelling headphones",
168
  ])
169
 
170
+ # Custom confidence thresholds
171
  result = clf(
172
+ "Buy noise-cancelling headphones",
173
  threshold_overrides={"intent_type": 0.6, "intent_subtype": 0.35},
174
  )
175
  ```
176
 
177
+ Verify artifacts and run a smoke test from the CLI:
178
+
179
+ ```bash
180
+ cd "<local_dir>"
181
+ python3 training/pipeline_verify.py
182
+ python3 combined_inference.py "Which CRM should I buy for a 3-person startup?"
183
+ ```
184
+
185
+ Pin a specific revision for reproducibility:
186
+
187
+ ```python
188
+ local_dir = snapshot_download(
189
+ repo_id="admesh/agentic-intent-classifier",
190
+ repo_type="model",
191
+ revision="0584798f8efee6beccd778b0afa06782ab5add60",
192
+ )
193
+ ```
194
+
195
  ---
196
 
197
+ ## Setup (for local training)
198
 
199
+ ```bash
200
+ python3 -m venv .venv
201
+ source .venv/bin/activate
202
+ pip install -r agentic-intent-classifier/requirements.txt
203
+ ```
204
+
205
+ ## Inference (local training path)
206
 
207
+ Run one query locally:
208
 
209
  ```bash
210
+ cd agentic-intent-classifier
211
+ python3 training/train_iab.py
212
+ python3 training/calibrate_confidence.py --head iab_content
213
+ python3 combined_inference.py "Which CRM should I buy for a 3-person startup?"
214
  ```
215
 
216
+ Run only the `intent_type` head:
217
+
218
+ ```bash
219
+ cd agentic-intent-classifier
220
+ python3 inference_intent_type.py "best shoes under 100"
221
+ ```
222
 
223
+ Run the demo API:
224
 
225
+ ```bash
226
+ cd agentic-intent-classifier
227
+ python3 demo_api.py
228
+ ```
229
 
230
+ Example request:
 
231
 
232
+ ```bash
233
+ curl -sS -X POST http://127.0.0.1:8008/classify \
234
+ -H 'Content-Type: application/json' \
235
+ -d '{"text":"I cannot log into my account"}'
236
  ```
237
 
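The same request can be issued from Python with the standard library. This is a sketch under two assumptions: the demo API is running on the host and port used in the curl example, and it answers with a JSON body. The helper names are hypothetical.

```python
import json
import urllib.request


def build_classify_request(text, base_url="http://127.0.0.1:8008"):
    """Build a POST request for /classify with a JSON payload."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/classify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text, base_url="http://127.0.0.1:8008", timeout=10.0):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_classify_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not need a running server; classify() does.
request = build_classify_request("I cannot log into my account")
print(request.get_method(), request.full_url)
```

Call `classify("I cannot log into my account")` once `demo_api.py` is running to get the combined envelope back as a Python dict.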
238
+ Infra endpoints:
239
 
240
+ ```bash
241
+ curl -sS http://127.0.0.1:8008/health
242
+ curl -sS http://127.0.0.1:8008/version
243
+ ```
244
 
245
+ Train only the IAB classifier head:
 
 
246
 
247
+ ```bash
248
+ cd agentic-intent-classifier
249
+ python3 training/train_iab.py
250
+ python3 training/calibrate_confidence.py --head iab_content
251
+ ```
252
 
253
+ The online `iab_content` path now uses the compact supervised classifier. Retrieval is still available as an optional shadow baseline.
254
+
255
+ Build the optional retrieval shadow index:
256
+
257
+ ```bash
258
+ cd agentic-intent-classifier
259
+ python3 training/build_iab_taxonomy_embeddings.py
260
  ```
261
 
262
+ By default the shadow retrieval path uses `Alibaba-NLP/gte-Qwen2-1.5B-instruct`. The retrieval runtime applies the model's query-side instruction format and last-token pooling, matching the Hugging Face usage guidance. If you want to point retrieval at a different embedding model, set `IAB_RETRIEVAL_MODEL_NAME_OVERRIDE` before building the index.
263
 
264
+ Open-source users can swap in their own embedding model, but the contract is:
265
+
266
+ - query embeddings and taxonomy-node embeddings must be produced by the same model and model revision
267
+ - after changing models, you must rebuild `artifacts/iab/taxonomy_embeddings.pt`
268
+ - the repository only tests and supports the default model path out of the box
269
+ - not every Hugging Face embedding model is drop-in compatible with this runtime; some require custom pooling, query instructions, or `trust_remote_code`
270
+
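The first two points of the contract can be enforced in code by storing the embedding model identity next to the vectors and refusing mismatched queries. The sketch below is a toy, pure-Python stand-in: `EmbeddingIndex`, `search`, the taxonomy paths, and the three-dimensional vectors are all hypothetical and do not mirror the layout of `artifacts/iab/taxonomy_embeddings.pt`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingIndex:
    """Hypothetical in-memory index that remembers which model produced it."""
    model_name: str
    model_revision: str
    paths: tuple    # taxonomy paths, one per vector
    vectors: tuple  # node embeddings


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query_vector, query_model, query_revision, top_k=1):
    """Rank taxonomy paths by cosine similarity; reject queries from another model."""
    if (query_model, query_revision) != (index.model_name, index.model_revision):
        raise ValueError("query model differs from the index model; rebuild the index")
    ranked = sorted(
        zip(index.paths, index.vectors),
        key=lambda pair: cosine(query_vector, pair[1]),
        reverse=True,
    )
    return [path for path, _ in ranked[:top_k]]


# Toy vectors and an illustrative revision string.
index = EmbeddingIndex(
    model_name="Alibaba-NLP/gte-Qwen2-1.5B-instruct",
    model_revision="example-revision",
    paths=("Style & Fashion > Men's Fashion", "Technology & Computing"),
    vectors=((0.9, 0.1, 0.0), (0.0, 0.2, 0.9)),
)
print(search(index, (0.8, 0.2, 0.1), index.model_name, index.model_revision))
```

Failing fast on a mismatch is cheaper than debugging silently degraded retrieval, since vectors from different models or revisions do not share a comparable space.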
271
+ Example override:
272
+
273
+ ```bash
274
+ cd agentic-intent-classifier
275
+ export IAB_RETRIEVAL_MODEL_NAME_OVERRIDE=mixedbread-ai/mxbai-embed-large-v1
276
+ python3 training/build_iab_taxonomy_embeddings.py
277
  ```
278
 
279
+ This writes:
280
 
281
+ - `artifacts/iab/taxonomy_nodes.json`
282
+ - `artifacts/iab/taxonomy_embeddings.pt`
283
 
284
+ ## Training
285
+
286
+ ### Full local pipeline
287
+
288
+ ```bash
289
+ cd agentic-intent-classifier
290
+ python3 training/run_full_training_pipeline.py
291
  ```
292
 
293
+ The pipeline runs these steps in order:
294
+
295
+ 1. build separate full-intent-taxonomy augmentation data
296
+ 2. build separate `intent_type` difficulty augmentation + benchmark
297
+ 3. train `intent_type`
298
+ 4. build subtype corpus
299
+ 5. build separate `intent_subtype` difficulty augmentation + benchmark
300
+ 6. train `intent_subtype`
301
+ 7. build separate `decision_phase` difficulty augmentation + benchmark
302
+ 8. train `decision_phase`
303
+ 9. train `iab_content`
304
+ 10. calibrate all classifier heads, including `iab_content`
305
+ 11. run regression/evaluation unless `--skip-full-eval` is used
306
+
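The step order above can be sketched as a list of commands. This is not the contents of `run_full_training_pipeline.py`: the script names come from this README, while the mapping of `training/train.py` to the `intent_type` head, the per-head calibration loop, and the choice of evaluation scripts in the last step are assumptions.

```python
import sys

# (description, script) pairs in the order listed above.
PIPELINE_STEPS = [
    ("build full-intent augmentation", "training/build_full_intent_taxonomy_dataset.py"),
    ("build intent_type difficulty data", "training/build_intent_type_difficulty_dataset.py"),
    ("train intent_type", "training/train.py"),  # assumed to be the intent_type trainer
    ("build subtype corpus", "training/build_subtype_dataset.py"),
    ("build intent_subtype difficulty data", "training/build_subtype_difficulty_dataset.py"),
    ("train intent_subtype", "training/train_subtype.py"),
    ("build decision_phase difficulty data", "training/build_decision_phase_difficulty_dataset.py"),
    ("train decision_phase", "training/train_decision_phase.py"),
    ("train iab_content", "training/train_iab.py"),
]
HEADS = ["intent_type", "intent_subtype", "decision_phase", "iab_content"]


def build_commands(skip_full_eval=False, python=sys.executable):
    """Return the command lines for each step without executing them."""
    commands = [[python, script] for _, script in PIPELINE_STEPS]
    commands += [
        [python, "training/calibrate_confidence.py", "--head", head] for head in HEADS
    ]
    if not skip_full_eval:
        # Assumed evaluation scripts for the final step.
        commands.append([python, "evaluation/run_regression_suite.py"])
        commands.append([python, "evaluation/run_evaluation.py"])
    return commands


for command in build_commands(skip_full_eval=True, python="python3"):
    print(" ".join(command))
```

To execute the plan instead of printing it, pass each command to `subprocess.run(command, check=True)` from the repository root so that a failing step stops the run.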
307
+ ### Build datasets individually
308
+
309
+ Separate full-intent augmentation:
310
+
311
+ ```bash
312
+ cd agentic-intent-classifier
313
+ python3 training/build_full_intent_taxonomy_dataset.py
314
+ ```
315
+
316
+ Intent-type difficulty augmentation and benchmark:
317
+
318
+ ```bash
319
+ cd agentic-intent-classifier
320
+ python3 training/build_intent_type_difficulty_dataset.py
321
+ ```
322
+
323
+ Decision-phase difficulty augmentation and benchmark:
324
+
325
+ ```bash
326
+ cd agentic-intent-classifier
327
+ python3 training/build_decision_phase_difficulty_dataset.py
328
+ ```
329
+
330
+ Subtype difficulty augmentation and benchmark:
331
+
332
+ ```bash
333
+ cd agentic-intent-classifier
334
+ python3 training/build_subtype_difficulty_dataset.py
335
+ ```
336
+
337
+ Subtype dataset:
338
+
339
+ ```bash
340
+ cd agentic-intent-classifier
341
+ python3 training/build_subtype_dataset.py
342
+ ```
343
+
344
+ IAB embedding index:
345
+
346
+ ```bash
347
+ cd agentic-intent-classifier
348
+ python3 training/build_iab_taxonomy_embeddings.py
349
+ ```
350
+
351
+ ### Train heads individually
352
+
353
+ ```bash
354
+ cd agentic-intent-classifier
355
+ python3 training/train.py
356
+ python3 training/train_subtype.py
357
+ python3 training/train_decision_phase.py
358
+ ```
359
+
360
+ ### Calibration
361
+
362
+ ```bash
363
+ cd agentic-intent-classifier
364
+ python3 training/calibrate_confidence.py --head intent_type
365
+ python3 training/calibrate_confidence.py --head intent_subtype
366
+ python3 training/calibrate_confidence.py --head decision_phase
367
+ ```
368
+
369
+ ## Evaluation
370
+
371
+ Full evaluation:
372
+
373
+ ```bash
374
+ cd agentic-intent-classifier
375
+ python3 evaluation/run_evaluation.py
376
+ ```
377
+
378
+ Known-failure regression:
379
+
380
+ ```bash
381
+ cd agentic-intent-classifier
382
+ python3 evaluation/run_regression_suite.py
383
+ ```
384
+
385
+ IAB behavior-lock regression:
386
+
387
+ ```bash
388
+ cd agentic-intent-classifier
389
+ python3 evaluation/run_iab_mapping_suite.py
390
+ ```
391
+
392
+ IAB quality-target evaluation:
393
+
394
+ ```bash
395
+ cd agentic-intent-classifier
396
+ python3 evaluation/run_iab_quality_suite.py
397
+ ```
398
+
399
+ Threshold sweeps:
400
+
401
+ ```bash
402
+ cd agentic-intent-classifier
403
+ python3 evaluation/sweep_intent_threshold.py
404
+ ```
405
+
406
+ Artifacts are written to:
407
+
408
+ - `artifacts/calibration/`
409
+ - `artifacts/evaluation/latest/`
410
+
411
+ ## Google Colab
412
+
413
+ Use Colab for the full retraining pass if local memory is limited.
414
+
415
+ Clone once:
416
+
417
+ ```bash
418
+ %cd /content
419
+ !git clone https://github.com/GouniManikumar12/agentic-intent-classifier.git
420
+ %cd /content/agentic-intent-classifier
421
+ ```
422
+
423
+ If the repo is already cloned and you want the latest code, pull manually:
424
+
425
+ ```bash
426
+ !git pull origin main
427
+ ```
428
+
429
+ Full pipeline:
430
+
431
+ ```bash
432
+ !python training/run_full_training_pipeline.py
433
+ ```
434
+
435
+ If full evaluation is too heavy for the current Colab runtime:
436
+
437
+ ```bash
438
+ !python training/run_full_training_pipeline.py \
439
+ --iab-embedding-batch-size 32 \
440
+ --skip-full-eval
441
+ ```
442
+
443
+ Then run eval separately after training:
444
+
445
+ ```bash
446
+ !python evaluation/run_regression_suite.py
447
+ !python evaluation/run_iab_mapping_suite.py
448
+ !python evaluation/run_iab_quality_suite.py
449
+ !python evaluation/run_evaluation.py
450
+ ```
451
+
452
+ ## Current Saved Metrics
453
+
454
+ Generate fresh metrics with:
455
+
456
+ ```bash
457
+ cd agentic-intent-classifier
458
+ python3 evaluation/run_evaluation.py
459
+ ```
460
+
461
+ Do not treat any checked-in summary as canonical unless it was regenerated after the current code and artifacts were built. Older saved reports from the deleted hierarchy stack no longer reflect the current IAB path and are not meaningful.
462
+
463
+ ## Latency Note
464
+
465
+ `combined_inference.py` is a debugging/offline path, not a production latency path.
466
+
467
+ Current production constraints:
468
+
469
+ - per-request CLI execution is not a sub-50ms architecture
470
+ - production serving should use a long-lived API process with preloaded models
471
+ - if sub-50ms becomes a hard requirement, the serving path will need:
472
+ - persistent loaded models
473
+ - runtime optimization
474
+ - likely fewer model passes or a shared multi-head model
475
+
476
+ ## Current Status
477
+
478
+ Current repo status:
479
 
480
+ - full 10-class `intent.type` taxonomy is wired
481
+ - subtype and phase heads are present
482
+ - difficulty benchmarks are wired for `intent_type`, `intent_subtype`, and `decision_phase`
483
+ - full-TSV IAB taxonomy retrieval is wired through tier4
484
+ - separate full-intent augmentation dataset is in place
485
+ - evaluation/runtime memory handling is improved for large IAB splits
 
486
 
487
+ The main remaining gap is no longer basic infrastructure but real-world robustness, especially for:
488
 
489
+ - `decision_phase`
490
+ - `intent_subtype`
491
+ - confidence quality on borderline commercial queries
492
+ - real-traffic supervision beyond synthetic data
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bc61889ce5c6b4817f8a808ee656942f62e5442fe8c0ac91c65f299a695560fe
3
+ size 8366760