jcfossati committed · Commit 3a2d895 · verified · 1 Parent(s): bd2bace

Update README.md

Files changed (1):
  1. README.md +115 -22

README.md CHANGED
@@ -1,34 +1,127 @@
  ---
- language:
- - fr
- - en
- tags:
- - text-classification
- - yes-no
- - onnx
- - distillation
  license: apache-2.0
  pipeline_tag: text-classification
  ---

- # ForSureLLM — yes/no/unknown classifier

- Source : https://github.com/jcfossati/ForSureLLM

- Basic analysis of english and french language for yes/no detection.
- When knowing the question and waiting a yes/no answer, asking a frontier LLM is overkill using too much resources, and latency is high for a limited action.

- ## Stats

- Distilled via KL-divergence on soft labels.
- MiniLM-L12-v2 multilingual backbone, fine-tuned + int8 quantized.

- - Accuracy: 91.4%
- - ECE: 0.007 (calibrated)
- - Latency: 2.5ms CPU
- - Size: 113 MB

- ## Usage

- - Interaction between an application and the user featuring free-form text input.
- - Chatbot asking user and needed a yes/no answer
  ---
  license: apache-2.0
+ language:
+ - fr
+ - en
+ library_name: onnx
  pipeline_tag: text-classification
+ base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
+ tags:
+ - yes-no
+ - intent-classification
+ - distillation
+ - onnx
+ - edge
+ - quantization
+ - sentence-transformers
+ metrics:
+ - accuracy
+ model-index:
+ - name: ForSureLLM
+   results:
+   - task:
+       type: text-classification
+       name: Yes/No/Unknown classification
+     metrics:
+     - type: accuracy
+       value: 0.952
+       name: Adversarial accuracy (124 cases)
+     - type: accuracy
+       value: 0.917
+       name: Test accuracy (1178 cases)
  ---

+ # ForSureLLM
+
+ Ultra-fast `yes` / `no` / `unknown` classifier for short user replies, distilled from Claude Sonnet 4.6 into a multilingual MiniLM-L12. **2 ms on CPU**, **24-113 MB**, no API call needed.
+
+ - 🎯 **Try it live**: [HuggingFace Space demo](https://huggingface.co/spaces/jcfossati/ForSureLLM)
+ - 📦 **Source code**: [github.com/jcfossati/ForSureLLM](https://github.com/jcfossati/ForSureLLM)
+
+ ## What it does
+
+ Given a short French or English reply (typically 1-30 words), returns whether the user is **agreeing**, **refusing**, or **hesitating** about a pending action. Designed as a consent-intent oracle for chatbots, IVR systems, CLI confirmations, and automation flows.
+
+ ```python
+ from forsurellm import classify
+
+ classify("carrément")         # ("yes", 0.97)
+ classify("laisse tomber")     # ("no", 0.98)
+ classify("je sais pas trop")  # ("unknown", 0.96)
+ classify("oui mais non")      # ("unknown", 0.92)
+ classify("yeah right")        # ("no", 0.87)   # sarcasm detected
+ classify("+1")                # ("yes", 1.00)  # symbolic preprocessor
+ classify("👍")                # ("yes", 1.00)
+ ```
+
+ ## Numbers
+
+ | Metric | Value |
+ |---|---|
+ | Adversarial accuracy (124 trap phrases, 22 categories) | **95.2 %** |
+ | Surface-variant robustness (1227 variants) | 95.8 % |
+ | Test set accuracy (1178 phrases) | 91.7 % |
+ | Calibration ECE | 0.012 |
+ | **CPU latency p50** | **1.8 ms** |
+ | ONNX int8 size | 113 MB (multilingual) · **24 MB (FR+EN pruned variant)** |
+
+ **Head-to-head on the 124-case adversarial bench**:
+
+ | Classifier | Accuracy | p50 latency | API cost |
+ |---|---|---|---|
+ | **ForSureLLM** | **95.2 %** | **1.8 ms** | 0 |
+ | Haiku 4.5 zero-shot | 75.0 % | 602 ms | $$ |
+ | Cosine MiniLM-L12 (no fine-tune) | 67.7 % | 8 ms | 0 |
+
+ ForSureLLM beats Haiku 4.5 zero-shot by **+20.2 pts** while running **~330× faster**.
77
+
78
+ ## Strengths
79
+
80
+ Categories where ForSureLLM crushes a generalist LLM (Haiku 4.5):
81
+
82
+ - `modern_slang` (Gen-Z): `no cap`, `bet`, `say less`, `deadass` — 100 % vs 43 %
83
+ - `negated_verb`: `I wouldn't say no`, `ce n'est pas un non` — 83 % vs 17 %
84
+ - `sarcasm`: `oui bien sûr...`, `yeah right` — 100 % vs 40 %
85
+ - `symbolic`: `+1`, `100%`, `👍`, `10/10` — 100 % vs 40 % (deterministic preprocessor)
86
+ - `slang_abbrev`: `np`, `tkt`, `kk`, `nope` — 100 % vs 50 %
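
The `symbolic` category is handled before the model runs: a deterministic preprocessor short-circuits pure symbolic replies with confidence 1.00. A minimal sketch of the idea (the rule table and function name here are illustrative; the actual preprocessor lives in the `forsurellm` package):

```python
# Illustrative sketch of a deterministic symbolic shortcut; the real
# rule set in the forsurellm package is more extensive.
_SYMBOLIC_YES = {"+1", "100%", "10/10", "👍", "✅"}
_SYMBOLIC_NO = {"-1", "0/10", "👎", "❌"}

def symbolic_shortcut(text):
    """Return ('yes'|'no', 1.0) for a pure symbolic reply, else None."""
    token = text.strip()
    if token in _SYMBOLIC_YES:
        return ("yes", 1.0)
    if token in _SYMBOLIC_NO:
        return ("no", 1.0)
    return None  # fall through to the neural model
```

When the shortcut fires, confidence is exactly 1.00, matching the `classify("+1")` example above.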
+
+ ## Files in this repo
+
+ - `forsurellm-int8.onnx` — full multilingual model, 113 MB (50+ languages supported via shared subwords, FR+EN tuned)
+ - (Optional) `forsurellm-int8_fr-en.onnx` — vocab-pruned FR+EN variant, 24 MB. Same predictions as the full model on FR+EN inputs, 5× lighter on disk and in RAM (+85 MB process memory vs +418 MB), latency unchanged. Tokens outside FR+EN become `<unk>`.
+
+ ## How to use (without the package)
+
+ ```python
+ import numpy as np
+ import onnxruntime as ort
+ from huggingface_hub import hf_hub_download
+ from tokenizers import Tokenizer
+
+ onnx_path = hf_hub_download("jcfossati/ForSureLLM", "forsurellm-int8.onnx")
+ session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
+ # tokenizer.json must be downloaded from the GitHub repo (space/tokenizer.json)
+ # or installed via the forsurellm package once published.
+ tokenizer = Tokenizer.from_file("tokenizer.json")
+ enc = tokenizer.encode("laisse tomber")
+ feeds = {  # input names are typical of HF exports; check session.get_inputs()
+     "input_ids": np.array([enc.ids], dtype=np.int64),
+     "attention_mask": np.array([enc.attention_mask], dtype=np.int64),
+ }
+ logits = session.run(None, feeds)[0]  # map argmax to yes/no/unknown
+ ```
+
+ For the full preprocessing pipeline (case normalisation, symbolic shortcuts, sarcasm-aware threshold), use the `forsurellm` Python package — see the [GitHub repo](https://github.com/jcfossati/ForSureLLM) for installation.
+
+ ## Training procedure
+
+ - **Backbone**: `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` (12 layers, 384 hidden)
+ - **Teacher**: Claude Sonnet 4.6 (generation) + Claude Haiku 4.5 (labeling, with Sonnet fallback when confidence < 0.6)
+ - **Loss**: KL-divergence on soft labels (3 classes)
+ - **Dataset**: ~5,800 hand-curated + LLM-generated EN+FR phrases, balanced across 22 adversarial categories
+ - **Training**: 8 epochs, batch 32, lr 2e-5, warmup 10%, weight decay 0.01 (~2 min on RTX Blackwell)
+ - **Calibration**: temperature scaling (T = 0.680, fitted by LBFGS on val set NLL)
+ - **Export**: ONNX dynamic quantization (avx512-vnni, int8)
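
At inference time, the calibration step amounts to dividing the logits by the fitted temperature before the softmax. A minimal numpy sketch (T = 0.680 from the list above; the function name is illustrative):

```python
import numpy as np

def calibrated_probs(logits, temperature=0.680):
    # Divide logits by the fitted temperature before softmax; with T < 1
    # the distribution sharpens, i.e. the raw model was under-confident.
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```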
+
+ ## Limitations
+
+ - **EN + FR only**. The full model (113 MB) keeps the multilingual vocab and may produce reasonable cross-lingual outputs on related Latin-script languages (Spanish/Italian/German), but is not trained for them. The pruned variant (24 MB) drops non-FR/EN tokens entirely.
+ - **Short replies**. Optimized for 1-30 word answers. Longer passages are truncated at 64 tokens.
+ - **Sarcasm detection has cultural priors**. `yeah right` defaults to "no" because it is overwhelmingly sarcastic in modern English usage — a sincere user without clarifying punctuation may get the wrong call. Use `threshold=0.85` in action-confirmation contexts to fall back to `unknown` on borderline cases.
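
The threshold advice can be wrapped in a small helper for action-confirmation flows. A sketch, with a hard-coded stand-in for `classify` so it is self-contained (the real function comes from the `forsurellm` package and returns `(label, confidence)` pairs as in the examples above):

```python
def classify(text):
    # Stand-in for forsurellm.classify, hard-coded from the README examples.
    table = {"carrément": ("yes", 0.97), "yeah right": ("no", 0.87)}
    return table.get(text, ("unknown", 0.50))

def confirm(reply, threshold=0.85):
    """Fall back to 'unknown' whenever confidence is below the threshold."""
    label, confidence = classify(reply)
    return label if confidence >= threshold else "unknown"
```

With `threshold=0.85`, `yeah right` (confidence 0.87) still resolves to `no`; raising the threshold to 0.90 would route it to `unknown` instead.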
 
+
+ ## License
+
+ Apache 2.0, the same license as the base MiniLM model.