---
license: apache-2.0
language:
- fr
- en
library_name: onnx
pipeline_tag: text-classification
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
tags:
- yes-no
- intent-classification
- distillation
- onnx
- edge
- quantization
- sentence-transformers
metrics:
- accuracy
model-index:
- name: ForSureLLM
  results:
  - task:
      type: text-classification
      name: Yes/No/Unknown classification
    metrics:
    - type: accuracy
      value: 0.952
      name: Adversarial accuracy (124 cases)
    - type: accuracy
      value: 0.917
      name: Test accuracy (1178 cases)
---

# ForSureLLM

Ultra-fast `yes` / `no` / `unknown` classifier for short user replies, distilled from Claude Sonnet 4.6 into a multilingual MiniLM-L12. **2 ms on CPU**, **24-113 MB**, no API call needed.

- 🎯 **Try it live**: [HuggingFace Space demo](https://huggingface.co/spaces/jcfossati/ForSureLLM)
- 📦 **Source code**: [github.com/jcfossati/ForSureLLM](https://github.com/jcfossati/ForSureLLM)

## What it does

Given a short French or English reply (typically 1-30 words), it returns whether the user is **agreeing**, **refusing**, or **hesitating** about a pending action. Designed as a consent-intent oracle for chatbots, IVR systems, CLI confirmations, and automation flows.

```python
from forsurellm import classify

classify("carrément")         # ("yes", 0.97)
classify("laisse tomber")     # ("no", 0.98)
classify("je sais pas trop")  # ("unknown", 0.96)
classify("oui mais non")      # ("unknown", 0.92)
classify("yeah right")        # ("no", 0.87)  # sarcasm detected
classify("+1")                # ("yes", 1.00) # symbolic preprocessor
classify("👍")                # ("yes", 1.00)
```

## Numbers

| Metric | Value |
|---|---|
| Adversarial accuracy (124 trap phrases, 22 categories) | **95.2 %** |
| Surface-variant robustness (1227 variants) | 95.8 % |
| Test set accuracy (1178 phrases) | 91.7 % |
| Calibration ECE | 0.012 |
| **CPU latency p50** | **1.8 ms** |
| ONNX int8 size | 113 MB (multilingual) · **24 MB (FR+EN pruned variant)** |

**Head-to-head on the 124-case adversarial bench**:

| Classifier | Accuracy | p50 latency | API cost |
|---|---|---|---|
| **ForSureLLM** | **95.2 %** | **1.8 ms** | 0 |
| Haiku 4.5 zero-shot | 75.0 % | 602 ms | $$ |
| Cosine MiniLM-L12 (no fine-tune) | 67.7 % | 8 ms | 0 |

ForSureLLM beats Haiku 4.5 zero-shot by **+20.2 pts** while running **~330× faster**.

## Strengths

Categories where ForSureLLM crushes a generalist LLM (Haiku 4.5):

- `modern_slang` (Gen-Z): `no cap`, `bet`, `say less`, `deadass` — 100 % vs 43 %
- `negated_verb`: `I wouldn't say no`, `ce n'est pas un non` — 83 % vs 17 %
- `sarcasm`: `oui bien sûr...`, `yeah right` — 100 % vs 40 %
- `symbolic`: `+1`, `100%`, `👍`, `10/10` — 100 % vs 40 % (deterministic preprocessor)
- `slang_abbrev`: `np`, `tkt`, `kk`, `nope` — 100 % vs 50 %
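
The deterministic handling of the `symbolic` category can be pictured as a tiny lookup that runs before the neural model ever sees the text. The sketch below is illustrative only: the tables, names, and percentage rule are hypothetical, not the rules actually shipped in the `forsurellm` package.

```python
import re

# Hypothetical shortcut tables -- illustrative, not the package's actual rules.
SYMBOLIC_YES = {"+1", "👍", "10/10", "ok"}
SYMBOLIC_NO = {"-1", "👎", "0/10"}
PERCENT_RE = re.compile(r"^(\d{1,3})\s*%$")

def symbolic_shortcut(text):
    """Return ('yes'|'no', 1.0) for unambiguous symbols, else None."""
    t = text.strip().lower()
    if t in SYMBOLIC_YES:
        return ("yes", 1.0)
    if t in SYMBOLIC_NO:
        return ("no", 1.0)
    m = PERCENT_RE.match(t)
    if m:
        v = int(m.group(1))
        if v >= 90:   # "100%" reads as strong agreement
            return ("yes", 1.0)
        if v <= 10:   # "0%" reads as refusal
            return ("no", 1.0)
    return None  # ambiguous: fall through to the neural classifier
```

Anything such a table recognises never reaches the model, which is why symbolic inputs score a deterministic 100 % with confidence 1.00.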

## Files in this repo

- `forsurellm-int8.onnx` — full multilingual model, 113 MB (50+ languages supported via shared subwords, FR+EN tuned)
- (Optional) `forsurellm-int8_fr-en.onnx` — vocab-pruned FR+EN variant, 24 MB. Same predictions as the full model on FR+EN inputs, 5× lighter on disk and in RAM (+85 MB process memory vs +418 MB), latency unchanged. Tokens outside FR+EN become `<unk>`.
| 93 |
+
## How to use (without the package)
|
| 94 |
+
|
| 95 |
+
```python
|
| 96 |
+
import onnxruntime as ort
|
| 97 |
+
from huggingface_hub import hf_hub_download
|
| 98 |
+
from tokenizers import Tokenizer
|
| 99 |
+
import numpy as np
|
| 100 |
+
|
| 101 |
+
onnx_path = hf_hub_download("jcfossati/ForSureLLM", "forsurellm-int8.onnx")
|
| 102 |
+
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
|
| 103 |
+
# tokenizer.json must be downloaded from the GitHub repo (space/tokenizer.json)
|
| 104 |
+
# or installed via the forsurellm package once published.
|
| 105 |
+
```
|
| 106 |
|
| 107 |
+
For the full preprocessing pipeline (case normalisation, symbolic shortcuts, sarcasm-aware threshold), use the `forsurellm` Python package — see the [GitHub repo](https://github.com/jcfossati/ForSureLLM) for installation.
|
| 108 |
|
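
Once you have raw logits out of `session.run`, the decision step is a temperature-scaled softmax. The sketch below reuses the calibration temperature T = 0.680 from the training procedure; the `(yes, no, unknown)` output order is an assumption — verify it against the model config before relying on it.

```python
import numpy as np

LABELS = ("yes", "no", "unknown")  # ASSUMED order -- verify against the model config
T = 0.680                          # calibration temperature (see training procedure)

def decide(logits, temperature=T):
    """Map raw 3-class logits to a (label, calibrated confidence) pair."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max()                   # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()   # calibrated softmax probabilities
    i = int(p.argmax())
    return LABELS[i], float(p[i])

label, conf = decide([4.2, -1.3, 0.5])  # label == "yes"
```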

## Training procedure

- **Backbone**: `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` (12 layers, 384 hidden)
- **Teacher**: Claude Sonnet 4.6 (generation) + Claude Haiku 4.5 (labeling, with Sonnet fallback when confidence < 0.6)
- **Loss**: KL-divergence on soft labels (3 classes)
- **Dataset**: ~5,800 hand-curated + LLM-generated EN+FR phrases, balanced across 22 adversarial categories
- **Training**: 8 epochs, batch 32, lr 2e-5, warmup 10%, weight decay 0.01 (~2 min on RTX Blackwell)
- **Calibration**: temperature scaling (T = 0.680, fitted by LBFGS on val set NLL)
- **Export**: ONNX dynamic quantization (avx512-vnni, int8)
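
The KL-on-soft-labels objective amounts to minimising KL(teacher ‖ student) over the 3 classes. A minimal numpy sketch of that loss — a generic formulation, not the project's actual training script:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_soft_label_loss(student_logits, teacher_probs, eps=1e-12):
    """Mean KL(teacher || student) over a batch of 3-class soft labels."""
    p = np.asarray(teacher_probs, dtype=np.float64)
    q = softmax(np.asarray(student_logits, dtype=np.float64))
    kl = (p * (np.log(p + eps) - np.log(q + eps))).sum(axis=-1)
    return float(kl.mean())

# A confident teacher ("yes": 0.9) penalises a still-undecided student.
loss = kl_soft_label_loss([[0.1, 0.0, -0.1]], [[0.9, 0.05, 0.05]])
```

Soft labels are what let borderline phrases like `oui mais non` carry a split teacher distribution instead of a hard class.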

## Limitations

- **EN + FR only**. The full model (113 MB) keeps the multilingual vocab and may produce reasonable cross-lingual outputs on related Latin-script languages (Spanish/Italian/German), but is not trained for them. The pruned variant (24 MB) drops non-FR/EN tokens entirely.
- **Short replies**. Optimized for 1-30 word answers. Longer passages are truncated at 64 tokens.
- **Sarcasm detection has cultural priors**. `yeah right` defaults to "no" because it is overwhelmingly sarcastic in modern English usage — a sincere user without clarifying punctuation may get the wrong call. Use `threshold=0.85` in action-confirmation contexts to fall back to `unknown` on borderline cases.
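
The `threshold=0.85` recommendation can be wrapped as a simple guard. This is a hypothetical sketch: `classify_fn` stands in for any callable with the package's `(label, confidence)` return shape.

```python
def confirm_action(reply, classify_fn, threshold=0.85):
    """Only act on a high-confidence yes/no; anything borderline becomes 'unknown'.

    classify_fn is any callable returning (label, confidence), e.g. the
    forsurellm classify function; the wrapper itself is illustrative.
    """
    label, confidence = classify_fn(reply)
    if label in ("yes", "no") and confidence >= threshold:
        return label   # confident enough to proceed or abort
    return "unknown"   # borderline or hesitant: re-prompt the user
```

With this guard, a borderline sarcastic reading simply triggers a re-prompt instead of cancelling (or executing) the pending action.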

## License

Apache 2.0 — same as the base MiniLM model.