hslee1981 committed
Commit 24e2c47 · verified · 1 Parent(s): 328d83d

T18 Phase 1 Tier 1: model card

Files changed (1):
  1. README.md +37 -12

README.md CHANGED
@@ -130,11 +130,11 @@ LogReg wins on simplicity-tiebreak.
 
 ### Primary use case
 
-Upstream gate in the CCR report pipeline. After the XGBoost page classifier flags pages as CCR, this model evaluates whether the parent document is actually a Declaration of Covenants worth running CCR extraction on. Decision band:
 
-- **Score < 0.30**: confident NOT-CCR. Skip CCR pipeline entirely. Removes the document from CCR dispatch.
-- **Score >= 0.85**: confident IS-CCR. Trust the classifier, fast-path bypasses the more expensive agentic `detect_ccr` validator.
-- **0.30 <= Score < 0.85**: ambiguous. Escalate to agentic `detect_ccr` for a deeper look.
 
 ### Out-of-scope use
 
@@ -146,9 +146,9 @@ Upstream gate in the CCR report pipeline. After the XGBoost page classifier flag
 
 ### Calibration
 
-ECE 0.21-0.28 on validation/test means the model's predicted probabilities are systematically over-confident. The decision-band thresholds (0.30 / 0.85) above are **operationally tuned, not probability-calibrated**.
 
-If you redeploy with new thresholds based on different operating goals, derive them empirically from the test set's reliability diagram. Wrap with isotonic recalibration if probability calibration matters for your use case.
 
 ### Sample size
 
@@ -191,19 +191,42 @@ doc_vector = np.mean(page_vectors, axis=0).reshape(1, -1)
 # Predict
 score = model.predict_proba(doc_vector)[0, 1]
 
-# Three-band decision
-if score < 0.30:
     decision = "REJECT"  # confident not a Declaration; skip CCR pipeline
-elif score >= 0.85:
     decision = "FAST_PASS"  # confident Declaration; bypass agentic validator
 else:
     decision = "ESCALATE"  # ambiguous; run agentic detect_ccr
 ```
 ### Files in this repo
 
-- `ccr_binary_logreg_tuned.joblib` — pickled dict containing `model` (sklearn LogisticRegression), `threshold` (float, 0.436), and `config` (dict with `embedding_model`, `max_pages_per_doc`, `skip_boilerplate` flags)
-- `config.json` — JSON-readable summary of the model configuration
 
 ## Training Procedure
 
@@ -254,4 +277,6 @@ Model artifacts are versioned via HuggingFace commit history. `config.json` incl
 
 ## Maintenance
 
-This model is part of the T18 plan (CCR Upstream Input Hardening) in the GoverningDocs platform. See `plans/T18_CCR_UPSTREAM_INPUT_HARDENING_PLAN.md` (v2.1.1) in the product repo for design rationale, alternatives considered (page-classifier retrain, agentic-only, signature patterns), and Phase 2 wire-in plans.
 
 
 ### Primary use case
 
+Upstream gate in the CCR report pipeline. After the XGBoost page classifier flags pages as CCR, this model evaluates whether the parent document is actually a Declaration of Covenants worth running CCR extraction on. Decision bands (recalibrated empirically — the original `(0.30, 0.85)` plan-time bands left FAST_PASS empty in production because real Declarations score 0.45-0.70 raw):
 
+- **Score < 0.25**: confident NOT-CCR. Skip the CCR pipeline entirely; the document is removed from CCR dispatch.
+- **Score >= 0.55**: confident IS-CCR. Trust the classifier; the fast path bypasses the more expensive agentic `detect_ccr` validator.
+- **0.25 <= Score < 0.55**: ambiguous. Escalate to agentic `detect_ccr` for a deeper look.
 
 ### Out-of-scope use
 
 
 ### Calibration
 
+The raw LogReg artifact has ECE 0.19-0.28 on validation/test; its predicted probabilities are systematically miscalibrated. The decision-band thresholds `(0.25, 0.55)` above are **empirically tuned on the production score distribution, not probability-calibrated**.
 
+A separate isotonic calibrator artifact (`ccr_binary_isotonic_calibrator.joblib`) ships in the same repo and reduces test-set ECE from 0.278 to 0.087 (3.2x improvement). It is **purely additive metadata** — the production gate still consumes raw scores. Use the calibrator if you need probability-calibrated outputs for drift monitoring, signal combination with other classifiers, or user-facing confidence display. See the "Calibration Support" section below for details.
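The ECE figures quoted here can be reproduced with the standard binned estimator: bucket predictions by confidence, then take the support-weighted average gap between accuracy and mean confidence per bucket. A minimal sketch in plain numpy (the bin count and equal-width binning are assumptions, not this project's evaluation code):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: support-weighted mean |accuracy - confidence| per bin."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # First bin is closed on the left so a 0.0 score is not dropped
        in_bin = (probs > lo) & (probs <= hi) if lo > 0.0 else (probs >= lo) & (probs <= hi)
        if in_bin.any():
            gap = abs(labels[in_bin].mean() - probs[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Over-confident toy scores: predicts 0.9 three times but is right only twice
print(expected_calibration_error([0.9, 0.9, 0.9, 0.1], [1, 1, 0, 0]))
```

Perfectly calibrated scores give ECE 0; values in the 0.2-0.3 range, as reported for the raw artifact, mean the stated probabilities are off by roughly that much on average.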
 
 ### Sample size
 
 
 # Predict
 score = model.predict_proba(doc_vector)[0, 1]
 
+# Three-band decision (recalibrated production bands)
+if score < 0.25:
     decision = "REJECT"  # confident not a Declaration; skip CCR pipeline
+elif score >= 0.55:
     decision = "FAST_PASS"  # confident Declaration; bypass agentic validator
 else:
     decision = "ESCALATE"  # ambiguous; run agentic detect_ccr
 ```
 
+### Calibration Support
+
+Optional isotonic calibrator (`ccr_binary_isotonic_calibrator.joblib`) maps raw scores to probability-calibrated outputs.
+
+```python
+calibrator_path = hf_hub_download(
+    repo_id="GoverningDocs/ccr-binary-logreg",
+    filename="ccr_binary_isotonic_calibrator.joblib",
+)
+cal_artifact = joblib.load(calibrator_path)
+calibrator = cal_artifact["calibrator"]
+
+# With cv="prefit" and method="isotonic" on a binary problem, the calibrator
+# was fit on raw predict_proba outputs, so the inner isotonic regressor can
+# be applied directly to a float score.
+inner = calibrator.calibrated_classifiers_[0].calibrators[0]
+calibrated = float(inner.predict([score])[0])
+```
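Why a calibrator fit on a small split produces only a few distinct outputs: isotonic regression is a non-decreasing step function computed by pool-adjacent-violators (PAV), and pooling merges neighbouring fit points into plateaus. A self-contained illustration of the pooling (hand-rolled for clarity, not the shipped sklearn artifact):

```python
def pav_fit(values):
    """Pool Adjacent Violators: best non-decreasing fit to `values` under
    squared error (unit weights). Violating neighbours are merged into
    their mean, which is exactly what produces plateau outputs."""
    blocks = []  # each block: [mean, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge from the right while the sequence of block means decreases
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            (m1, c1), (m2, c2) = blocks[-2], blocks[-1]
            blocks[-2:] = [[(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2]]
    fitted = []
    for mean, count in blocks:
        fitted.extend([mean] * count)
    return fitted

# Noisy binary labels sorted by raw score: the 0/1 flip gets pooled
print(pav_fit([0, 1, 0, 1, 1]))  # → [0.0, 0.5, 0.5, 1.0, 1.0]
```

With only ~70 fit points, few distinct pooled levels survive, which matches the caveat below about 3-level outputs.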
+
+**Caveats:**
+- The shipped isotonic was fit on a small (~70-doc) validation split and produces approximately 3 plateau outputs (0.737, 0.833, 1.000). Treat calibrated scores as a 3-level (low / med / high) confidence signal rather than as fine-grained probabilities.
+- The calibrator's `shipped_model_filename` field MUST match the model file you loaded. Cross-check before use to guard against artifact mismatch.
+
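The artifact-mismatch cross-check can be a fail-fast guard at load time. A sketch (the helper name is mine; the field name follows the artifact schema described above):

```python
def check_calibrator_pairing(cal_artifact, model_filename):
    """Raise if the calibrator was fit against a different model artifact."""
    shipped = cal_artifact.get("shipped_model_filename")
    if shipped != model_filename:
        raise ValueError(
            f"calibrator/model mismatch: calibrator expects {shipped!r} "
            f"but {model_filename!r} was loaded"
        )
    return True

# e.g. right after loading both artifacts:
check_calibrator_pairing(
    {"shipped_model_filename": "ccr_binary_logreg_tuned.joblib"},
    "ccr_binary_logreg_tuned.joblib",
)
```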
 ### Files in this repo
 
+- `ccr_binary_logreg_tuned.joblib` — pickled dict containing `model` (sklearn LogisticRegression) and `config` (dict with `embedding_model`, `max_pages_per_doc`, `skip_boilerplate` flags). The `threshold` field (0.436) is a Phase 1 artifact; production uses bands, not a single threshold.
+- `ccr_binary_isotonic_calibrator.joblib` — pickled dict containing `calibrator` (sklearn `CalibratedClassifierCV` with `cv="prefit"`, `method="isotonic"`), `shipped_model_filename` (paired model artifact), and ECE before/after metadata.
+- `config.json` — JSON-readable summary of the model configuration, decision bands, and calibrator metadata.
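Because `config.json` duplicates fields that also live in the pickled `config` dict, the two copies can drift across commits. A hypothetical helper (not part of the repo; the field values shown are made up for illustration) to diff them before trusting either:

```python
def config_drift(json_cfg, artifact_cfg):
    """Return {key: (json_value, artifact_value)} for every mismatched key."""
    keys = set(json_cfg) | set(artifact_cfg)
    return {
        k: (json_cfg.get(k), artifact_cfg.get(k))
        for k in sorted(keys)
        if json_cfg.get(k) != artifact_cfg.get(k)
    }

print(config_drift(
    {"embedding_model": "example-embedder", "max_pages_per_doc": 20},
    {"embedding_model": "example-embedder", "max_pages_per_doc": 30},
))  # → {'max_pages_per_doc': (20, 30)}
```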
 
 ## Training Procedure
 
 
 ## Maintenance
 
+This model is part of the T18 plan (CCR Upstream Input Hardening) in the GoverningDocs platform. See `plans/T18_CCR_UPSTREAM_INPUT_HARDENING_PLAN.md` (v2.2.1, Completed) in the product repo for design rationale, alternatives considered (page-classifier retrain, agentic-only, signature patterns), and Phase 2 wire-in.
+
+Calibrator artifact added per `plans/CCR_BINARY_ISOTONIC_RECALIBRATION_PLAN.md` (v1.4.0). Phase 1 findings: `experiments/setfit_ccr_binary/ISOTONIC_CALIBRATION_FINDINGS.md`.