hslee1981 committed
Commit 24e2c47 · verified · 1 Parent(s): 328d83d

T18 Phase 1 Tier 1: model card

Files changed (1):
  1. README.md +37 -12

README.md CHANGED
@@ -130,11 +130,11 @@ LogReg wins on simplicity-tiebreak.
 
 ### Primary use case
 
-Upstream gate in the CCR report pipeline. After the XGBoost page classifier flags pages as CCR, this model evaluates whether the parent document is actually a Declaration of Covenants worth running CCR extraction on. Decision band:
 
-- **Score < 0.30**: confident NOT-CCR. Skip CCR pipeline entirely. Removes the document from CCR dispatch.
-- **Score >= 0.85**: confident IS-CCR. Trust the classifier, fast-path bypasses the more expensive agentic `detect_ccr` validator.
-- **0.30 <= Score < 0.85**: ambiguous. Escalate to agentic `detect_ccr` for a deeper look.
 
 ### Out-of-scope use
 
@@ -146,9 +146,9 @@ Upstream gate in the CCR report pipeline. After the XGBoost page classifier flag
 
 ### Calibration
 
-ECE 0.21-0.28 on validation/test means the model's predicted probabilities are systematically over-confident. The decision-band thresholds (0.30 / 0.85) above are **operationally tuned, not probability-calibrated**.
 
-If you redeploy with new thresholds based on different operating goals, derive them empirically from the test set's reliability diagram. Wrap with isotonic recalibration if probability calibration matters for your use case.
 
 ### Sample size
 
@@ -191,19 +191,42 @@ doc_vector = np.mean(page_vectors, axis=0).reshape(1, -1)
 # Predict
 score = model.predict_proba(doc_vector)[0, 1]
 
-# Three-band decision
-if score < 0.30:
     decision = "REJECT"  # confident not a Declaration; skip CCR pipeline
-elif score >= 0.85:
     decision = "FAST_PASS"  # confident Declaration; bypass agentic validator
 else:
     decision = "ESCALATE"  # ambiguous; run agentic detect_ccr
 ```
 ### Files in this repo
 
-- `ccr_binary_logreg_tuned.joblib` — pickled dict containing `model` (sklearn LogisticRegression), `threshold` (float, 0.436), and `config` (dict with `embedding_model`, `max_pages_per_doc`, `skip_boilerplate` flags)
-- `config.json` — JSON-readable summary of the model configuration
 
 ## Training Procedure
 
@@ -254,4 +277,6 @@ Model artifacts are versioned via HuggingFace commit history. `config.json` incl
 
 ## Maintenance
 
-This model is part of the T18 plan (CCR Upstream Input Hardening) in the GoverningDocs platform. See `plans/T18_CCR_UPSTREAM_INPUT_HARDENING_PLAN.md` (v2.1.1) in the product repo for design rationale, alternatives considered (page-classifier retrain, agentic-only, signature patterns), and Phase 2 wire-in plans.
 
 
 ### Primary use case
 
+Upstream gate in the CCR report pipeline. After the XGBoost page classifier flags pages as CCR, this model evaluates whether the parent document is actually a Declaration of Covenants worth running CCR extraction on. Decision bands (recalibrated empirically — the original `(0.30, 0.85)` plan-time bands left FAST_PASS empty in production because real Declarations score 0.45-0.70 raw):
 
+- **Score < 0.25**: confident NOT-CCR. Skip the CCR pipeline entirely; the document is removed from CCR dispatch.
+- **Score >= 0.55**: confident IS-CCR. Trust the classifier; the fast path bypasses the more expensive agentic `detect_ccr` validator.
+- **0.25 <= Score < 0.55**: ambiguous. Escalate to agentic `detect_ccr` for a deeper look.
 
 ### Out-of-scope use
 
 
 ### Calibration
 
+The raw LogReg artifact has ECE 0.19-0.28 on validation/test; its predicted probabilities are systematically miscalibrated. The decision-band thresholds `(0.25, 0.55)` above are **empirically tuned on the production score distribution, not probability-calibrated**.
 
+A separate isotonic calibrator artifact (`ccr_binary_isotonic_calibrator.joblib`) ships in the same repo and reduces test-set ECE from 0.278 to 0.087 (3.2x improvement). It is **purely additive metadata** — the production gate still consumes raw scores. Use the calibrator if you need probability-calibrated outputs for drift monitoring, signal combination with other classifiers, or user-facing confidence display. See the "Calibration Support" section below for details.
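The ECE figures quoted here can be reproduced with the standard binned estimator: bucket predictions by confidence, then take the support-weighted average gap between accuracy and mean confidence per bucket. A minimal sketch in plain numpy (the bin count and equal-width binning are assumptions, not this project's evaluation code):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: support-weighted mean |accuracy - confidence| per bin."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # First bin is closed on the left so a 0.0 score is not dropped
        in_bin = (probs > lo) & (probs <= hi) if lo > 0.0 else (probs >= lo) & (probs <= hi)
        if in_bin.any():
            gap = abs(labels[in_bin].mean() - probs[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Over-confident toy scores: predicts 0.9 three times but is right only twice
print(expected_calibration_error([0.9, 0.9, 0.9, 0.1], [1, 1, 0, 0]))
```

Perfectly calibrated scores give ECE 0; values in the 0.2-0.3 range, as reported for the raw artifact, mean the stated probabilities are off by roughly that much on average.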
 
 ### Sample size
 
 
 # Predict
 score = model.predict_proba(doc_vector)[0, 1]
 
+# Three-band decision (recalibrated production bands)
+if score < 0.25:
     decision = "REJECT"  # confident not a Declaration; skip CCR pipeline
+elif score >= 0.55:
     decision = "FAST_PASS"  # confident Declaration; bypass agentic validator
 else:
     decision = "ESCALATE"  # ambiguous; run agentic detect_ccr
 ```
 
+### Calibration Support
+
+Optional isotonic calibrator (`ccr_binary_isotonic_calibrator.joblib`) maps raw scores to probability-calibrated outputs.
+
+```python
+calibrator_path = hf_hub_download(
+    repo_id="GoverningDocs/ccr-binary-logreg",
+    filename="ccr_binary_isotonic_calibrator.joblib",
+)
+cal_artifact = joblib.load(calibrator_path)
+calibrator = cal_artifact["calibrator"]
+
+# With cv="prefit" and method="isotonic" on a binary problem, the calibrator
+# was fit on raw predict_proba outputs, so the inner isotonic regressor can
+# be applied directly to a float score.
+inner = calibrator.calibrated_classifiers_[0].calibrators[0]
+calibrated = float(inner.predict([score])[0])
+```
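Why a calibrator fit on a small split produces only a few distinct outputs: isotonic regression is a non-decreasing step function computed by pool-adjacent-violators (PAV), and pooling merges neighbouring fit points into plateaus. A self-contained illustration of the pooling (hand-rolled for clarity, not the shipped sklearn artifact):

```python
def pav_fit(values):
    """Pool Adjacent Violators: best non-decreasing fit to `values` under
    squared error (unit weights). Violating neighbours are merged into
    their mean, which is exactly what produces plateau outputs."""
    blocks = []  # each block: [mean, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge from the right while the sequence of block means decreases
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            (m1, c1), (m2, c2) = blocks[-2], blocks[-1]
            blocks[-2:] = [[(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2]]
    fitted = []
    for mean, count in blocks:
        fitted.extend([mean] * count)
    return fitted

# Noisy binary labels sorted by raw score: the 0/1 flip gets pooled
print(pav_fit([0, 1, 0, 1, 1]))  # → [0.0, 0.5, 0.5, 1.0, 1.0]
```

With only ~70 fit points, few distinct pooled levels survive, which matches the caveat below about 3-level outputs.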
+
+**Caveats:**
+- The shipped isotonic was fit on a small (~70-doc) validation split and produces approximately 3 plateau outputs (0.737, 0.833, 1.000). Treat calibrated scores as a 3-level (low / med / high) confidence signal rather than as fine-grained probabilities.
+- The calibrator's `shipped_model_filename` field MUST match the model file you loaded. Cross-check before use to guard against artifact mismatch.
+
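The artifact-mismatch cross-check can be a fail-fast guard at load time. A sketch (the helper name is mine; the field name follows the artifact schema described above):

```python
def check_calibrator_pairing(cal_artifact, model_filename):
    """Raise if the calibrator was fit against a different model artifact."""
    shipped = cal_artifact.get("shipped_model_filename")
    if shipped != model_filename:
        raise ValueError(
            f"calibrator/model mismatch: calibrator expects {shipped!r} "
            f"but {model_filename!r} was loaded"
        )
    return True

# e.g. right after loading both artifacts:
check_calibrator_pairing(
    {"shipped_model_filename": "ccr_binary_logreg_tuned.joblib"},
    "ccr_binary_logreg_tuned.joblib",
)
```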
 ### Files in this repo
 
+- `ccr_binary_logreg_tuned.joblib` — pickled dict containing `model` (sklearn LogisticRegression) and `config` (dict with `embedding_model`, `max_pages_per_doc`, `skip_boilerplate` flags). The `threshold` field (0.436) is a Phase 1 artifact; production uses bands, not a single threshold.
+- `ccr_binary_isotonic_calibrator.joblib` — pickled dict containing `calibrator` (sklearn `CalibratedClassifierCV` with `cv="prefit"`, `method="isotonic"`), `shipped_model_filename` (paired model artifact), and ECE before/after metadata.
+- `config.json` — JSON-readable summary of the model configuration, decision bands, and calibrator metadata.
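Because `config.json` duplicates fields that also live in the pickled `config` dict, the two copies can drift across commits. A hypothetical helper (not part of the repo; the field values shown are made up for illustration) to diff them before trusting either:

```python
def config_drift(json_cfg, artifact_cfg):
    """Return {key: (json_value, artifact_value)} for every mismatched key."""
    keys = set(json_cfg) | set(artifact_cfg)
    return {
        k: (json_cfg.get(k), artifact_cfg.get(k))
        for k in sorted(keys)
        if json_cfg.get(k) != artifact_cfg.get(k)
    }

print(config_drift(
    {"embedding_model": "example-embedder", "max_pages_per_doc": 20},
    {"embedding_model": "example-embedder", "max_pages_per_doc": 30},
))  # → {'max_pages_per_doc': (20, 30)}
```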
 
 ## Training Procedure
 
 
 ## Maintenance
 
+This model is part of the T18 plan (CCR Upstream Input Hardening) in the GoverningDocs platform. See `plans/T18_CCR_UPSTREAM_INPUT_HARDENING_PLAN.md` (v2.2.1, Completed) in the product repo for design rationale, alternatives considered (page-classifier retrain, agentic-only, signature patterns), and Phase 2 wire-in.
+
+Calibrator artifact added per `plans/CCR_BINARY_ISOTONIC_RECALIBRATION_PLAN.md` (v1.4.0). Phase 1 findings: `experiments/setfit_ccr_binary/ISOTONIC_CALIBRATION_FINDINGS.md`.