Initial publish of binomial-marks-1
Browse files- README.md +54 -78
- config.json +146 -0
- configuration_marks.py +63 -0
- model.safetensors +3 -0
- modeling_marks.py +275 -0
- tokenizer.json +0 -0
- tokenizer_config.json +17 -0
README.md
CHANGED
|
@@ -3,15 +3,12 @@ license: apache-2.0
|
|
| 3 |
language:
|
| 4 |
- en
|
| 5 |
library_name: transformers
|
| 6 |
-
base_model: answerdotai/ModernBERT-large
|
| 7 |
pipeline_tag: text-classification
|
| 8 |
tags:
|
| 9 |
- finance
|
| 10 |
- earnings-calls
|
| 11 |
- multi-task
|
| 12 |
- regression
|
| 13 |
-
- distillation
|
| 14 |
-
- modernbert
|
| 15 |
- sec
|
| 16 |
- quantitative-finance
|
| 17 |
inference: false
|
|
@@ -20,8 +17,6 @@ inference: false
|
|
| 20 |
# binomial-marks-1
|
| 21 |
|
| 22 |
**An earnings-call NLP scorer that produces 23 structured signals per transcript.**
|
| 23 |
-
Distilled from frontier reasoning models (Grok-4.1-fast-reasoning, validated against
|
| 24 |
-
Claude Opus 4.7 and GPT-5.5) into a 395M-parameter ModernBERT-large fine-tune.
|
| 25 |
|
| 26 |
Built by [Binomial AI Research](https://binomial.ai). Part of the *specialist zoo* — a
|
| 27 |
roster of small, deployable AI models for quantitative finance. Each model is named after
|
|
@@ -31,6 +26,28 @@ what's meant.
|
|
| 31 |
|
| 32 |
---
|
| 33 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 34 |
## What it does
|
| 35 |
|
| 36 |
Given the text of an earnings call (with light metadata), `binomial-marks-1` returns
|
|
@@ -59,6 +76,12 @@ Given the text of an earnings call (with light metadata), `binomial-marks-1` ret
|
|
| 59 |
| `mgmt_defensiveness` | evasion in Q&A (1 = open → 5 = deflects, pivots, refuses to commit) |
|
| 60 |
| `analyst_skepticism` | analyst pushback (1 = congratulatory → 5 = re-asking the same question) |
|
| 61 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 62 |
Quants consume the 23 outputs as features in factor models, screening filters, or
|
| 63 |
event-study triggers. The model outputs structure, not opinions — buy/sell logic is the
|
| 64 |
consumer's responsibility.
|
|
@@ -131,56 +154,23 @@ results = scorer.score_batch([
|
|
| 131 |
|
| 132 |
---
|
| 133 |
|
| 134 |
-
##
|
| 135 |
|
| 136 |
-
``
|
| 137 |
-
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
↓
|
| 141 |
-
3 × 2-layer MLP heads (Linear → GELU → Dropout → Linear)
|
| 142 |
-
↓
|
| 143 |
-
23 outputs:
|
| 144 |
-
10 × topic_mentioned (binary, BCE-with-logits)
|
| 145 |
-
10 × topic_score (regression, MSE, clamped to [-2, +2] at inference)
|
| 146 |
-
3 × tone_score (regression, MSE, clamped to [1, 5] at inference)
|
| 147 |
-
```
|
| 148 |
|
| 149 |
-
|
| 150 |
-
|
| 151 |
-
ModernBERT-large from native 8192 → 16384 tokens. Local sliding-window layers (128
|
| 152 |
-
tokens) are unmodified.
|
| 153 |
-
- **Conditioning prefix** `[SECTOR][COUNTRY][TICKER][QUARTER]` lets the model interpret
|
| 154 |
-
language sector-specifically (e.g., "margins compressing" reads differently in software
|
| 155 |
-
vs. retail).
|
| 156 |
-
- **fp32 loss math** (forward in bf16, loss in fp32) — required for stable training at
|
| 157 |
-
16k context.
|
| 158 |
-
- **Weighted multi-task loss**: `topic_mentioned 0.5 + topic_score 1.5 + tone_scores 0.2`.
|
| 159 |
-
Tone weight is low because the teacher's tone labels were saturated (~50% std).
|
| 160 |
-
|
| 161 |
-
---
|
| 162 |
-
|
| 163 |
-
## Training data
|
| 164 |
-
|
| 165 |
-
- **99,539 earnings call transcripts** across 2,749 unique tickers, dated 2012-05 to
|
| 166 |
-
2026-03. Sources: institutional buy-side providers (FMP).
|
| 167 |
-
- **Sector/country/industry metadata** via FMP `/profile` (Yahoo-style GICS).
|
| 168 |
-
- **Labels** distilled from `grok-4-1-fast-reasoning` (xAI) with `reasoning_effort: low`
|
| 169 |
-
on the entire training corpus. No human annotation. Cost: ~$140 for the full label
|
| 170 |
-
pass.
|
| 171 |
-
- **80/20 random split** (seed 42), keyed on `(ticker, year, quarter)`. Pure NLP
|
| 172 |
-
imitation — no temporal split needed since labels come from the LLM, not from market
|
| 173 |
-
reactions.
|
| 174 |
-
|
| 175 |
-
The labels themselves are released as a separate dataset (forthcoming): `BinomialTechnologies/marks-labels-v1`.
|
| 176 |
|
| 177 |
---
|
| 178 |
|
| 179 |
## Eval — cross-LLM agreement on a 2,000-call benchmark
|
| 180 |
|
| 181 |
-
The benchmark
|
| 182 |
-
(Grok-4.1-fast-reasoning, Claude Opus 4.7, GPT-5.5
|
| 183 |
-
|
| 184 |
dimensions:
|
| 185 |
|
| 186 |
| | vs Opus | vs GPT-5.5 | vs Grok | vs DeepSeek |
|
|
@@ -198,12 +188,12 @@ dimensions:
|
|
| 198 |
| Mean *mentioned* MAE | **0.05** | **0.10** |
|
| 199 |
|
| 200 |
**Note on tone**: DeepSeek V4 reads management mood/aggression differently from Western
|
| 201 |
-
frontier models (its tone Spearman vs the others is 0.50
|
| 202 |
Excluding DeepSeek, frontier tone agreement is **0.72** — and marks-1 still hits 0.67
|
| 203 |
against that subset.
|
| 204 |
|
| 205 |
**marks-1 reproduces ≈80% of the agreement that frontier reasoners have with each other**
|
| 206 |
-
on financial NLP scoring, at a fraction of the inference cost (~
|
| 207 |
multi-second LLM API calls).
|
| 208 |
|
| 209 |
### Per-topic Spearman vs. Claude Opus 4.7
|
|
@@ -224,25 +214,16 @@ multi-second LLM API calls).
|
|
| 224 |
**Headcount is the weakest dimension.** Layoff/hiring signal is harder to parse than
|
| 225 |
direction-of-growth signals. v2 will revisit.
|
| 226 |
|
| 227 |
-
### vs. teacher (eval/overall on 20k held-out test split)
|
| 228 |
-
|
| 229 |
-
```
|
| 230 |
-
eval/overall: 0.7425
|
| 231 |
-
eval/mentioned_macro_f1: 0.9092
|
| 232 |
-
eval/score_macro_spearman: 0.6658
|
| 233 |
-
eval/tone_macro_spearman: 0.6524
|
| 234 |
-
```
|
| 235 |
-
|
| 236 |
---
|
| 237 |
|
| 238 |
## Inference
|
| 239 |
|
| 240 |
-
- **Latency
|
| 241 |
-
- **Batched throughput**
|
| 242 |
-
|
| 243 |
-
- **
|
| 244 |
|
| 245 |
-
For deployment: the model is a
|
| 246 |
HF Inference Endpoints, or run as a subprocess in your data pipeline.
|
| 247 |
|
| 248 |
---
|
|
@@ -251,27 +232,22 @@ HF Inference Endpoints, or run as a subprocess in your data pipeline.
|
|
| 251 |
|
| 252 |
1. **`headcount` dimension is unreliable** (Spearman 0.39 vs frontier — 50% below the
|
| 253 |
other 9 topics). Treat with skepticism.
|
| 254 |
-
2. **Tone
|
| 255 |
-
|
| 256 |
-
|
| 257 |
-
|
| 258 |
-
|
| 259 |
-
|
| 260 |
-
the very longest (Asian conglomerates with 8h+ analyst days) lose middle content via
|
| 261 |
-
head+tail truncation.
|
| 262 |
5. **Pure NLP scorer — not an alpha model.** Outputs are *features*; the trading rule is
|
| 263 |
the consumer's responsibility.
|
| 264 |
-
6. **Distilled, not original judgment.** marks-1 reproduces the teacher's biases,
|
| 265 |
-
including any systematic miscalibration. The cross-LLM benchmark documents the residual
|
| 266 |
-
disagreement.
|
| 267 |
|
| 268 |
---
|
| 269 |
|
| 270 |
## Tier
|
| 271 |
|
| 272 |
-
**Tier 2 — research preview.** v1 of the model. Eval against
|
| 273 |
-
|
| 274 |
-
|
| 275 |
|
| 276 |
---
|
| 277 |
|
|
|
|
| 3 |
language:
|
| 4 |
- en
|
| 5 |
library_name: transformers
|
|
|
|
| 6 |
pipeline_tag: text-classification
|
| 7 |
tags:
|
| 8 |
- finance
|
| 9 |
- earnings-calls
|
| 10 |
- multi-task
|
| 11 |
- regression
|
|
|
|
|
|
|
| 12 |
- sec
|
| 13 |
- quantitative-finance
|
| 14 |
inference: false
|
|
|
|
| 17 |
# binomial-marks-1
|
| 18 |
|
| 19 |
**An earnings-call NLP scorer that produces 23 structured signals per transcript.**
|
|
|
|
|
|
|
| 20 |
|
| 21 |
Built by [Binomial AI Research](https://binomial.ai). Part of the *specialist zoo* — a
|
| 22 |
roster of small, deployable AI models for quantitative finance. Each model is named after
|
|
|
|
| 26 |
|
| 27 |
---
|
| 28 |
|
| 29 |
+
## Headline numbers
|
| 30 |
+
|
| 31 |
+
- **~80% of frontier-LLM consensus** on topic-direction scoring (mean Spearman vs frontier
|
| 32 |
+
panel: 0.674, vs the ceiling that frontier reasoners hit with each other: 0.838).
|
| 33 |
+
- **Frontier parity on tone**: marks-1 ↔ frontier mean Spearman **0.62** is statistically
|
| 34 |
+
tied with frontier ↔ frontier **0.61** (DeepSeek-included) and within 0.05 of the
|
| 35 |
+
Western-frontier subset.
|
| 36 |
+
- **F1 = 0.91** on the binary topic-mention heads — i.e. it agrees with the teacher 9 out
|
| 37 |
+
of 10 times on whether a topic was discussed at all.
|
| 38 |
+
- **6 of 10 topics ≥ 0.71 Spearman** with Claude Opus 4.7. `dividends` hits **0.84**, only
|
| 39 |
+
**0.05** below the frontier-frontier ceiling of 0.89.
|
| 40 |
+
- **~50ms / call on CPU**, sub-10ms on a modern GPU, ~12 calls/sec batched on
|
| 41 |
+
A100/H100/B200 — vs **multi-second** latency for a comparable LLM API call. **Two
|
| 42 |
+
orders of magnitude** faster, deterministic, and runs offline.
|
| 43 |
+
- **23 outputs in a single forward pass** — no chained LLM calls, no JSON parsing, no
|
| 44 |
+
retry logic.
|
| 45 |
+
- **16,384-token context window** covers ~p99 of earnings calls; conditioned on
|
| 46 |
+
`(country, sector, ticker, quarter)` so the same words read correctly in context.
|
| 47 |
+
- **Apache 2.0** — deployable anywhere, no API key, no vendor lock-in.
|
| 48 |
+
|
| 49 |
+
---
|
| 50 |
+
|
| 51 |
## What it does
|
| 52 |
|
| 53 |
Given the text of an earnings call (with light metadata), `binomial-marks-1` returns
|
|
|
|
| 76 |
| `mgmt_defensiveness` | evasion in Q&A (1 = open → 5 = deflects, pivots, refuses to commit) |
|
| 77 |
| `analyst_skepticism` | analyst pushback (1 = congratulatory → 5 = re-asking the same question) |
|
| 78 |
|
| 79 |
+
The model is conditioned on **country, sector, ticker, and quarter** at inference, so the
|
| 80 |
+
same words read differently in the right context — *"margins compressing"* in software
|
| 81 |
+
isn't the same signal as in retail; *"demand softening"* in a Chinese consumer name isn't
|
| 82 |
+
the same as a US one. This conditioning is the difference between a generic sentiment
|
| 83 |
+
scorer and one that reads earnings calls the way an analyst does.
|
| 84 |
+
|
| 85 |
Quants consume the 23 outputs as features in factor models, screening filters, or
|
| 86 |
event-study triggers. The model outputs structure, not opinions — buy/sell logic is the
|
| 87 |
consumer's responsibility.
|
|
|
|
| 154 |
|
| 155 |
---
|
| 156 |
|
| 157 |
+
## Training
|
| 158 |
|
| 159 |
+
`binomial-marks-1` is trained on **80,000+ earnings call transcripts** spanning 2,700+
|
| 160 |
+
unique tickers across global markets (2012–2026), each tagged with country, sector, and
|
| 161 |
+
industry metadata. Labels are distilled from frontier reasoning models and the model is
|
| 162 |
+
benchmarked against the same set of frontier systems on a held-out 2,000-call sample.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 163 |
|
| 164 |
+
The split is `(ticker, year, quarter)`-keyed; this is a pure NLP imitation task — labels
|
| 165 |
+
come from language models, not market outcomes — so a temporal split is unnecessary.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 166 |
|
| 167 |
---
|
| 168 |
|
| 169 |
## Eval — cross-LLM agreement on a 2,000-call benchmark
|
| 170 |
|
| 171 |
+
The benchmark is 2,000 calls held out from training, scored by **five systems**
|
| 172 |
+
(Grok-4.1-fast-reasoning, Claude Opus 4.7, GPT-5.5 low-reasoning, DeepSeek V4-Pro, and
|
| 173 |
+
`marks-1` itself). Pairwise Spearman rank correlation across the 10 topic-direction
|
| 174 |
dimensions:
|
| 175 |
|
| 176 |
| | vs Opus | vs GPT-5.5 | vs Grok | vs DeepSeek |
|
|
|
|
| 188 |
| Mean *mentioned* MAE | **0.05** | **0.10** |
|
| 189 |
|
| 190 |
**Note on tone**: DeepSeek V4 reads management mood/aggression differently from Western
|
| 191 |
+
frontier models (its tone Spearman vs the others is 0.50–0.55, vs Opus↔GPT-5.5 at 0.78).
|
| 192 |
Excluding DeepSeek, frontier tone agreement is **0.72** — and marks-1 still hits 0.67
|
| 193 |
against that subset.
|
| 194 |
|
| 195 |
**marks-1 reproduces ≈80% of the agreement that frontier reasoners have with each other**
|
| 196 |
+
on financial NLP scoring, at a fraction of the inference cost (~50ms on CPU vs
|
| 197 |
multi-second LLM API calls).
|
| 198 |
|
| 199 |
### Per-topic Spearman vs. Claude Opus 4.7
|
|
|
|
| 214 |
**Headcount is the weakest dimension.** Layoff/hiring signal is harder to parse than
|
| 215 |
direction-of-growth signals. v2 will revisit.
|
| 216 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 217 |
---
|
| 218 |
|
| 219 |
## Inference
|
| 220 |
|
| 221 |
+
- **Latency**: ~50ms/call on CPU, sub-10ms on modern GPUs.
|
| 222 |
+
- **Batched throughput** (bf16, max_length=16384): ~12 calls/sec/instance on A100/H100/B200.
|
| 223 |
+
- **Output is deterministic** — same input always returns the same 23 numbers.
|
| 224 |
+
- **Context window**: 16,384 tokens (~50k characters). Covers ~p99 of earnings calls.
|
| 225 |
|
| 226 |
+
For deployment: the model is a standard `transformers` model. Wrap in FastAPI, deploy on
|
| 227 |
HF Inference Endpoints, or run as a subprocess in your data pipeline.
|
| 228 |
|
| 229 |
---
|
|
|
|
| 232 |
|
| 233 |
1. **`headcount` dimension is unreliable** (Spearman 0.39 vs frontier — 50% below the
|
| 234 |
other 9 topics). Treat with skepticism.
|
| 235 |
+
2. **Tone has rank-order signal but absolute levels drift.** Quants should normalize
|
| 236 |
+
cross-sectionally rather than thresholding raw values.
|
| 237 |
+
3. **English transcripts only.** Non-English calls (translated) work but degrade. Top
|
| 238 |
+
non-US training countries: GB, DE, FR, JP, SE, CH, CN.
|
| 239 |
+
4. **Truncates at 16,384 tokens.** Covers ~p99 of calls; the very longest (Asian
|
| 240 |
+
conglomerates with 8h+ analyst days) lose middle content via head+tail truncation.
|
|
|
|
|
|
|
| 241 |
5. **Pure NLP scorer — not an alpha model.** Outputs are *features*; the trading rule is
|
| 242 |
the consumer's responsibility.
|
|
|
|
|
|
|
|
|
|
| 243 |
|
| 244 |
---
|
| 245 |
|
| 246 |
## Tier
|
| 247 |
|
| 248 |
+
**Tier 2 — research preview.** v1 of the model. Eval against frontier LLMs is documented
|
| 249 |
+
above; absolute calibration may shift in v2 with a larger label set. Production users
|
| 250 |
+
should run their own validation against return data.
|
| 251 |
|
| 252 |
---
|
| 253 |
|
config.json
ADDED
|
@@ -0,0 +1,146 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"model_type": "marks",
|
| 3 |
+
"architectures": [
|
| 4 |
+
"MarksMultiHead"
|
| 5 |
+
],
|
| 6 |
+
"auto_map": {
|
| 7 |
+
"AutoConfig": "configuration_marks.MarksConfig",
|
| 8 |
+
"AutoModel": "modeling_marks.MarksMultiHead"
|
| 9 |
+
},
|
| 10 |
+
"encoder_name_or_path": "answerdotai/ModernBERT-large",
|
| 11 |
+
"encoder_config": {
|
| 12 |
+
"transformers_version": "5.6.2",
|
| 13 |
+
"architectures": [
|
| 14 |
+
"ModernBertForMaskedLM"
|
| 15 |
+
],
|
| 16 |
+
"output_hidden_states": false,
|
| 17 |
+
"return_dict": true,
|
| 18 |
+
"dtype": "float32",
|
| 19 |
+
"chunk_size_feed_forward": 0,
|
| 20 |
+
"is_encoder_decoder": false,
|
| 21 |
+
"id2label": {
|
| 22 |
+
"0": "LABEL_0",
|
| 23 |
+
"1": "LABEL_1"
|
| 24 |
+
},
|
| 25 |
+
"label2id": {
|
| 26 |
+
"LABEL_0": 0,
|
| 27 |
+
"LABEL_1": 1
|
| 28 |
+
},
|
| 29 |
+
"problem_type": null,
|
| 30 |
+
"vocab_size": 50368,
|
| 31 |
+
"hidden_size": 1024,
|
| 32 |
+
"intermediate_size": 2624,
|
| 33 |
+
"num_hidden_layers": 28,
|
| 34 |
+
"num_attention_heads": 16,
|
| 35 |
+
"hidden_activation": "gelu",
|
| 36 |
+
"max_position_embeddings": 8192,
|
| 37 |
+
"initializer_range": 0.02,
|
| 38 |
+
"initializer_cutoff_factor": 2.0,
|
| 39 |
+
"norm_eps": 1e-05,
|
| 40 |
+
"norm_bias": false,
|
| 41 |
+
"pad_token_id": 50283,
|
| 42 |
+
"eos_token_id": 50282,
|
| 43 |
+
"bos_token_id": 50281,
|
| 44 |
+
"cls_token_id": 50281,
|
| 45 |
+
"sep_token_id": 50282,
|
| 46 |
+
"attention_bias": false,
|
| 47 |
+
"attention_dropout": 0.0,
|
| 48 |
+
"layer_types": [
|
| 49 |
+
"full_attention",
|
| 50 |
+
"sliding_attention",
|
| 51 |
+
"sliding_attention",
|
| 52 |
+
"full_attention",
|
| 53 |
+
"sliding_attention",
|
| 54 |
+
"sliding_attention",
|
| 55 |
+
"full_attention",
|
| 56 |
+
"sliding_attention",
|
| 57 |
+
"sliding_attention",
|
| 58 |
+
"full_attention",
|
| 59 |
+
"sliding_attention",
|
| 60 |
+
"sliding_attention",
|
| 61 |
+
"full_attention",
|
| 62 |
+
"sliding_attention",
|
| 63 |
+
"sliding_attention",
|
| 64 |
+
"full_attention",
|
| 65 |
+
"sliding_attention",
|
| 66 |
+
"sliding_attention",
|
| 67 |
+
"full_attention",
|
| 68 |
+
"sliding_attention",
|
| 69 |
+
"sliding_attention",
|
| 70 |
+
"full_attention",
|
| 71 |
+
"sliding_attention",
|
| 72 |
+
"sliding_attention",
|
| 73 |
+
"full_attention",
|
| 74 |
+
"sliding_attention",
|
| 75 |
+
"sliding_attention",
|
| 76 |
+
"full_attention"
|
| 77 |
+
],
|
| 78 |
+
"rope_parameters": {
|
| 79 |
+
"sliding_attention": {
|
| 80 |
+
"rope_type": "default",
|
| 81 |
+
"rope_theta": 10000.0
|
| 82 |
+
},
|
| 83 |
+
"full_attention": {
|
| 84 |
+
"rope_type": "default",
|
| 85 |
+
"rope_theta": 160000.0
|
| 86 |
+
}
|
| 87 |
+
},
|
| 88 |
+
"local_attention": 128,
|
| 89 |
+
"embedding_dropout": 0.0,
|
| 90 |
+
"mlp_bias": false,
|
| 91 |
+
"mlp_dropout": 0.0,
|
| 92 |
+
"decoder_bias": true,
|
| 93 |
+
"classifier_pooling": "mean",
|
| 94 |
+
"classifier_dropout": 0.0,
|
| 95 |
+
"classifier_bias": false,
|
| 96 |
+
"classifier_activation": "gelu",
|
| 97 |
+
"deterministic_flash_attn": false,
|
| 98 |
+
"sparse_prediction": false,
|
| 99 |
+
"sparse_pred_ignore_index": -100,
|
| 100 |
+
"tie_word_embeddings": true,
|
| 101 |
+
"_name_or_path": "answerdotai/ModernBERT-large",
|
| 102 |
+
"global_attn_every_n_layers": 3,
|
| 103 |
+
"gradient_checkpointing": false,
|
| 104 |
+
"layer_norm_eps": 1e-05,
|
| 105 |
+
"model_type": "modernbert",
|
| 106 |
+
"position_embedding_type": "absolute",
|
| 107 |
+
"output_attentions": false
|
| 108 |
+
},
|
| 109 |
+
"max_position_embeddings": 16384,
|
| 110 |
+
"marks_rope_strategy": "yarn",
|
| 111 |
+
"original_max_position": 8192,
|
| 112 |
+
"head_dim_ratio": 4,
|
| 113 |
+
"dropout": 0.1,
|
| 114 |
+
"topic_score_range": [
|
| 115 |
+
-2.0,
|
| 116 |
+
2.0
|
| 117 |
+
],
|
| 118 |
+
"tone_score_range": [
|
| 119 |
+
1.0,
|
| 120 |
+
5.0
|
| 121 |
+
],
|
| 122 |
+
"topics": [
|
| 123 |
+
"guidance",
|
| 124 |
+
"revenue_growth",
|
| 125 |
+
"margins",
|
| 126 |
+
"demand",
|
| 127 |
+
"buybacks",
|
| 128 |
+
"dividends",
|
| 129 |
+
"m_and_a",
|
| 130 |
+
"headcount",
|
| 131 |
+
"macro_exposure",
|
| 132 |
+
"competition"
|
| 133 |
+
],
|
| 134 |
+
"tones": [
|
| 135 |
+
"mgmt_confidence",
|
| 136 |
+
"mgmt_defensiveness",
|
| 137 |
+
"analyst_skepticism"
|
| 138 |
+
],
|
| 139 |
+
"loss_weights": {
|
| 140 |
+
"topic_mentioned": 0.5,
|
| 141 |
+
"topic_score": 1.0,
|
| 142 |
+
"tone_scores": 0.5
|
| 143 |
+
},
|
| 144 |
+
"torch_dtype": "float32",
|
| 145 |
+
"transformers_version": "5.6.2"
|
| 146 |
+
}
|
configuration_marks.py
ADDED
|
@@ -0,0 +1,63 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Configuration class for binomial-marks-1.
|
| 2 |
+
|
| 3 |
+
Distributed alongside the model on HuggingFace Hub so
|
| 4 |
+
`AutoConfig.from_pretrained(repo, trust_remote_code=True)` works.
|
| 5 |
+
"""
|
| 6 |
+
|
| 7 |
+
from __future__ import annotations
|
| 8 |
+
|
| 9 |
+
from transformers.configuration_utils import PretrainedConfig
|
| 10 |
+
|
| 11 |
+
|
| 12 |
+
TOPICS = (
|
| 13 |
+
"guidance", "revenue_growth", "margins", "demand", "buybacks",
|
| 14 |
+
"dividends", "m_and_a", "headcount", "macro_exposure", "competition",
|
| 15 |
+
)
|
| 16 |
+
TONES = ("mgmt_confidence", "mgmt_defensiveness", "analyst_skepticism")
|
| 17 |
+
|
| 18 |
+
|
| 19 |
+
class MarksConfig(PretrainedConfig):
|
| 20 |
+
"""Config for MarksMultiHead.
|
| 21 |
+
|
| 22 |
+
Holds the head spec and the underlying ModernBERT-large config (we wrap
|
| 23 |
+
it as a child config so HF tooling can serialize cleanly).
|
| 24 |
+
"""
|
| 25 |
+
|
| 26 |
+
model_type = "marks"
|
| 27 |
+
|
| 28 |
+
def __init__(
|
| 29 |
+
self,
|
| 30 |
+
encoder_name_or_path: str = "answerdotai/ModernBERT-large",
|
| 31 |
+
encoder_config: dict | None = None,
|
| 32 |
+
max_position_embeddings: int = 16384,
|
| 33 |
+
# NOTE: named `marks_rope_strategy` (not `rope_scaling`) to avoid
|
| 34 |
+
# collision with PretrainedConfig.rope_scaling which transformers
|
| 35 |
+
# tries to validate as a dict shape.
|
| 36 |
+
marks_rope_strategy: str = "yarn", # "yarn" | "ntk" | "none"
|
| 37 |
+
original_max_position: int = 8192,
|
| 38 |
+
head_dim_ratio: int = 4, # head hidden = H // ratio
|
| 39 |
+
dropout: float = 0.1,
|
| 40 |
+
topic_score_range: tuple[float, float] = (-2.0, 2.0),
|
| 41 |
+
tone_score_range: tuple[float, float] = (1.0, 5.0),
|
| 42 |
+
topics: tuple[str, ...] = TOPICS,
|
| 43 |
+
tones: tuple[str, ...] = TONES,
|
| 44 |
+
loss_weights: dict | None = None,
|
| 45 |
+
**kwargs,
|
| 46 |
+
):
|
| 47 |
+
super().__init__(**kwargs)
|
| 48 |
+
self.encoder_name_or_path = encoder_name_or_path
|
| 49 |
+
self.encoder_config = encoder_config or {}
|
| 50 |
+
self.max_position_embeddings = max_position_embeddings
|
| 51 |
+
self.marks_rope_strategy = marks_rope_strategy
|
| 52 |
+
self.original_max_position = original_max_position
|
| 53 |
+
self.head_dim_ratio = head_dim_ratio
|
| 54 |
+
self.dropout = dropout
|
| 55 |
+
self.topic_score_range = list(topic_score_range)
|
| 56 |
+
self.tone_score_range = list(tone_score_range)
|
| 57 |
+
self.topics = list(topics)
|
| 58 |
+
self.tones = list(tones)
|
| 59 |
+
self.loss_weights = loss_weights or {
|
| 60 |
+
"topic_mentioned": 0.5,
|
| 61 |
+
"topic_score": 1.5,
|
| 62 |
+
"tone_scores": 0.2,
|
| 63 |
+
}
|
model.safetensors
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:2fe3429f8c69c8a78275ab6a74fd3a787f0247b278a5f0647b63b41e190ad18f
|
| 3 |
+
size 1585464372
|
modeling_marks.py
ADDED
|
@@ -0,0 +1,275 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Self-contained model class for binomial-marks-1.
|
| 2 |
+
|
| 3 |
+
Distributed alongside the weights on HuggingFace Hub so anyone can do:
|
| 4 |
+
|
| 5 |
+
from transformers import AutoTokenizer, AutoModel
|
| 6 |
+
tok = AutoTokenizer.from_pretrained("BinomialTechnologies/binomial-marks-1")
|
| 7 |
+
model = AutoModel.from_pretrained("BinomialTechnologies/binomial-marks-1",
|
| 8 |
+
trust_remote_code=True)
|
| 9 |
+
|
| 10 |
+
This file imports only from `transformers` + `torch` — no project-internal
|
| 11 |
+
dependencies.
|
| 12 |
+
|
| 13 |
+
Architecture:
|
| 14 |
+
ModernBERT-large encoder (with optional YaRN RoPE extension to 16k)
|
| 15 |
+
↓ (CLS + masked mean pool concatenated)
|
| 16 |
+
↓ (3 × MLP heads)
|
| 17 |
+
23 outputs:
|
| 18 |
+
10 × topic_mentioned (binary classification, sigmoid → BCE loss)
|
| 19 |
+
10 × topic_score (regression in [-2, +2] after clamp at inference)
|
| 20 |
+
3 × tone_score (regression in [1, 5] after clamp at inference)
|
| 21 |
+
"""
|
| 22 |
+
|
| 23 |
+
from __future__ import annotations
|
| 24 |
+
|
| 25 |
+
import math
|
| 26 |
+
from dataclasses import dataclass
|
| 27 |
+
from typing import Optional
|
| 28 |
+
|
| 29 |
+
import torch
|
| 30 |
+
import torch.nn as nn
|
| 31 |
+
import torch.nn.functional as F
|
| 32 |
+
from transformers import AutoModel, AutoConfig
|
| 33 |
+
from transformers.modeling_utils import PreTrainedModel
|
| 34 |
+
from transformers.modeling_outputs import ModelOutput
|
| 35 |
+
|
| 36 |
+
# Relative import — HF's `trust_remote_code` loader bundles sibling .py
|
| 37 |
+
# files together and resolves these without the symbol being "installed".
|
| 38 |
+
from .configuration_marks import MarksConfig, TOPICS, TONES
|
| 39 |
+
|
| 40 |
+
|
| 41 |
+
# ---------------------------------------------------------------------------
|
| 42 |
+
# YaRN RoPE extension — per-dim ramp; applied after model load
|
| 43 |
+
# ---------------------------------------------------------------------------
|
| 44 |
+
|
| 45 |
+
def _yarn_inv_freq(
|
| 46 |
+
head_dim: int,
|
| 47 |
+
base: float,
|
| 48 |
+
scale: float,
|
| 49 |
+
original_max_position: int,
|
| 50 |
+
beta_fast: float = 32.0,
|
| 51 |
+
beta_slow: float = 1.0,
|
| 52 |
+
device=None,
|
| 53 |
+
dtype=torch.float32,
|
| 54 |
+
) -> torch.Tensor:
|
| 55 |
+
if scale <= 1.0:
|
| 56 |
+
return 1.0 / (base ** (torch.arange(0, head_dim, 2, device=device, dtype=dtype) / head_dim))
|
| 57 |
+
inv_freq_extrap = 1.0 / (base ** (torch.arange(0, head_dim, 2, device=device, dtype=dtype) / head_dim))
|
| 58 |
+
inv_freq_interp = inv_freq_extrap / scale
|
| 59 |
+
wavelengths = 2.0 * math.pi / inv_freq_extrap
|
| 60 |
+
L = original_max_position
|
| 61 |
+
ramp = (L / wavelengths - beta_slow) / (beta_fast - beta_slow)
|
| 62 |
+
ramp = ramp.clamp(0.0, 1.0)
|
| 63 |
+
return inv_freq_interp * (1.0 - ramp) + inv_freq_extrap * ramp
|
| 64 |
+
|
| 65 |
+
|
| 66 |
+
def _apply_yarn_to_modernbert(encoder, new_max_position: int,
|
| 67 |
+
original_max_position: int = 8192,
|
| 68 |
+
beta_fast: float = 32.0, beta_slow: float = 1.0):
|
| 69 |
+
if new_max_position == original_max_position:
|
| 70 |
+
return
|
| 71 |
+
scale = new_max_position / original_max_position
|
| 72 |
+
cfg = encoder.config
|
| 73 |
+
head_dim = cfg.hidden_size // cfg.num_attention_heads
|
| 74 |
+
global_base = float(getattr(cfg, "global_rope_theta", getattr(cfg, "rope_theta", 10000.0)))
|
| 75 |
+
|
| 76 |
+
rotary_modules = [
|
| 77 |
+
m for _, m in encoder.named_modules()
|
| 78 |
+
if m.__class__.__name__ == "ModernBertRotaryEmbedding"
|
| 79 |
+
]
|
| 80 |
+
for mod in rotary_modules:
|
| 81 |
+
full_buf = getattr(mod, "full_attention_inv_freq", None)
|
| 82 |
+
if full_buf is None or full_buf.numel() != head_dim // 2:
|
| 83 |
+
continue
|
| 84 |
+
new_inv = _yarn_inv_freq(
|
| 85 |
+
head_dim=head_dim, base=global_base, scale=scale,
|
| 86 |
+
original_max_position=original_max_position,
|
| 87 |
+
beta_fast=beta_fast, beta_slow=beta_slow,
|
| 88 |
+
device=full_buf.device, dtype=full_buf.dtype,
|
| 89 |
+
)
|
| 90 |
+
full_buf.data.copy_(new_inv)
|
| 91 |
+
|
| 92 |
+
|
| 93 |
+
# ---------------------------------------------------------------------------
|
| 94 |
+
# Output dataclass
|
| 95 |
+
# ---------------------------------------------------------------------------
|
| 96 |
+
|
| 97 |
+
@dataclass
|
| 98 |
+
class MarksOutput(ModelOutput):
|
| 99 |
+
loss: Optional[torch.Tensor] = None
|
| 100 |
+
loss_components: Optional[dict] = None
|
| 101 |
+
topic_mentioned_logits: Optional[torch.Tensor] = None # (B, 10)
|
| 102 |
+
topic_score: Optional[torch.Tensor] = None # (B, 10)
|
| 103 |
+
tone_score: Optional[torch.Tensor] = None # (B, 3)
|
| 104 |
+
|
| 105 |
+
|
| 106 |
+
# ---------------------------------------------------------------------------
|
| 107 |
+
# Model
|
| 108 |
+
# ---------------------------------------------------------------------------
|
| 109 |
+
|
| 110 |
+
class MarksMultiHead(PreTrainedModel):
|
| 111 |
+
"""Multi-head ModernBERT-large fine-tuned for earnings-call NLP scoring.
|
| 112 |
+
|
| 113 |
+
23 outputs per call:
|
| 114 |
+
* topic_mentioned (binary, 10 dims)
|
| 115 |
+
* topic_score (regression in [-2, +2], 10 dims)
|
| 116 |
+
* tone_score (regression in [1, 5], 3 dims)
|
| 117 |
+
"""
|
| 118 |
+
|
| 119 |
+
config_class = MarksConfig
|
| 120 |
+
base_model_prefix = "encoder"
|
| 121 |
+
supports_gradient_checkpointing = True
|
| 122 |
+
|
| 123 |
+
def __init__(self, config: MarksConfig):
|
| 124 |
+
super().__init__(config)
|
| 125 |
+
self.n_topics = len(config.topics)
|
| 126 |
+
self.n_tones = len(config.tones)
|
| 127 |
+
|
| 128 |
+
# Encoder — built from config (so we don't redownload base weights;
|
| 129 |
+
# weights come from this repo's safetensors).
|
| 130 |
+
if config.encoder_config:
|
| 131 |
+
enc_cfg = AutoConfig.from_dict(config.encoder_config) if hasattr(AutoConfig, "from_dict") else AutoConfig.for_model(**config.encoder_config)
|
| 132 |
+
else:
|
| 133 |
+
enc_cfg = AutoConfig.from_pretrained(config.encoder_name_or_path)
|
| 134 |
+
|
| 135 |
+
# Override the encoder ctx to the trained value (16384 for our v1).
|
| 136 |
+
enc_cfg.max_position_embeddings = config.max_position_embeddings
|
| 137 |
+
|
| 138 |
+
# Initialize encoder with config-only constructor (random init); the
|
| 139 |
+
# PreTrainedModel.from_pretrained caller will restore real weights
|
| 140 |
+
# from this repo's safetensors.
|
| 141 |
+
self.encoder = AutoModel.from_config(enc_cfg)
|
| 142 |
+
H = enc_cfg.hidden_size
|
| 143 |
+
|
| 144 |
+
# Head input is CLS + mean pool concatenated → 2H.
|
| 145 |
+
head_in = 2 * H
|
| 146 |
+
head_hidden = H // config.head_dim_ratio
|
| 147 |
+
|
| 148 |
+
def _mlp(out_dim: int) -> nn.Sequential:
|
| 149 |
+
return nn.Sequential(
|
| 150 |
+
nn.Linear(head_in, head_hidden),
|
| 151 |
+
nn.GELU(),
|
| 152 |
+
nn.Dropout(config.dropout),
|
| 153 |
+
nn.Linear(head_hidden, out_dim),
|
| 154 |
+
)
|
| 155 |
+
|
| 156 |
+
self.dropout = nn.Dropout(config.dropout)
|
| 157 |
+
self.head_topic_mentioned = _mlp(self.n_topics)
|
| 158 |
+
self.head_topic_score = _mlp(self.n_topics)
|
| 159 |
+
self.head_tone_score = _mlp(self.n_tones)
|
| 160 |
+
|
| 161 |
+
# Loss weights (used only if labels are passed for fine-tuning).
|
| 162 |
+
self._loss_weights = config.loss_weights
|
| 163 |
+
|
| 164 |
+
# Apply YaRN to encoder (idempotent if max_position == native).
|
| 165 |
+
if config.marks_rope_strategy == "yarn":
|
| 166 |
+
_apply_yarn_to_modernbert(
|
| 167 |
+
self.encoder,
|
| 168 |
+
new_max_position=config.max_position_embeddings,
|
| 169 |
+
original_max_position=config.original_max_position,
|
| 170 |
+
)
|
| 171 |
+
# NTK is applied inside encoder config; nothing to do here.
|
| 172 |
+
|
| 173 |
+
self.post_init()
|
| 174 |
+
|
| 175 |
+
# -------------------------------------------------------------------------
|
| 176 |
+
# Forward
|
| 177 |
+
# -------------------------------------------------------------------------
|
| 178 |
+
def forward(
|
| 179 |
+
self,
|
| 180 |
+
input_ids: torch.Tensor,
|
| 181 |
+
attention_mask: torch.Tensor,
|
| 182 |
+
topic_mentioned: Optional[torch.Tensor] = None,
|
| 183 |
+
topic_score: Optional[torch.Tensor] = None,
|
| 184 |
+
tone_score: Optional[torch.Tensor] = None,
|
| 185 |
+
**kwargs,
|
| 186 |
+
) -> MarksOutput:
|
| 187 |
+
|
| 188 |
+
out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
|
| 189 |
+
last_hidden = out.last_hidden_state # (B, T, H)
|
| 190 |
+
|
| 191 |
+
cls = last_hidden[:, 0] # (B, H)
|
| 192 |
+
m = attention_mask.unsqueeze(-1).to(last_hidden.dtype)
|
| 193 |
+
mean_pool = (last_hidden * m).sum(1) / m.sum(1).clamp(min=1.0) # (B, H)
|
| 194 |
+
pooled = self.dropout(torch.cat([cls, mean_pool], dim=-1)) # (B, 2H)
|
| 195 |
+
|
| 196 |
+
tm_logits = self.head_topic_mentioned(pooled)
|
| 197 |
+
ts_pred = self.head_topic_score(pooled)
|
| 198 |
+
tn_pred = self.head_tone_score(pooled)
|
| 199 |
+
|
| 200 |
+
loss, components = None, {}
|
| 201 |
+
if topic_mentioned is not None:
|
| 202 |
+
tm_logits_fp = tm_logits.float()
|
| 203 |
+
ts_pred_fp = ts_pred.float()
|
| 204 |
+
tn_pred_fp = tn_pred.float()
|
| 205 |
+
tm_t = topic_mentioned.float()
|
| 206 |
+
ts_t = topic_score.float()
|
| 207 |
+
tn_t = tone_score.float()
|
| 208 |
+
|
| 209 |
+
l_tm = F.binary_cross_entropy_with_logits(tm_logits_fp, tm_t)
|
| 210 |
+
l_ts = F.mse_loss(ts_pred_fp, ts_t)
|
| 211 |
+
l_tn = F.mse_loss(tn_pred_fp, tn_t)
|
| 212 |
+
components = {
|
| 213 |
+
"topic_mentioned": l_tm.detach(),
|
| 214 |
+
"topic_score": l_ts.detach(),
|
| 215 |
+
"tone_scores": l_tn.detach(),
|
| 216 |
+
}
|
| 217 |
+
w = self._loss_weights
|
| 218 |
+
loss = (
|
| 219 |
+
w["topic_mentioned"] * l_tm
|
| 220 |
+
+ w["topic_score"] * l_ts
|
| 221 |
+
+ w["tone_scores"] * l_tn
|
| 222 |
+
)
|
| 223 |
+
|
| 224 |
+
return MarksOutput(
|
| 225 |
+
loss=loss,
|
| 226 |
+
loss_components=components or None,
|
| 227 |
+
topic_mentioned_logits=tm_logits,
|
| 228 |
+
topic_score=ts_pred,
|
| 229 |
+
tone_score=tn_pred,
|
| 230 |
+
)
|
| 231 |
+
|
| 232 |
+
# -------------------------------------------------------------------------
|
| 233 |
+
# Convenience predict
|
| 234 |
+
# -------------------------------------------------------------------------
|
| 235 |
+
@torch.no_grad()
|
| 236 |
+
def predict(
|
| 237 |
+
self,
|
| 238 |
+
input_ids: torch.Tensor,
|
| 239 |
+
attention_mask: torch.Tensor,
|
| 240 |
+
mention_threshold: float = 0.5,
|
| 241 |
+
) -> dict:
|
| 242 |
+
"""Run a forward pass and return clamped + masked predictions.
|
| 243 |
+
|
| 244 |
+
Returns a dict with:
|
| 245 |
+
topic_mentioned (B, 10) hard 0/1
|
| 246 |
+
topic_mentioned_prob (B, 10) sigmoid confidence
|
| 247 |
+
topic_score (B, 10) clamped to [-2, +2], zeroed where mentioned=0
|
| 248 |
+
tone_score (B, 3) clamped to [1, 5]
|
| 249 |
+
"""
|
| 250 |
+
out = self.forward(input_ids=input_ids, attention_mask=attention_mask)
|
| 251 |
+
prob = torch.sigmoid(out.topic_mentioned_logits)
|
| 252 |
+
mentioned = (prob >= mention_threshold).float()
|
| 253 |
+
ts_lo, ts_hi = self.config.topic_score_range
|
| 254 |
+
tn_lo, tn_hi = self.config.tone_score_range
|
| 255 |
+
ts = out.topic_score.clamp(ts_lo, ts_hi) * mentioned
|
| 256 |
+
tn = out.tone_score.clamp(tn_lo, tn_hi)
|
| 257 |
+
return {
|
| 258 |
+
"topic_mentioned": mentioned,
|
| 259 |
+
"topic_mentioned_prob": prob,
|
| 260 |
+
"topic_score": ts,
|
| 261 |
+
"tone_score": tn,
|
| 262 |
+
}
|
| 263 |
+
|
| 264 |
+
# -------------------------------------------------------------------------
|
| 265 |
+
# Gradient checkpointing — delegate to encoder
|
| 266 |
+
# -------------------------------------------------------------------------
|
| 267 |
+
def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
|
| 268 |
+
if hasattr(self.encoder, "gradient_checkpointing_enable"):
|
| 269 |
+
self.encoder.gradient_checkpointing_enable(
|
| 270 |
+
gradient_checkpointing_kwargs=gradient_checkpointing_kwargs
|
| 271 |
+
)
|
| 272 |
+
|
| 273 |
+
def gradient_checkpointing_disable(self):
|
| 274 |
+
if hasattr(self.encoder, "gradient_checkpointing_disable"):
|
| 275 |
+
self.encoder.gradient_checkpointing_disable()
|
tokenizer.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
tokenizer_config.json
ADDED
|
@@ -0,0 +1,17 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"backend": "tokenizers",
|
| 3 |
+
"clean_up_tokenization_spaces": true,
|
| 4 |
+
"cls_token": "[CLS]",
|
| 5 |
+
"is_local": false,
|
| 6 |
+
"local_files_only": false,
|
| 7 |
+
"mask_token": "[MASK]",
|
| 8 |
+
"model_input_names": [
|
| 9 |
+
"input_ids",
|
| 10 |
+
"attention_mask"
|
| 11 |
+
],
|
| 12 |
+
"model_max_length": 8192,
|
| 13 |
+
"pad_token": "[PAD]",
|
| 14 |
+
"sep_token": "[SEP]",
|
| 15 |
+
"tokenizer_class": "TokenizersBackend",
|
| 16 |
+
"unk_token": "[UNK]"
|
| 17 |
+
}
|