Infon multilingual linguistic classifiers

Six single-sentence classifiers running on a shared MiniLM-L12 backbone. Replaces the regex-based detectors in infon/extract.py with a learned multilingual model.

Head           Classes
polarity       affirmed · negated · uncertain
tense          past · present · future · conditional
conditional    yes · no
relation_type  causal · temporal · spatial · attributive · none
spatial        containment · proximity · direction · movement · none
direction      increase · decrease · stable · target · none
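
For reference, the head/class table can be written as a plain Python mapping. This is only a sketch: the class order inside each list is an assumption (the real index order is fixed by the released checkpoint), and the name HEAD_CLASSES is illustrative, not part of the package.

```python
from math import prod

# Head -> class list, copied from the table above.
# ASSUMPTION: the order of classes within each head is illustrative only;
# the released checkpoint defines the actual logit index order.
HEAD_CLASSES = {
    "polarity": ["affirmed", "negated", "uncertain"],
    "tense": ["past", "present", "future", "conditional"],
    "conditional": ["yes", "no"],
    "relation_type": ["causal", "temporal", "spatial", "attributive", "none"],
    "spatial": ["containment", "proximity", "direction", "movement", "none"],
    "direction": ["increase", "decrease", "stable", "target", "none"],
}

# Total number of logits emitted across all six heads.
n_outputs = sum(len(classes) for classes in HEAD_CLASSES.values())

# Number of distinct joint label combinations for one sentence.
n_combinations = prod(len(classes) for classes in HEAD_CLASSES.values())

print(n_outputs)       # 24
print(n_combinations)  # 3000
```

Multiplying the 3,000 joint combinations by the 5 evaluated languages gives 15,000 (combination, language) cells.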

Quick start (JavaScript)

npm install @cp500/infon-heads onnxruntime-web
import { InfonHeadsModel } from '@cp500/infon-heads';

const model = await InfonHeadsModel.fromHub('cp500/infon-heads', {
  precision: 'fp16',     // 224 MB (default) — vs 448 MB for fp32
  device: 'auto',
});

const r = await model.classify(
  'Toyota did not raise battery output last quarter because demand fell.'
);

console.log(r.polarity);             // 'negated'
console.log(r.tense);                // 'past'
console.log(r.relationType);         // 'causal'
console.log(r.comparativeDirection); // 'decrease'

The JS client source is mirrored under js/ for self-contained installs.

Quick start (Python)

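import torch
from transformers import AutoModel, AutoTokenizer
from infon.heads import InfonHeads

backbone = AutoModel.from_pretrained("./backbone/")
tokenizer = AutoTokenizer.from_pretrained("./backbone/")
heads = InfonHeads.load(".")  # loads heads.pt

# Encode one sentence and take the CLS vector that the heads consume.
# ASSUMPTION: the heads read the raw CLS hidden state (position 0).
enc = tokenizer("Demand fell last quarter.", return_tensors="pt")
with torch.no_grad():
    cls = backbone(**enc).last_hidden_state[:, 0]  # shape (1, 384)
# Pass `cls` to `heads`; the exact call is defined by InfonHeads (not shown here).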

Architecture

text ─▶ tokenize ─▶ heads_backbone.onnx (MiniLM-L12, 117M, 224 MB FP16)
                          │
                          ▼ cls (B, H=384)
              heads_classifiers.onnx (6 tiny MLPs, 144 KB FP16)
                          │
                          ▼
   6 logit tensors → argmax + softmax → labels + confidence

Two ONNX graphs:

  • onnx/heads_backbone.onnx: paraphrase-multilingual-MiniLM-L12-v2 with the CLS token surfaced as the only output. One forward pass per document.
  • onnx/heads_classifiers.onnx: six small MLPs sharing the CLS input. Each emits its own logit tensor named <head>_logits. The cost of additional heads is essentially zero, so the bundle stays compact.
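
The last stage of the diagram (logits → softmax → argmax → label + confidence) can be sketched in plain Python. This is a minimal illustration, not the package's decoder: the function names, the example logits, and the class order are all assumptions, and the dictionary keys stand for the <head>_logits outputs with the suffix stripped.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_heads(head_logits, head_classes):
    """Map {head: logits} to {head: (label, confidence)}.

    The label is the argmax class; the confidence is its softmax probability.
    """
    decoded = {}
    for head, logits in head_logits.items():
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        decoded[head] = (head_classes[head][best], probs[best])
    return decoded

# Hypothetical logits for one sentence, showing two of the six heads.
classes = {
    "polarity": ["affirmed", "negated", "uncertain"],
    "conditional": ["yes", "no"],
}
logits = {"polarity": [0.2, 3.1, -1.0], "conditional": [-2.0, 2.0]}

decoded = decode_heads(logits, classes)
for head, (label, confidence) in decoded.items():
    print(f"{head}: {label} ({confidence:.2f})")
```

Because every head reads the same CLS vector, decoding all six is just this loop over six small logit lists.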

Evaluation

Best-by-macro-accuracy checkpoint:

Head           Validation acc
polarity       0.947
tense          0.743
conditional    0.949
relation_type  0.477
spatial        0.912
direction      0.914

Macro accuracy across all 6 heads: 0.824.
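
The macro figure is the unweighted mean of the six per-head accuracies, so each head counts equally regardless of how many classes it has. A quick check using the numbers from the table:

```python
# Per-head validation accuracy, copied from the table above.
val_acc = {
    "polarity": 0.947,
    "tense": 0.743,
    "conditional": 0.949,
    "relation_type": 0.477,
    "spatial": 0.912,
    "direction": 0.914,
}

# Macro accuracy: unweighted mean over heads.
macro = sum(val_acc.values()) / len(val_acc)
print(f"{macro:.3f}")  # 0.824

# The weakest head pulls the mean down the most.
weakest = min(val_acc, key=val_acc.get)
print(weakest)  # relation_type
```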

Per-language

Language  macro  polarity  tense  conditional  relation_type  spatial  direction
en        0.864  0.97      0.88   0.98         0.49           0.95     0.91
ja        0.805  0.92      0.74   0.92         0.45           0.88     0.90
ko        0.827  0.93      0.77   0.97         0.47           0.92     0.90
th        0.817  0.96      0.67   0.94         0.53           0.88     0.92
zh        0.809  0.95      0.65   0.93         0.44           0.93     0.94

Known limits

  • relation_type underperforms (~47%). The 5 classes (causal / temporal / spatial / attributive / none) overlap meaningfully — many sentences are simultaneously causal AND spatial-movement, but the synthetic training data forces a single label per sentence. Treat low-confidence relation_type predictions (confidence.relationType < 0.5) as unreliable.
  • Trained on synthetic data in news-article register. Out-of-domain text (chat, code, formal contracts) may underperform.
  • Trained and validated on the 5 listed languages (en, ja, ko, th, zh). Other languages covered by the multilingual backbone may work via zero-shot transfer but are not validated.
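
The confidence cutoff suggested for relation_type can be applied as a small post-processing step. The sketch below assumes a result shaped like the JavaScript example (a relationType label plus a confidence.relationType score) represented as a Python dict; that shape and the helper name gate_relation_type are assumptions for illustration.

```python
def gate_relation_type(result, threshold=0.5):
    """Return the relation_type label, or None when its confidence is below threshold.

    ASSUMPTION: `result` is a dict with a "relationType" label and a nested
    "confidence" dict, mirroring the field names used in the JS example.
    """
    confidence = result["confidence"]["relationType"]
    return result["relationType"] if confidence >= threshold else None

# Hypothetical classifier outputs.
confident = {"relationType": "causal", "confidence": {"relationType": 0.81}}
shaky = {"relationType": "temporal", "confidence": {"relationType": 0.34}}

print(gate_relation_type(confident))  # causal
print(gate_relation_type(shaky))      # None
```

Returning None rather than a fallback label keeps "no reliable prediction" distinct from the model's own none class.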

Training

  • Backbone: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  • Synthetic corpus: 6,000 multi-task labeled sentences from Bedrock/Claude Haiku 4.5, cell-balanced over 15,000 cells (3,000 label combinations × 5 languages).
  • Joint multi-task CE loss; backbone fine-tuned at low LR (2e-5 → 1e-4 OneCycle), heads at 1e-3 → 5e-3 OneCycle.
  • 4 epochs on a Mac M-series MPS device, batch 32, ~10 min.
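
To illustrate the two learning-rate ranges, here is a dependency-free sketch of a one-cycle schedule with separate settings for backbone and heads. It reads "2e-5 → 1e-4" as start LR → peak LR, which is an interpretation; the cosine shape, the 30% warm-up, the final LR of 0, and the step count are further assumptions and are not taken from the training script.

```python
import math

def _cos_interp(start, end, pct):
    """Cosine interpolation from `start` (pct=0) to `end` (pct=1)."""
    return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))

def one_cycle_lr(step, total_steps, start_lr, peak_lr, warmup_steps, final_lr=0.0):
    """One-cycle LR: warm up start_lr -> peak_lr, then anneal peak_lr -> final_lr."""
    if step < warmup_steps:
        return _cos_interp(start_lr, peak_lr, step / warmup_steps)
    remaining = total_steps - 1 - warmup_steps
    return _cos_interp(peak_lr, final_lr, (step - warmup_steps) / remaining)

# (start LR, peak LR) per parameter group, from the training notes above.
GROUPS = {"backbone": (2e-5, 1e-4), "heads": (1e-3, 5e-3)}

TOTAL_STEPS = 100   # hypothetical; depends on dataset size, batch size, epochs
WARMUP_STEPS = 30   # hypothetical 30% warm-up

for name, (start_lr, peak_lr) in GROUPS.items():
    for step in (0, WARMUP_STEPS, TOTAL_STEPS - 1):
        lr = one_cycle_lr(step, TOTAL_STEPS, start_lr, peak_lr, WARMUP_STEPS)
        print(f"{name} step {step}: {lr:.1e}")
```

Both groups peak at 5× their starting rate, so the heads stay 50× ahead of the backbone throughout the cycle.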

License

Apache 2.0 for both weights and JS code.
