modernbert-embed-base_finetune_8192 — ONNX

Unofficial community mirror. Not affiliated with, maintained by, or endorsed by the Free Law Project, Nomic AI, or Answer.AI. All credit for the model itself goes to the Free Law Project — if you use these embeddings, please credit them.

ONNX conversions of freelawproject/modernbert-embed-base_finetune_8192. The upstream repo hosts safetensors only; this repo exists so the model can be used with onnxruntime / onnxruntime-node for local and offline inference.

Two artifacts are provided:

model.onnx — a format conversion of the upstream weights to ONNX (fp32). No retraining, no fine-tuning, no weight changes.
model_quantized.onnx — a modified, derived artifact: int8 dynamic weight quantization of the fp32 export. Quantization changes the weight values; treat it as a smaller, approximate variant.
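To illustrate why the int8 artifact is only approximate, here is a simplified sketch of symmetric, per-tensor int8 weight quantization. It is an assumption-labeled illustration of the general idea behind QInt8 weights, not onnxruntime's exact scheme. `quantizeSymmetricInt8` and `dequantize` are hypothetical helper names.

```javascript
// Simplified illustration of symmetric, per-tensor int8 weight quantization.
// Assumption: shows the general idea only, not the exact onnxruntime implementation.
function quantizeSymmetricInt8(weights) {
  let maxAbs = 0;
  for (const w of weights) maxAbs = Math.max(maxAbs, Math.abs(w));
  // Map [-maxAbs, maxAbs] onto [-127, 127]; guard against an all-zero tensor.
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const q = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    q[i] = Math.max(-127, Math.min(127, Math.round(weights[i] / scale)));
  }
  return { q, scale };
}

// Recover approximate float weights from the int8 values and the scale.
function dequantize({ q, scale }) {
  return Float32Array.from(q, (v) => v * scale);
}

const weights = Float32Array.from([0.5, -1.27, 0.003, 1.0]);
const restored = dequantize(quantizeSymmetricInt8(weights));
// Each restored value lies within scale / 2 of the original, but is not identical.
console.log(Array.from(restored));
```

Small weights (like 0.003 here) can round to zero entirely, which is one reason the quantized variant's embeddings differ from the fp32 ones.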

Model lineage

| Stage | Repo | License |
|---|---|---|
| Base encoder | `answerdotai/ModernBERT-base` | Apache-2.0 |
| Embedding model | `nomic-ai/modernbert-embed-base` | Apache-2.0 |
| Legal fine-tune (upstream of this repo) | `freelawproject/modernbert-embed-base_finetune_8192` | CC0-1.0 |
| This repo | ONNX conversion + int8 quantization of the above | see License below |

Per the upstream model card, the model was fine-tuned by the Free Law Project on legal opinion documents (their freelawproject/opinions-synthetic-query-8192 dataset). It "maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more." See the upstream card for training details and evaluation — none of that is reproduced or re-measured here, and this repo adds no benchmark numbers of its own.

Key properties (inherited from upstream):

768-dimensional embeddings
8192-token maximum sequence length
Mean pooling + L2 normalization. Upstream uses the sentence-transformers layout with a 1_Pooling module; the ONNX graph here contains only the transformer, so you must apply pooling and normalization yourself (see the usage example below).
Cosine similarity as the intended similarity function
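Because the pooled embeddings are L2-normalized, cosine similarity reduces to a plain dot product. A minimal semantic-search sketch, assuming embeddings are unit-length Float32Arrays like those produced by the usage example; `rankBySimilarity` is a hypothetical helper, and the toy 3-dim vectors stand in for real 768-dim embeddings.

```javascript
// For unit-length vectors, cosine similarity equals the dot product.
function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// General cosine similarity, for vectors that may not be normalized.
function cosineSimilarity(a, b) {
  return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
}

// Hypothetical helper: document indices sorted by similarity to the query (best first).
function rankBySimilarity(queryVec, docVecs) {
  return docVecs
    .map((vec, index) => ({ index, score: dot(queryVec, vec) }))
    .sort((x, y) => y.score - x.score);
}

// Toy unit vectors standing in for real 768-dim embeddings.
const query = Float32Array.from([1, 0, 0]);
const docs = [
  Float32Array.from([0, 1, 0]),
  Float32Array.from([0.6, 0.8, 0]),
  Float32Array.from([1, 0, 0]),
];
console.log(rankBySimilarity(query, docs).map((r) => r.index)); // [ 2, 1, 0 ]
```

Skipping the normalization in the dot-product path is only valid if every vector really is unit length; use `cosineSimilarity` otherwise.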

Files

| File | Precision | Size | Notes |
|---|---|---|---|
| `model.onnx` | fp32 | ~596 MB | Format conversion only. Validated against the PyTorch reference at export time: max abs diff 2.26e-05 on `last_hidden_state`. |
| `model_quantized.onnx` | int8 (dynamic, QInt8) | ~143 MB | Derived from `model.onnx`. Smaller download, lower memory, CPU-friendly. No accuracy evaluation has been run on this variant; validate it on your own retrieval task before relying on it. |
| `config.json`, `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json` | n/a | n/a | Copied unchanged from the optimum export of the upstream fine-tune repo. |
| `LICENSE` | n/a | n/a | CC0-1.0 legal code (upstream fine-tune license). |
| `LICENSE-Apache-2.0` | n/a | n/a | Apache-2.0 text, for the portions inherited from the Apache-2.0 ancestor models. |
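Since no accuracy evaluation has been run on the int8 variant, one lightweight sanity check you could run on your own data is to embed the same queries and documents with both `model.onnx` and `model_quantized.onnx` and measure how often their top-k results agree. The sketch below assumes you already have unit-length embeddings from both variants; `topK` and `topKOverlap` are hypothetical helpers, and the toy vectors are made up. High overlap is reassuring but is not a substitute for a proper retrieval evaluation.

```javascript
// Indices of the k documents with the highest dot product against the query.
function topK(queryVec, docVecs, k) {
  const scored = docVecs.map((vec, index) => {
    let s = 0;
    for (let i = 0; i < vec.length; i++) s += queryVec[i] * vec[i];
    return { index, score: s };
  });
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, k).map((r) => r.index);
}

// Fraction of the fp32 top-k that also appears in the int8 top-k (1.0 = same set).
function topKOverlap(fp32Query, fp32Docs, int8Query, int8Docs, k) {
  const ref = new Set(topK(fp32Query, fp32Docs, k));
  const hits = topK(int8Query, int8Docs, k).filter((i) => ref.has(i)).length;
  return hits / k;
}

// Toy 2-dim vectors; in practice these come from running both ONNX files.
const query = [1, 0];
const fp32Docs = [[1, 0], [0.8, 0.6], [0, 1]];
const int8Docs = [[0.99, 0.01], [0.79, 0.61], [0.05, 0.99]];
console.log(topKOverlap(query, fp32Docs, query, int8Docs, 2)); // 1
```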

Usage (onnxruntime-node)

Tokenize with the bundled tokenizer.json, run the session with input_ids + attention_mask, mean-pool last_hidden_state with the attention mask, then L2-normalize. Output is a 768-dim unit vector per input.

```javascript
// ES module with top-level await: save as .mjs or set "type": "module" in package.json.
import { AutoTokenizer } from "@huggingface/transformers";
import * as ort from "onnxruntime-node";

const repo = "ReconOut/modernbert-embed-base_finetune_8192-ONNX";
const tokenizer = await AutoTokenizer.from_pretrained(repo);
// Local path to the downloaded ONNX file (use ./model.onnx for the fp32 variant).
const session = await ort.InferenceSession.create("./model_quantized.onnx");

const texts = ["The court granted the motion for summary judgment."];
const { input_ids, attention_mask } = await tokenizer(texts, {
  padding: true,
  truncation: true,
});

const outputs = await session.run({
  input_ids: new ort.Tensor("int64", input_ids.data, input_ids.dims),
  attention_mask: new ort.Tensor("int64", attention_mask.data, attention_mask.dims),
});

// Mean-pool last_hidden_state over non-padding tokens, then L2-normalize.
const hidden = outputs.last_hidden_state; // dims: [batch, seq, 768]
const [batch, seq, dim] = hidden.dims;
const embeddings = [];
for (let b = 0; b < batch; b++) {
  const vec = new Float32Array(dim);
  let count = 0;
  for (let t = 0; t < seq; t++) {
    // attention_mask.data is a BigInt64Array, so compare against 0n.
    if (attention_mask.data[b * seq + t] === 0n) continue;
    count++;
    for (let d = 0; d < dim; d++) {
      vec[d] += hidden.data[(b * seq + t) * dim + d];
    }
  }
  for (let d = 0; d < dim; d++) vec[d] /= count;

  let norm = 0;
  for (let d = 0; d < dim; d++) norm += vec[d] * vec[d];
  norm = Math.sqrt(norm);
  for (let d = 0; d < dim; d++) vec[d] /= norm;

  embeddings.push(vec); // 768-dim, unit length
}
```
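The pooling loop in the usage example can be factored into a pure function, which makes it easy to unit-test without loading the model. This is a sketch; `meanPoolNormalize` is a hypothetical name. It converts mask values with `Number()` so it accepts both BigInt64Array masks and plain numeric arrays, and it guards against division by zero.

```javascript
// Mean-pool token vectors over positions where the mask is non-zero, then L2-normalize.
// hiddenData: flattened row-major Float32Array of shape [batch, seq, dim].
// maskData:   flattened array of shape [batch, seq] (BigInt64Array or numbers).
function meanPoolNormalize(hiddenData, maskData, batch, seq, dim) {
  const out = [];
  for (let b = 0; b < batch; b++) {
    const vec = new Float32Array(dim);
    let count = 0;
    for (let t = 0; t < seq; t++) {
      if (Number(maskData[b * seq + t]) === 0) continue; // skip padding
      count++;
      const offset = (b * seq + t) * dim;
      for (let d = 0; d < dim; d++) vec[d] += hiddenData[offset + d];
    }
    if (count > 0) for (let d = 0; d < dim; d++) vec[d] /= count;
    let norm = 0;
    for (let d = 0; d < dim; d++) norm += vec[d] * vec[d];
    norm = Math.sqrt(norm);
    if (norm > 0) for (let d = 0; d < dim; d++) vec[d] /= norm;
    out.push(vec);
  }
  return out;
}

// Usage with the variables from the example above:
// const embeddings = meanPoolNormalize(hidden.data, attention_mask.data, ...hidden.dims);
```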

Notes:

The graph takes input_ids and attention_mask only (no token_type_ids).
The upstream model descends from nomic-ai/modernbert-embed-base, which documents task prefixes (e.g. search_query: / search_document:). Check the upstream and ancestor model cards to decide whether prefixes apply to your use case — this mirror takes no position.
The model accepts sequences up to 8192 tokens — long enough to embed many legal documents without chunking, though memory and compute scale with sequence length.
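Because the graph takes only `input_ids` and `attention_mask`, one defensive option is to build the feeds from the names the session declares (`session.inputNames` in onnxruntime) rather than from whatever the tokenizer happens to return, for example if it also emits `token_type_ids`. A sketch under that assumption: `buildFeeds` is a hypothetical helper, and the tensor constructor is injected as a callback so the logic can be tested without onnxruntime.

```javascript
// Hypothetical helper: build a feeds object containing only the inputs the graph
// declares (for this model: input_ids and attention_mask).
// `encoded` maps names to { data, dims }, e.g. the object the tokenizer returns.
// `makeTensor(data, dims)` creates the runtime tensor, e.g. an int64 ort.Tensor.
function buildFeeds(encoded, inputNames, makeTensor) {
  const feeds = {};
  for (const name of inputNames) {
    const t = encoded[name];
    if (!t) throw new Error(`tokenizer output is missing required input '${name}'`);
    feeds[name] = makeTensor(t.data, t.dims);
  }
  return feeds;
}

// Usage with the session from the example above:
// const encoded = await tokenizer(texts, { padding: true, truncation: true });
// const feeds = buildFeeds(encoded, session.inputNames,
//   (data, dims) => new ort.Tensor("int64", data, dims));
// const outputs = await session.run(feeds);
```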

Provenance

Converted on 2026-06-10 with the optimum + onnxruntime toolchain (optimum 2.x with optimum-onnx). Exact commands:

```bash
# 1) fp32 export (task: feature-extraction)
optimum-cli export onnx --model freelawproject/modernbert-embed-base_finetune_8192 --task feature-extraction <out>

# 2) int8 dynamic quantization of the fp32 export (relative paths: run from the export output directory)
python -c "from onnxruntime.quantization import quantize_dynamic, QuantType; quantize_dynamic('model.onnx','model_quantized.onnx',weight_type=QuantType.QInt8)"
```

Export validation (performed by optimum at export time): maximum absolute difference vs the PyTorch reference of 2.26e-05 on last_hidden_state for the fp32 export.
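The validation figure above is a maximum absolute element-wise difference between two `last_hidden_state` tensors. A small sketch of that metric follows, for instance if you want to compare `model.onnx` against `model_quantized.onnx` on your own inputs (a comparison this repo has not measured or published). `maxAbsDiff`, `fp32Session`, and `int8Session` are hypothetical names.

```javascript
// Maximum absolute element-wise difference between two equally sized tensor buffers.
function maxAbsDiff(a, b) {
  if (a.length !== b.length) throw new Error("tensors must have the same number of elements");
  let max = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = Math.abs(a[i] - b[i]);
    if (diff > max) max = diff;
  }
  return max;
}

// Usage, assuming fp32Session and int8Session were created from model.onnx and
// model_quantized.onnx and are run on the same feeds:
// const ref = (await fp32Session.run(feeds)).last_hidden_state;
// const q = (await int8Session.run(feeds)).last_hidden_state;
// console.log(maxAbsDiff(ref.data, q.data));
```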

License

The upstream fine-tuned weights are released by the Free Law Project under CC0-1.0 (see LICENSE) — a public domain dedication. Note that CC0 does not grant trademark or patent rights, and no such rights are granted or implied by this mirror.

The fine-tune descends from Apache-2.0 ancestor models (answerdotai/ModernBERT-base by Answer.AI and nomic-ai/modernbert-embed-base by Nomic AI). The Apache-2.0 license text is included as LICENSE-Apache-2.0, and this card serves as attribution for the portions inherited from those models.

The conversions in this repo (fp32 ONNX export and int8 quantization) are distributed under the same terms, with no additional restrictions and no warranty of any kind. See the upstream repo for the authoritative model card, training details, and citations.
