lateon-onnx

This repository hosts the optimized, single-file ONNX export of lightonai/LateOn, a late-interaction model built on a ModernBERT backbone.

It is designed to run without PyTorch, with no dependencies beyond the intextus-embed runtime library.

Model Metadata

  • Backbone: ModernBERT-base (140M parameters)
  • Output Dimensions: 128-dimensional late-interaction embeddings
  • ONNX File Size: 580 MB (fully self-contained, merged weights)
  • Case Sensitivity: Case-sensitive (requires do_lower_case=False)

Usage

Install the intextus-embed runtime:

pip install intextus-embed

Load the model automatically and run inference (set do_lower_case=False because LateOn is case-sensitive):

from intextus import IntextusEncoder, compute_maxsim

# Automatically downloads and caches the model from Hugging Face
model = IntextusEncoder("lateon", do_lower_case=False)

# Encode queries and documents
query_embeddings = model.encode_queries("What is ultra-low latency?")
doc_embeddings = model.encode_docs("ONNX runtime bypasses the PyTorch layer completely.")

# Compute the MaxSim similarity score via NumPy
score = compute_maxsim(query_embeddings[0], doc_embeddings[0])
print(f"Relevance Score (MaxSim): {score:.4f}")
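
For readers unfamiliar with late-interaction scoring, the following is a minimal NumPy sketch of the standard MaxSim formulation: for each query token, take the highest dot product against any document token, then sum those maxima. The function name maxsim and the random toy embeddings are illustrative assumptions. This is not the source of compute_maxsim, which may differ in details such as normalization or masking.

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Sum, over query tokens, of the best dot-product match among document tokens.

    Hypothetical helper for illustration; not part of intextus-embed.
    """
    # (num_query_tokens, dim) @ (dim, num_doc_tokens) -> pairwise similarity matrix
    similarities = query_tokens @ doc_tokens.T
    # Best-matching document token per query token, summed over query tokens
    return float(similarities.max(axis=1).sum())

# Toy 128-dimensional token embeddings (random values, for illustration only)
rng = np.random.default_rng(0)
query = rng.standard_normal((4, 128))
doc = rng.standard_normal((10, 128))

# L2-normalize each token vector so dot products are cosine similarities
query /= np.linalg.norm(query, axis=1, keepdims=True)
doc /= np.linalg.norm(doc, axis=1, keepdims=True)

print(f"Toy MaxSim: {maxsim(query, doc):.4f}")
```

With unit-normalized token vectors, the score is bounded above by the number of query tokens, so scores for queries of different lengths are not directly comparable.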