AstraNexus Multimodal Email Encoder (DriptoBhattacharyya/astranexus-mm-encoder)
LoRA-fine-tuned late-fusion multimodal encoder for unsupervised email
clustering. Text branch = MiniLM (all-MiniLM-L6-v2), vision branch =
google/siglip-large-patch16-384, joined by a learned
256-d fusion head. Trained with supervised contrastive loss on
600 teacher-labeled emails (topic labels distilled from
Qwen2.5-7B-Instruct), 6 epochs on 2×T4.
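As a rough illustration of the training objective named above, here is a minimal NumPy sketch of a supervised contrastive loss over one batch. The temperature of 0.1 and the function name are assumptions for illustration; the card does not state the hyperparameters or the exact loss variant used in training.

```python
import numpy as np

def supcon_loss(embeddings: np.ndarray, labels: np.ndarray, temperature: float = 0.1) -> float:
    """Supervised contrastive loss for one batch (illustrative sketch).

    Assumes at least one label occurs twice in the batch; the temperature
    default is an assumption, not a value taken from the model card.
    """
    # L2-normalise so dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = z @ z.T / temperature
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)

    # Numerically stable log-softmax over all *other* samples in the batch.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits) * not_self
    log_prob = logits - np.log(exp.sum(axis=1, keepdims=True))

    # Positives share the anchor's label; the anchor itself is excluded.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0

    # Average log-probability of the positives, per anchor that has any.
    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[has_pos] / pos_count[has_pos]
    return float(-mean_log_prob_pos.mean())
```

The loss is small when same-label emails sit close together and different-label emails sit far apart. An actual training loop would use an autograd framework rather than NumPy.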
Why this exists
Off-the-shelf fixed-weight fusion cannot do well on both text-clear and image-decisive emails, because no single fusion weight α is best for both. This encoder learns the fusion instead, so it correctly clusters the hard cases: emails with a generic subject line and a topic-revealing attachment.
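To make the role of α concrete, the sketch below shows one common form of fixed-weight late fusion. The card does not say how its baseline combines the two branches, so this formulation and the name `fuse_fixed` are assumptions. Each branch is L2-normalised and scaled so that the cosine similarity of two fused vectors equals α times the text similarity plus (1 − α) times the image similarity.

```python
import numpy as np

def fuse_fixed(text_emb: np.ndarray, image_emb: np.ndarray, alpha: float) -> np.ndarray:
    """Fixed-weight late fusion (illustrative; not necessarily the card's baseline).

    alpha in [0, 1] is the weight on the text branch.
    """
    # Normalise each branch so neither dominates by raw magnitude.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    # sqrt scaling makes fused dot products equal alpha*cos_t + (1-alpha)*cos_v.
    return np.concatenate([np.sqrt(alpha) * t, np.sqrt(1.0 - alpha) * v], axis=1)
```

With this form, α close to 1 makes clustering follow the text and α close to 0 makes it follow the image, which is why a single global value has to trade one kind of email off against the other.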
Results: independent hard eval (120 emails, image-decisive)
| Encoder | ARI | NMI |
|---|---|---|
| Off-the-shelf fused (gate) | 0.168 | 0.519 |
| Fine-tuned (this model) | 0.252 | 0.511 |
Δ ARI = +0.084 (0.168 to 0.252); NMI is essentially unchanged (0.519 vs 0.511). The gate is an honest off-the-shelf baseline built on the same encoders, so the ARI lift comes purely from the LoRA adapters and the learned fusion.
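For readers unfamiliar with the headline metric, the following is a small standard-library sketch of the Adjusted Rand Index, computed from the contingency counts of two labelings. It is only an illustration of the formula; the card does not say which implementation produced the numbers above, and the function name is hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred) -> float:
    """Adjusted Rand Index between two labelings of the same items (n >= 2)."""
    n = len(labels_true)
    # Pairs of items that land in the same cell of the contingency table.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs that share a true cluster / share a predicted cluster.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

ARI is 1.0 for identical partitions regardless of how cluster ids are named, and about 0 for chance-level agreement. scikit-learn's `adjusted_rand_score` computes the same quantity.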
Files
- text_lora/: PEFT-LoRA adapter for the MiniLM text tower
- vision_lora/: PEFT-LoRA adapter for the SigLIP vision tower
- proj.pt: learned fusion head (Linear(t+v, 256) -> GELU -> Linear(256, 256))
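The fusion head described for proj.pt can be sketched in PyTorch as below. The layer stack follows the description above. The input widths (384 for all-MiniLM-L6-v2, 1024 for the SigLIP-large vision tower), the final L2 normalisation, and the helper names are assumptions for illustration. How proj.pt was serialised is not stated, so loading it is left out.

```python
import torch
from torch import nn

TEXT_DIM = 384     # all-MiniLM-L6-v2 sentence embedding size
VISION_DIM = 1024  # assumed pooled width of the SigLIP-large vision tower

def build_fusion_head(text_dim: int = TEXT_DIM, vision_dim: int = VISION_DIM,
                      out_dim: int = 256) -> nn.Sequential:
    """Linear(t+v, 256) -> GELU -> Linear(256, 256), as listed for proj.pt."""
    return nn.Sequential(
        nn.Linear(text_dim + vision_dim, out_dim),
        nn.GELU(),
        nn.Linear(out_dim, out_dim),
    )

def fuse(head: nn.Module, text_emb: torch.Tensor, image_emb: torch.Tensor) -> torch.Tensor:
    """Concatenate the two branch embeddings and project to the joint space."""
    joint = torch.cat([text_emb, image_emb], dim=-1)
    # Unit-normalising the output is an assumption, convenient for cosine clustering.
    return nn.functional.normalize(head(joint), dim=-1)
```

Calling `fuse(build_fusion_head(), text_batch, image_batch)` on batches of shape (B, 384) and (B, 1024) yields a (B, 256) tensor.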
Usage
See astranexus/cluster/ft_encoder.py in the AstraNexus repo: it loads both
adapters and the fusion head, and exposes encode(emails) -> np.ndarray.
Reproducibility note
Trained on Kaggle (torch 2.10, 2×T4). The SigLIP LoRA adapter keys use the
vision_model. module layout from the training-time transformers; newer
transformers (5.x) flattened SigLIP, which shifts both the adapter key paths
and the frozen base-model numerics. ft_encoder._load_adapter_robust
remaps the keys, but for faithful results pin transformers to the training
line (4.x) and install torchvision (matches the image processor). The eval
numbers above were measured in the training environment.
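The kind of key remapping described in this note can be illustrated with a short standard-library sketch. It is not the code of ft_encoder._load_adapter_robust, which is not shown in this card. The helper names are hypothetical, and the assumption that remapping amounts to dropping the vision_model. segment is a guess based on the description above.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_major(package: str = "transformers"):
    """Return the installed major version of a package, or None if absent."""
    try:
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None

def drop_key_segment(state_dict: dict, segment: str = "vision_model.") -> dict:
    """Remove the first occurrence of `segment` from every key (hypothetical remap)."""
    remapped = {}
    for key, value in state_dict.items():
        new_key = key.replace(segment, "", 1)
        if new_key in remapped:
            # Two source keys collapsing onto one target would silently lose weights.
            raise ValueError(f"key collision after remapping: {new_key}")
        remapped[new_key] = value
    return remapped
```

A loader could call `installed_major()` and apply `drop_key_segment` to the adapter weights only when the result is 5 or higher. As the note says, remapping fixes the key paths but not the shifted base-model numerics, so pinning to the 4.x line remains the safer route.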