AstraNexus Multimodal Email Encoder (DriptoBhattacharyya/astranexus-mm-encoder)

LoRA-fine-tuned late-fusion multimodal encoder for unsupervised email clustering. Text branch = MiniLM (all-MiniLM-L6-v2), vision branch = google/siglip-large-patch16-384, joined by a learned 256-d fusion head. Trained with supervised contrastive loss on 600 teacher-labeled emails (topic labels distilled from Qwen2.5-7B-Instruct), 6 epochs on 2×T4.

Why this exists

Off-the-shelf fixed-weight fusion can't win on both text-clear and image-decisive emails: no single fusion weight α is best for both. This encoder learns the fusion instead, so it clusters the hard cases (generic subject + topic-revealing attachment) better than a fixed weight does.
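To make the fixed-weight limitation concrete, here is a minimal sketch of one common form of fixed-α late fusion. This is an assumption for illustration; the card does not say how the off-the-shelf baseline combines the two towers. Scaling the L2-normalised text and image vectors by sqrt(α) and sqrt(1-α) before concatenating makes the fused cosine similarity equal to α·sim_text + (1-α)·sim_image, so one global α sets the trade-off for every email. All names and toy vectors are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def fuse_fixed(text_vec, image_vec, alpha):
    """Fixed-weight late fusion (illustrative, not the card's baseline).

    The fused vector has unit norm, and the dot product of two fused
    vectors is alpha * cos(text) + (1 - alpha) * cos(image).
    """
    t = l2_normalize(text_vec)
    v = l2_normalize(image_vec)
    w_text, w_image = math.sqrt(alpha), math.sqrt(1.0 - alpha)
    return [w_text * x for x in t] + [w_image * x for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy "hard case": two emails with the same generic text embedding
# but attachments that point to different topics.
text_a, image_a = [1.0, 0.0], [1.0, 0.0]
text_b, image_b = [1.0, 0.0], [0.0, 1.0]

def fused_similarity(alpha):
    """Cosine similarity of the two toy emails after fixed-alpha fusion."""
    return dot(fuse_fixed(text_a, image_a, alpha),
               fuse_fixed(text_b, image_b, alpha))

# A text-heavy alpha keeps the two emails close even though their
# attachments disagree; an image-heavy alpha separates them.
sim_text_heavy = fused_similarity(0.9)
sim_image_heavy = fused_similarity(0.1)
```

In this toy pair the fused similarity is simply α, so a text-heavy setting that suits text-clear emails leaves these two image-decisive emails nearly indistinguishable. A learned fusion head can weigh the modalities per input instead of globally.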

Results: independent hard eval (120 emails, image-decisive)

Model                        ARI     NMI
Off-the-shelf fused (gate)   0.168   0.519
Fine-tuned (this model)      0.252   0.511

ΔARI = +0.084 (0.168 to 0.252); NMI is essentially unchanged (0.519 vs. 0.511). The gate is an honest off-the-shelf baseline on the same encoders; the lift is purely from the LoRA + learned fusion.
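For readers unfamiliar with the headline metric, the sketch below computes the Adjusted Rand Index from two label lists using only the standard library. It is not the evaluation script behind the numbers above (the card does not include one); in practice a library routine such as scikit-learn's adjusted_rand_score would normally be used. The function name and toy labels are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    n = len(labels_true)
    # Contingency counts and the marginals of each labelling.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs that share a cluster.
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)   # chance-level agreement
    maximum = (sum_true + sum_pred) / 2           # best possible agreement
    if maximum == expected:
        # Degenerate case, e.g. both labellings put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)

# ARI ignores cluster names: a relabelled but identical partition scores 1.0.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])

# The lift reported in the table, recomputed from the two ARI values.
delta_ari = round(0.252 - 0.168, 3)
```

Because ARI is corrected for chance, 0 means no better than random assignment and 1 means a perfect match, so both scores in the table sit well above chance but far from perfect agreement on this hard set.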

Files

  • text_lora/: PEFT-LoRA adapter for the MiniLM text tower
  • vision_lora/: PEFT-LoRA adapter for the SigLIP vision tower
  • proj.pt: learned fusion head (Linear(t+v, 256) -> GELU -> Linear(256, 256))
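The fusion head layout listed above can be sketched in PyTorch as follows. This is a minimal reconstruction from the one-line description, not the repository's code. The input widths are assumptions (384 for all-MiniLM-L6-v2 sentence embeddings, 1024 for a pooled SigLIP-large vector), the class and attribute names are hypothetical, and the parameter names may not match the keys stored in proj.pt.

```python
import torch
from torch import nn

TEXT_DIM = 384      # all-MiniLM-L6-v2 sentence-embedding width
VISION_DIM = 1024   # assumed pooled SigLIP-large width; check the model config
FUSED_DIM = 256     # output width stated in the card

class FusionHead(nn.Module):
    """Linear(t+v, 256) -> GELU -> Linear(256, 256) over concatenated features."""

    def __init__(self, text_dim=TEXT_DIM, vision_dim=VISION_DIM, out_dim=FUSED_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim + vision_dim, out_dim),
            nn.GELU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, text_emb, vision_emb):
        # Late fusion: each tower is encoded separately, then joined here.
        return self.net(torch.cat([text_emb, vision_emb], dim=-1))

# Shape check with random stand-ins for the two tower outputs.
head = FusionHead()
fused = head(torch.randn(4, TEXT_DIM), torch.randn(4, VISION_DIM))
```

Whether the released head normalises its output or handles emails without an image is not stated in the card, so neither is modelled here.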

Usage

See astranexus/cluster/ft_encoder.py in the AstraNexus repo, which loads both adapters + the fusion head and exposes encode(emails) -> np.ndarray.

Reproducibility note

Trained on Kaggle (torch 2.10, 2×T4). The SigLIP LoRA adapter keys use the vision_model. module layout from the training-time transformers; newer transformers (5.x) flattened SigLIP, which shifts both the adapter key paths and the frozen base-model numerics. ft_encoder._load_adapter_robust remaps the keys, but for faithful results pin transformers to the training line (4.x) and install torchvision (matches the image processor). The eval numbers above were measured in the training environment.
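The key remapping described above might look roughly like the sketch below. It is not the actual ft_encoder._load_adapter_robust, whose implementation is not shown in this card. The function name, the example key strings, and the assumption that the only difference is a dropped vision_model. segment are all hypothetical. It also only addresses the key paths; the numerics shift in the frozen base model still requires pinning transformers.

```python
def remap_adapter_keys(adapter_state, target_keys, segment="vision_model."):
    """Map saved adapter keys onto the names the loaded model expects.

    For each saved key, try it unchanged first, then with the first
    occurrence of `segment` removed. Keys that match neither form are
    reported instead of being silently dropped.
    """
    target_keys = set(target_keys)
    remapped, unmatched = {}, []
    for key, value in adapter_state.items():
        candidates = [key, key.replace(segment, "", 1)]
        match = next((c for c in candidates if c in target_keys), None)
        if match is None:
            unmatched.append(key)
        else:
            remapped[match] = value
    return remapped, unmatched

# Hypothetical key names, for illustration only.
saved = {
    "base_model.model.vision_model.encoder.layers.0.self_attn.q_proj.lora_A.weight": "A0",
    "base_model.model.vision_model.encoder.layers.0.self_attn.q_proj.lora_B.weight": "B0",
    "base_model.model.some_removed_module.weight": "X",
}
expected = [
    "base_model.model.encoder.layers.0.self_attn.q_proj.lora_A.weight",
    "base_model.model.encoder.layers.0.self_attn.q_proj.lora_B.weight",
]

remapped, unmatched = remap_adapter_keys(saved, expected)
```

Reporting the unmatched keys matters here: an adapter that loads with some keys missing would run without error but produce embeddings that differ from the evaluated model.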
