# IDMR-8B
IDMR-8B is a universal multimodal embedding model, particularly well-suited for Instance-Driven Multimodal Retrieval (IDMR) tasks. It is designed to achieve fine-grained, instance-level visual correspondence across modalities.
## Learn More About IDMR
- Paper: IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval
- Demo: IDMR Demo on Hugging Face Spaces
- Code: [GitHub](https://github.com/BwLiu01/IDMR)
## Usage
To get started, clone the GitHub repository and install the required dependencies:
```bash
git clone https://github.com/BwLiu01/IDMR.git
cd IDMR
pip install -r requirements.txt
```
Then compute embeddings for a query and a target and score their similarity:

```python
import torch
from PIL import Image
from transformers import AutoTokenizer, AutoImageProcessor

from src.model import IDMRModel
from src.vlm_backbone.intern_vl import InternVLProcessor
from src.arguments import ModelArguments

device = "cuda"
IMAGE_TOKEN = "<image>"

# Load model and processor
model_args = ModelArguments(model_name="lbw18601752667/IDMR-8B", model_backbone="internvl_2_5")

# Initialize the processor from the model's tokenizer and image processor
tokenizer = AutoTokenizer.from_pretrained(model_args.model_name, trust_remote_code=True)
image_processor = AutoImageProcessor.from_pretrained(model_args.model_name, trust_remote_code=True, use_fast=False)
processor = InternVLProcessor(image_processor=image_processor, tokenizer=tokenizer)

# Load the model in bfloat16 and switch to inference mode
model = IDMRModel.load(model_args).to(device, dtype=torch.bfloat16).eval()


def get_embedding(text, image=None, side="qry"):
    """Embed a text and/or image input as a query ("qry") or a target ("tgt")."""
    inputs = processor(
        text=f"{IMAGE_TOKEN}\n {text}" if text else f"{IMAGE_TOKEN}\n Represent the given image.",
        images=[image] if image is not None else None,
        return_tensors="pt",
        max_length=1024,
        truncation=True,
    )
    inputs = {key: value.to(device) for key, value in inputs.items()}
    # image_flags tells the backbone whether an image is present in the input
    inputs["image_flags"] = torch.tensor([1 if image is not None else 0], dtype=torch.long).to(device)

    with torch.no_grad(), torch.autocast(device_type=device, dtype=torch.bfloat16):
        if side == "qry":
            output = model(qry=inputs)["qry_reps"]
        else:
            output = model(tgt=inputs)["tgt_reps"]
    return output.float()


# Query: text + image
query_text = "your query text"
query_image = Image.open("your query image path")
query_embedding = get_embedding(query_text, query_image, side="qry")

# Target: image only
target_image = Image.open("your target image path")
target_embedding = get_embedding(None, target_image, side="tgt")

print(model.compute_similarity(query_embedding, target_embedding))
```
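In a retrieval setting you typically score one query against many candidates. The snippet below is a minimal sketch of that loop, reusing the `get_embedding` helper and loaded `model` from above; the candidate paths are hypothetical placeholders, and it assumes `compute_similarity` returns a single score for one query/target pair, as in the example above.

```python
# Minimal ranking sketch (assumes the setup above; candidate paths are placeholders).
candidate_paths = ["candidate_1.jpg", "candidate_2.jpg", "candidate_3.jpg"]

# Embed every candidate image as a target.
candidate_embeddings = [
    get_embedding(None, Image.open(path), side="tgt") for path in candidate_paths
]

# Score each candidate against the query with the model's own similarity,
# then print the candidates from best to worst match.
scores = torch.tensor([
    model.compute_similarity(query_embedding, emb).item() for emb in candidate_embeddings
])
for rank, idx in enumerate(torch.argsort(scores, descending=True).tolist(), start=1):
    print(f"{rank}. {candidate_paths[idx]} (score: {scores[idx]:.4f})")
```

For a large gallery you would embed candidates once, stack the embeddings into a matrix, and score them with a single matrix product rather than a Python loop.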
## Model Tree
Base model: OpenGVLab/InternVL2_5-8B