jinaai/jina-clip-v1 · feat: add sbert support

Jina AI org Sep 9, 2024

The code is mostly from @tomaarsen; I made a few small modifications. Note: custom_st.py was added directly to this repo, not to the implementation repo.
Tested on my own replica of the model (test code below). Note: I added 2_Normalize to modules.json so that embeddings are always normalised by default.
Once the sbert release is out, we should open a PR and update the README.
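
For context on why the normalisation step matters, here is a minimal NumPy sketch of what L2 normalisation does to an embedding matrix. It is an illustration only, not the code in this repo: the helper name `l2_normalize` and the toy vectors are assumptions made up for the example.

```python
import numpy as np


def l2_normalize(embeddings: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm (hypothetical helper, for illustration)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return embeddings / np.maximum(norms, eps)


# Toy "embeddings"; real ones would come from model.encode(...).
raw = np.array([[3.0, 4.0], [1.0, 0.0]])
unit = l2_normalize(raw)

# With unit-norm rows, a plain dot product equals the cosine similarity.
cosine = unit @ unit.T
```

With normalised outputs, downstream code can rank by dot product without dividing by the norms again.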

test code:

from sentence_transformers import SentenceTransformer
from transformers import AutoModel

import numpy as np

# Load the test replica through the new sentence-transformers integration.
model = SentenceTransformer('bwang0911/test-jina-clip', trust_remote_code=True)

et = model.encode(['Hello world'])
em = model.encode(['https://i.pinimg.com/600x315/21/48/7e/21487e8e0970dd366dafaed6ab25d8d8.jpg'])

# Load the same checkpoint through transformers as the reference.
model2 = AutoModel.from_pretrained('bwang0911/test-jina-clip', trust_remote_code=True)

et2 = model2.encode_text(['Hello world'])
em2 = model2.encode_image(['https://i.pinimg.com/600x315/21/48/7e/21487e8e0970dd366dafaed6ab25d8d8.jpg'])

# Both loading paths should give the same embeddings within tolerance.
assert np.allclose(et, et2, rtol=1e-4, atol=1e-4), "Text embeddings are not almost equal"
assert np.allclose(em, em2, rtol=1e-4, atol=1e-4), "Image embeddings are not almost equal"
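
As a side note on the tolerances used in the assertions: `np.allclose(a, b, rtol, atol)` passes when `|a - b| <= atol + rtol * |b|` holds for every element. The sketch below shows this on made-up arrays; `max_abs_diff` is a hypothetical helper that is handy for seeing how far apart two embedding sets actually are when an assertion fails.

```python
import numpy as np


def max_abs_diff(a, b) -> float:
    """Largest element-wise absolute difference (hypothetical debugging helper)."""
    return float(np.max(np.abs(np.asarray(a) - np.asarray(b))))


# Made-up values standing in for two embedding arrays.
a = np.array([[0.1000, 0.2000]])
b = np.array([[0.10005, 0.19995]])

# Differences of about 5e-5 fall inside atol + rtol * |b|, so this passes.
within = np.allclose(a, b, rtol=1e-4, atol=1e-4)
worst = max_abs_diff(a, b)
```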

feat: add sbert support (32864adf)

bwang0911 changed pull request status to merged Sep 9, 2024