feat: add sbert support
#25 opened by bwang0911
- Code is mostly from @tomaarsen; I made a few small modifications. Note: custom_st.py was added directly to this repo, not to the impl repo.
- Tested on my own replication (test code below). Note: added 2_Normalize to modules.json to ensure embeddings are always normalised by default.
- Once sbert is released, we should open a PR and update the Readme.
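For context on the 2_Normalize note above: a normalisation step of this kind rescales each embedding to unit L2 length, so the dot product of two embeddings equals their cosine similarity. A minimal NumPy sketch of that idea, assuming row-wise embeddings; `l2_normalize` is a hypothetical helper for illustration only, not the code in this PR or in sentence-transformers:

```python
import numpy as np


def l2_normalize(embeddings: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm (hypothetical helper, illustration only)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return embeddings / np.maximum(norms, eps)


# Toy vectors standing in for real embeddings.
raw = np.array([[3.0, 4.0], [1.0, 0.0]])
unit = l2_normalize(raw)

# With unit-length rows, the dot-product matrix holds cosine similarities.
cosine = unit @ unit.T
```

With normalisation on by default, callers can rank results with a plain dot product and do not have to remember to divide by the norms themselves.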
Test code:
from sentence_transformers import SentenceTransformer
from transformers import AutoModel
import numpy as np

image_url = 'https://i.pinimg.com/600x315/21/48/7e/21487e8e0970dd366dafaed6ab25d8d8.jpg'

# Path 1: load through sentence-transformers.
model = SentenceTransformer('bwang0911/test-jina-clip', trust_remote_code=True)
et = model.encode(['Hello world'])
em = model.encode([image_url])

# Path 2: load through transformers and call the model's own encode methods.
model2 = AutoModel.from_pretrained('bwang0911/test-jina-clip', trust_remote_code=True)
et2 = model2.encode_text(['Hello world'])
em2 = model2.encode_image([image_url])

# Both paths should produce matching embeddings within tolerance.
assert np.allclose(et, et2, rtol=1e-4, atol=1e-4), "Text embeddings do not match"
assert np.allclose(em, em2, rtol=1e-4, atol=1e-4), "Image embeddings do not match"
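If one of these checks fails, a bare assert only says that the arrays differ. As a possible alternative, `numpy.testing.assert_allclose` also reports how many elements mismatch and the maximum absolute and relative difference. The sketch below uses synthetic arrays in place of real model outputs, and `assert_embeddings_match` is a hypothetical helper name, not part of this PR:

```python
import numpy as np
import numpy.testing as npt


def assert_embeddings_match(actual, desired, rtol=1e-4, atol=1e-4, name="embeddings"):
    """Hypothetical helper: raise AssertionError with mismatch details."""
    actual = np.asarray(actual, dtype=np.float32)
    desired = np.asarray(desired, dtype=np.float32)
    npt.assert_allclose(
        actual,
        desired,
        rtol=rtol,
        atol=atol,
        err_msg=f"{name} differ between the two loading paths",
    )


# Synthetic stand-ins for the two sets of embeddings (not real model outputs).
a = np.array([[0.1, 0.2, 0.3]], dtype=np.float32)
b = a + 1e-6  # small numerical noise, well inside the tolerance

assert_embeddings_match(a, b, name="text embeddings")
```

The tolerances mirror the `rtol=1e-4, atol=1e-4` values used in the test code above.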
bwang0911 changed pull request status to merged