Update README.md
README.md CHANGED
@@ -2668,23 +2668,10 @@ print(cos_sim(embeddings[0], embeddings[1]))
 If you only want to handle shorter sequence, such as 2k, pass the `max_length` parameter to the `encode` function:
 
 ```python
-embeddings = model.encode(
-
-
-
-We include an experimental implementation for Flash Attention, shipped with the model.
-Install the following triton version:
-`pip install triton==2.0.0.dev20221202`.
-Now run the same code above, but make sure to set the parameter `with_flash` to `True` when you load the model. You also have to use either `fp16` or `bf16`:
-```python
-from transformers import AutoModel
-from numpy.linalg import norm
-import torch
-
-cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
-model = AutoModel.from_pretrained('jinaai/jina-embedding-s-en-v2', trust_remote_code=True, with_flash=True, torch_dtype=torch.float16).cuda() # trust_remote_code is needed to use the encode method
-embeddings = model.encode(['How is the weather today?', 'What is the current weather like today?'])
-print(cos_sim(embeddings[0], embeddings[1]))
+embeddings = model.encode(
+    ['Very long ... document'],
+    max_length=2048
+)
 ```
 
 ## Fine-tuning
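For context, the net effect of this change: the truncated `encode` call is completed with the `max_length` argument, and the experimental Flash Attention instructions are dropped. The sketch below stitches the added snippet together with the model-loading code from the removed lines; the model id, the `trust_remote_code` flag, and the `encode(..., max_length=...)` signature are all taken from the diff above, so treat this as an illustrative reconstruction rather than an official example.

```python
from transformers import AutoModel
from numpy.linalg import norm

# Cosine similarity between two embedding vectors.
cos_sim = lambda a, b: (a @ b.T) / (norm(a) * norm(b))

# trust_remote_code is needed to use the model's custom encode method.
model = AutoModel.from_pretrained('jinaai/jina-embedding-s-en-v2',
                                  trust_remote_code=True)

# Truncate each input to 2k tokens instead of the model's full supported
# window (parameter and value as shown in the diff above).
embeddings = model.encode(
    ['Very long ... document', 'How is the weather today?'],
    max_length=2048,
)
print(cos_sim(embeddings[0], embeddings[1]))
```

Capping `max_length` like this trades long-context coverage for lower memory use and faster encoding when the inputs are known to be short.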