Update README.md
README.md CHANGED
@@ -2668,23 +2668,10 @@ print(cos_sim(embeddings[0], embeddings[1]))
 If you only want to handle shorter sequence, such as 2k, pass the `max_length` parameter to the `encode` function:
 
 ```python
-embeddings = model.encode(
-
-
-
-We include an experimental implementation for Flash Attention, shipped with the model.
-Install the following triton version:
-`pip install triton==2.0.0.dev20221202`.
-Now run the same code above, but make sure to set the parameter `with_flash` to `True` when you load the model. You also have to use either `fp16` or `bf16`:
-```python
-from transformers import AutoModel
-from numpy.linalg import norm
-import torch
-
-cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
-model = AutoModel.from_pretrained('jinaai/jina-embedding-s-en-v2', trust_remote_code=True, with_flash=True, torch_dtype=torch.float16).cuda() # trust_remote_code is needed to use the encode method
-embeddings = model.encode(['How is the weather today?', 'What is the current weather like today?'])
-print(cos_sim(embeddings[0], embeddings[1]))
+embeddings = model.encode(
+    ['Very long ... document'],
+    max_length=2048
+)
 ```
 
 ## Fine-tuning
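For context, the net effect of this change: the truncated `encode` call is completed with the `max_length` argument, and the experimental Flash Attention instructions are dropped. The sketch below stitches the added snippet together with the model-loading code from the removed lines; the model id, the `trust_remote_code` flag, and the `encode(..., max_length=...)` signature are all taken from the diff above, so treat this as an illustrative reconstruction rather than an official example.

```python
from transformers import AutoModel
from numpy.linalg import norm

# Cosine similarity between two embedding vectors.
cos_sim = lambda a, b: (a @ b.T) / (norm(a) * norm(b))

# trust_remote_code is needed to use the model's custom encode method.
model = AutoModel.from_pretrained('jinaai/jina-embedding-s-en-v2',
                                  trust_remote_code=True)

# Truncate each input to 2k tokens instead of the model's full supported
# window (parameter and value as shown in the diff above).
embeddings = model.encode(
    ['Very long ... document', 'How is the weather today?'],
    max_length=2048,
)
print(cos_sim(embeddings[0], embeddings[1]))
```

Capping `max_length` like this trades long-context coverage for lower memory use and faster encoding when the inputs are known to be short.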