Update README.md
README.md
CHANGED
@@ -2664,6 +2664,12 @@ embeddings = model.encode(['How is the weather today?', 'What is the current wea
 print(cos_sim(embeddings[0], embeddings[1]))
 ```
 
+If you only want to handle shorter sequences, such as 2k, pass the `max_length` parameter to the `encode` function:
+
+```python
+embeddings = model.encode(['Very long ... document'], max_length=2048)
+```
+
 For long sequences, it's recommended to perform inference using Flash Attention. Using Flash Attention allows you to increase the batch size and throughput for long sequence lengths.
 We include an experimental implementation for Flash Attention, shipped with the model.
 Install the following triton version:
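The hunk above prints `cos_sim(embeddings[0], embeddings[1])` without showing where `cos_sim` comes from. As a reference, here is a minimal NumPy sketch of cosine similarity; this is an illustration of what such a helper computes, not the implementation shipped with the model:

```python
import numpy as np

def cos_sim(a, b):
    # Cosine similarity: the dot product of the two vectors
    # divided by the product of their L2 norms.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Identical vectors score ~1.0; orthogonal vectors score 0.0.
print(cos_sim([1.0, 0.0], [1.0, 1.0]))  # ≈ 0.707
```

In practice the embeddings returned by `model.encode` are just such vectors, so any cosine-similarity helper (e.g. one from your embedding library of choice) will give the same comparison.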