bwang0911 committed on
Commit
e7623d9
1 Parent(s): 1549bcd

Update README.md

Files changed (1): README.md +6 -0
README.md CHANGED
@@ -2664,6 +2664,12 @@ embeddings = model.encode(['How is the weather today?', 'What is the current wea
 print(cos_sim(embeddings[0], embeddings[1]))
 ```
 
+If you only want to handle shorter sequences, such as 2k, pass the `max_length` parameter to the `encode` function:
+
+```python
+embeddings = model.encode(['Very long ... document'], max_length=2048)
+```
+
 For long sequences, it is recommended to run inference with Flash Attention, which lets you increase the batch size and throughput at long sequence lengths.
 We include an experimental implementation of Flash Attention, shipped with the model.
 Install the following triton version:
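For reference, the `cos_sim` call in the snippet above computes the cosine similarity between two embedding vectors. A minimal NumPy sketch of that computation (a hypothetical stand-in, not the library's actual helper):

```python
import numpy as np

def cos_sim(a, b):
    # Cosine similarity: dot product of the vectors divided by
    # the product of their Euclidean norms.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-in "embeddings" (real ones come from model.encode):
embeddings = np.array([[1.0, 0.0], [1.0, 1.0]])
print(cos_sim(embeddings[0], embeddings[1]))  # cos(45°) ≈ 0.7071
```

Identical vectors score 1.0, orthogonal vectors 0.0; higher values mean the encoded texts are more semantically similar.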