JustinLin610 committed on
Commit
1c7d721
1 Parent(s): 2cf2e83

update README.md (#12)


- update README.md (92449b352ef8de40b67b1db228705a6105226106)

Files changed (1): README.md +4 -0
README.md CHANGED
@@ -67,6 +67,10 @@ cd flash-attention && pip install .
 # pip install csrc/layer_norm
 # pip install csrc/rotary
 ```
+
+If you need higher inference performance but the optional acceleration components `layer_norm` and `rotary` above fail to install, or if the GPU you are using does not meet the NVIDIA Ampere/Ada/Hopper architecture required by the `flash-attention` library, you can try switching to the dev_triton branch and using the Triton-based inference acceleration solution there. It supports a wider range of GPUs and is natively supported in PyTorch 2.0 and above, with no extra installation needed.
+
+If you require higher inference performance but encounter problems installing the optional acceleration features (i.e., `layer_norm` and `rotary`), or if your GPU does not meet the NVIDIA Ampere/Ada/Hopper architecture required by the `flash-attention` library, you can switch to the dev_triton branch and try the Triton-based inference acceleration solution implemented there. It adapts to a wider range of GPU products and requires no extra package installation with PyTorch 2.0 and above.
+
 <br>
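
The architecture check implied by the added paragraph can be sketched in Python. This is a minimal, hypothetical helper (not part of the README or the commit) that uses `torch.cuda.get_device_capability` to test whether the local GPU is Ampere/Ada/Hopper class, i.e. CUDA compute capability 8.x or 9.x, which is what `flash-attention` requires; on older GPUs, the Triton-based solution on the dev_triton branch would be the suggested fallback.

```python
# Hypothetical helper: decide whether flash-attention's GPU architecture
# requirement (Ampere 8.0 / Ada 8.9 / Hopper 9.0, i.e. compute capability
# major version >= 8) is met on this machine.
try:
    import torch
    HAS_TORCH = True
except ImportError:  # torch not installed: treat as unsupported
    HAS_TORCH = False


def supports_flash_attention() -> bool:
    """Return True if a CUDA GPU with compute capability >= 8.0 is present."""
    if not HAS_TORCH or not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major >= 8


if __name__ == "__main__":
    if supports_flash_attention():
        print("GPU meets flash-attention's architecture requirement")
    else:
        print("Consider the Triton-based solution on the dev_triton branch")
```

On machines without a CUDA GPU (or without torch), the helper simply reports `False`, steering toward the Triton path rather than raising.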