Getting the error: "triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 180224, Hardware limit: 166912. Reducing block sizes or `num_stages` may help."
#27 opened 9 days ago by Pranav0511
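The error quoted in the thread title above suggests its own fix: shrink the tile sizes or the pipelining depth. As a rough, hedged illustration of why that works (the exact accounting depends on the kernel; the tile shapes below are hypothetical, not the ones this repo's blocksparse kernel actually uses), shared-memory use in a software-pipelined Triton kernel grows roughly linearly with `num_stages` and with the tile area:

```python
def estimated_smem_bytes(block_m, block_n, block_k, num_stages, dtype_bytes=2):
    """Rough shared-memory model for a pipelined matmul-style tile:
    each pipeline stage buffers one (BLOCK_M x BLOCK_K) A tile and one
    (BLOCK_K x BLOCK_N) B tile of fp16/bf16 (2-byte) elements."""
    per_stage = (block_m * block_k + block_k * block_n) * dtype_bytes
    return num_stages * per_stage

# Hypothetical tile shape: 128x128 output tile, K chunks of 64.
print(estimated_smem_bytes(128, 128, 64, num_stages=6))  # 196608 -- over the ~163 KiB limit in the error
print(estimated_smem_bytes(128, 128, 64, num_stages=5))  # 163840 -- fits
```

Under this model, dropping `num_stages` by one (or halving one block dimension) in the autotuner's `triton.Config` entries is typically enough to get under the hardware limit quoted in the error.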
Why is the inference speed so slow compared with Qwen at the same 7B parameter count?
#26 opened 19 days ago by lucasjin
Upload triton_flash_blocksparse_attn.py
#25 opened 20 days ago by barcelosallan
Phi-3-small doesn't load with TGI
1 reply · #24 opened 27 days ago by aveer30
Multi-GPU training fails when using device_map = "auto"
2 replies · #23 opened 28 days ago by aveer30
Shared memory error
7 replies · #15 opened about 2 months ago by marktenenholtz
RuntimeError: FlashAttention only support fp16 and bf16 data type during fine tuning.
7 replies · #11 opened about 2 months ago by faizsameerahmed96
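The RuntimeError in the thread above comes from FlashAttention kernels accepting only half-precision inputs, so fine-tuning with the default fp32 weights triggers it. A minimal configuration sketch, assuming `transformers` and a CUDA build of flash-attn are installed, and assuming `microsoft/Phi-3-small-8k-instruct` as this repo's model id:

```python
import torch
from transformers import AutoModelForCausalLM

# FlashAttention only supports fp16/bf16, so load the weights in bf16
# up front instead of the fp32 default before starting fine-tuning.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-small-8k-instruct",  # assumed model id for this repo
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,               # Phi-3-small ships custom modeling code
)
```

Casting the model after loading (`model.to(torch.bfloat16)`) should also work; the key point is that the tensors reaching the attention kernel must be fp16 or bf16.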
GGUF version?
3 replies · #9 opened about 2 months ago by shtirlic
No Triton for Windows
2 replies · #4 opened 2 months ago by fernandomir