Text Generation
Transformers
PyTorch
English
llama
Inference Endpoints
text-generation-inference
TheBloke: Set use_cache to True, otherwise inference performance is poor (5e8c41a)
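
For context, `use_cache=True` tells a Transformers causal LM to keep and reuse past key/value attention states between decoding steps, which is what makes autoregressive generation fast; the commit presumably flips this flag in the repository's config so it is on by default. Below is a minimal sketch of enabling it explicitly when loading and generating with the model. The model ID is a placeholder, not taken from this page:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/<model-name>"  # placeholder; substitute the actual repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# With use_cache=True, each new token attends over cached key/value states
# instead of recomputing attention over the full prefix at every step.
model.config.use_cache = True

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```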