Slow API Calls in Transformers

#98
by Zifeng0825 - opened

I've started using the official transformer code recommended for API calls, but each call takes a very long time to complete...

Has anyone else faced similar challenges? I want to use this pre-trained model for text generation, and I'm looking for advice on how to optimize these API calls to reduce the response time. Are there settings within the transformer model that I should tweak, or is there a more efficient way to handle these operations?


I use Ollama; it's the best option in practice.
