Model on your API Playground

#3
by 1littlecoder - opened

Hey Team, I tried this model on your API playground, but I found it hard to get working, especially with the context length, both for input and for output.

Any guide on that?

Together org

@1littlecoder hmm, if you have any feedback please let us know and we will keep improving! What are the challenges you are facing? Thanks!

I have the same issue. With 12000 tokens as input, the playground loads the answer for a few seconds and then just stops without an error. With the API, I get a timeout error after a while.

Together org

Hi @Sc0urge, thanks for your feedback! Does the timeout issue persist? And what number of tokens works without any timeout errors for you?

Even when giving just "Hello" as the prompt it crashes, though this time with "An unknown error has occurred with inference" (in the playground). Normal LLAMA works, though.

Together org

I couldn't reproduce this problem. Can you share more details and the generation parameters you are using? Thanks!

For the long text I set the max output to 32k; for the hello prompt I left everything on default. Sometimes it throws an error, and sometimes it just shows the three dots, which disappear after a second.

I see. For the hello example, I think the issue might be that the default top_p=0.7 is too high (this is the threshold below which all less likely tokens are filtered out). What likely happens is that after "hello" the distribution for the next token is very flat and every token has probability < 0.7 (intuitively, many tokens can follow "hello" and make sense). I would suggest lowering this threshold if your prompt is very short.
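
To make the flat-distribution argument concrete, here is a minimal sketch of two ways a top_p-style threshold can be applied. It is not the playground's actual implementation, which this thread does not show. The function names and the toy next-token distributions are hypothetical, chosen only for illustration. The first function follows the per-token reading described above (drop every token whose own probability is below the threshold); the second is the cumulative "nucleus" definition that top_p conventionally refers to.

```python
def per_token_threshold(probs, threshold):
    """Keep only tokens whose individual probability is >= threshold.

    Hypothetical helper illustrating the per-token reading of the threshold.
    """
    return {tok: p for tok, p in probs.items() if p >= threshold}


def nucleus(probs, top_p):
    """Conventional top-p: keep the smallest set of most likely tokens
    whose cumulative probability reaches top_p (always at least one token).
    """
    kept = {}
    cumulative = 0.0
    # Walk tokens from most to least likely, accumulating probability mass.
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


# Toy distributions (invented for illustration, not taken from any model).
flat = {"there": 0.25, "world": 0.25, "!": 0.25, ",": 0.25}
peaked = {"world": 0.9, "there": 0.05, "!": 0.03, ",": 0.02}

if __name__ == "__main__":
    for name, dist in (("flat", flat), ("peaked", peaked)):
        print(name, "per-token:", per_token_threshold(dist, 0.7))
        print(name, "nucleus:  ", nucleus(dist, 0.7))
```

Under the per-token reading, the flat distribution leaves no candidate token at threshold 0.7, which would match the empty output described above. Under the cumulative definition, the same distribution still keeps several tokens. Comparing the two on your own prompt and parameters may help narrow down which behavior you are seeing.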

The other error most likely doesn't have anything to do with the hello prompt (I could not reproduce the error lately with the hello prompt). Are you still observing this error?
