Why is each generated response so short / cut off / not complete?

#7
by Krisolada - opened

Please see the attached image:

We hope that each response can be longer and complete. When testing, we have to keep clicking the "Compute" button to get more generated text back.

image.png

We wonder if this has to do with our host instance size:

Nvidia A10G
1x GPU · 24 GB
6 vCPU · 28 GB
1.3/hr

Thanks! Please help!


What is the code you are using or where are you using it?


Hi, thanks for getting back to us!

We are simply testing at this stage using Postman:

We pass in the token, set Content-Type: application/json, and send the example question {"inputs": "What is the philosopher's stone, really?"} via the request body.

The text generated in the screenshot above is all we get.

Hope the information helps!
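For what it's worth, if the endpoint runs a text-generation backend, a bare {"inputs": ...} body usually gets the backend's small default token limit, which would explain the short, cut-off replies. A minimal sketch of the same request with an explicit max_new_tokens parameter is below; the endpoint URL and token are placeholders, and the exact parameter support depends on which backend the endpoint actually uses:

```python
import json

# Placeholders -- substitute your real endpoint URL and HF token.
ENDPOINT_URL = "https://your-endpoint.endpoints.huggingface.cloud"
API_TOKEN = "hf_xxx"

def build_request(prompt, max_new_tokens=512):
    """Build the headers and JSON body for a text-generation request.

    Adding "parameters" with "max_new_tokens" asks the backend to
    generate up to that many tokens instead of its (often small) default.
    """
    headers = {
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    }
    body = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return headers, json.dumps(body)

headers, body = build_request("What is the philosopher's stone, really?")
# The serialized body can be pasted directly into Postman's raw JSON field.
print(body)
```

The same JSON body works from Postman: keep the Authorization and Content-Type headers you already set, and just add the "parameters" object next to "inputs".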


No, I mean: are you using HF Transformers, TGI, vLLM, llama.cpp, or some other inference engine? It sounds like you are serving the model through an API, but it's not clear what that backend is.
