Why does throughput increase with longer context window?

#33
by jingyu-q - opened

Hello there,

Thanks for releasing the Jamba model! While reading the paper, I had a question about Figure 3(b): why does the throughput of all models increase with a longer context window? I can't make sense of it and would appreciate an explanation. Thank you.

AI21 org

Hi @jingyu-q ,

As explained in the paper, the throughput in the graph is measured in tokens/second, so it's on a per-token basis. Total generation time increases with longer context, but normalized to a per-token basis it's faster, and much faster for Jamba as the context window gets larger.

Hi @roicohennn ,

If it is on a per-token basis, the output is fixed at 512 tokens as the setup says, and generation time increases with longer context, then shouldn't throughput (= 512 tokens / time_taken) be smaller with longer context?
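
For concreteness, here is a toy sketch of what I mean, with made-up timings (not numbers from the paper), assuming throughput really is output tokens divided by end-to-end time:

```python
# Toy numbers (made up, not from the paper) to illustrate the point:
# with output fixed at 512 tokens and end-to-end time growing with
# context length, output tokens / time can only go down.
output_tokens = 512

# hypothetical (context_length, generation_time_seconds) pairs
runs = [(1_000, 4.0), (32_000, 6.0), (128_000, 10.0)]

for context_len, gen_time in runs:
    throughput = output_tokens / gen_time  # output tokens per second
    print(f"context={context_len:>7}: {throughput:6.1f} output tok/s")
```

Under that definition the numbers fall (128.0, 85.3, 51.2 tok/s), which is the opposite of what the graph shows.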

The only thing I can imagine, looking at the graph, is that it is output tokens per second times the number of input tokens, divided by some constant. I'd suggest labeling the graph more clearly as to what throughput (t/s) actually means.
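
A related (and fairly common) alternative reading, which is only a guess and not something the paper states, is that the metric counts all processed tokens (prompt + generated output) per second. That quantity can rise with context length even though end-to-end time grows, because prompt tokens are processed much faster than tokens are generated. A toy sketch with the same made-up timings as above:

```python
# Same made-up timings, but counting every processed token
# (prompt + generated) per second rather than only the 512 outputs.
output_tokens = 512

runs = [(1_000, 4.0), (32_000, 6.0), (128_000, 10.0)]

for context_len, total_time in runs:
    total_tokens = context_len + output_tokens  # prompt + output tokens
    print(f"context={context_len:>7}: {total_tokens / total_time:8.0f} total tok/s")
```

With these numbers the metric grows from roughly 378 to about 12,851 tok/s as the context gets longer, which matches the upward trend in the graph; whether that is the definition actually used would need confirmation from the authors.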
