How did you calculate the perplexities for 2048 and 3072 contexts?

#3
by Shouyi987 - opened

The model card says to set max_seq_len to 8192 and compress_pos_emb to 4.

However, the card compares perplexities for context sizes of 2048 and 3072. How did you compute those? Did you set the context size to these values when loading the model and then compare the perplexities?

Aren't these context sizes expected to have poor perplexities?

Shouyi987 changed discussion title from What size should I set for the context? to How did you calculate the perplexities for 2048 and 3072 contexts?

I used the perplexity tool in the oobabooga text-generation-webui. Perplexity is computed over wikitext with the context window set to either 2048 or 3072, using a stride of 512 tokens.
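For anyone who wants to reproduce this outside the webui, here is a rough sketch of a strided perplexity evaluation with plain transformers. The repo name, dataset config, and dtype are placeholders and this is not the webui tool's actual implementation:

```python
# Sketch of strided perplexity over wikitext with a Hugging Face causal LM.
# Model name and settings below are placeholders, not the exact setup used here.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-org/your-8k-model"  # placeholder
window, stride = 2048, 512             # or window = 3072 for the other comparison

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
encodings = tokenizer(text, return_tensors="pt")
seq_len = encodings.input_ids.size(1)

nlls = []
prev_end = 0
for begin in range(0, seq_len, stride):
    end = min(begin + window, seq_len)
    trg_len = end - prev_end  # only score tokens not covered by the previous window
    input_ids = encodings.input_ids[:, begin:end].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask the overlapping prefix
    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss
    nlls.append(loss)
    prev_end = end
    if end == seq_len:
        break

ppl = torch.exp(torch.stack(nlls).mean())
print(f"Perplexity at a {window}-token window: {ppl.item():.3f}")
```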

At train time, the RoPE scaling was set with max_position_embeddings = 8192 and a scaling factor of 4. This is what was used for all the perplexity calculations.
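For reference, those settings map onto a Hugging Face config roughly as follows. The repo name is a placeholder and this is just the usual way to express linear RoPE scaling with a factor of 4, not a snippet from this repo:

```python
# Sketch of equivalent linear RoPE scaling settings in a Hugging Face config.
# Repo name is a placeholder; rope_scaling requires transformers >= 4.31.
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "your-org/your-8k-model"  # placeholder

config = AutoConfig.from_pretrained(model_name)
config.max_position_embeddings = 8192
# Linear scaling divides position indices by 4, matching compress_pos_emb = 4
# (with max_seq_len = 8192) in text-generation-webui / ExLlama.
config.rope_scaling = {"type": "linear", "factor": 4.0}

model = AutoModelForCausalLM.from_pretrained(
    model_name, config=config, device_map="auto"
)
```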

The perplexity calculations differ only in the size of the sequence over which they're evaluated. The point is to confirm that perplexity doesn't blow up beyond 2048, and that this model can potentially outperform the SuperHOT LoRA (as applied to this model when trained without RoPE scaling). Early feedback is that this model stays quite coherent out to its limit of 8192.

I would have run the perplexity calculations all the way out to 8192, but I ran into VRAM limits. You need ExLlama to run the full context within 48 GB of VRAM, and the perplexity tool currently isn't compatible with it.

Thank you so much for your explanation!

bhenrym14 changed discussion status to closed
