Context length?

#2 · opened by turboderp

Is this really a 2k sequence length? The base 70b seems to be 16k; is there something up with the config?

Same question here. The blog post shows that both the Instruct and Python models are long-context fine-tuned.

Actually it should be 4096; it seems the config.json is wrong (my guess is the conversion script needs to be updated). I confirmed this with a Meta engineer, and you can also see it in the reference implementation - https://github.com/facebookresearch/codellama/blob/1af62e1f43db1fa5140fa43cb828465a603a48f3/llama/model.py#L277 (self.params.max_seq_len * 2, where self.params.max_seq_len == 2048).
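
If you want to work around the incorrect value locally while the config is being fixed, a minimal sketch (assuming the Hugging Face `transformers` loader; the repo id below is only a placeholder for whichever checkpoint you are loading) is to override `max_position_embeddings` on the loaded config before instantiating the model:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Placeholder repo id for illustration; substitute the checkpoint you are actually using.
repo_id = "codellama/CodeLlama-34b-Instruct-hf"

config = AutoConfig.from_pretrained(repo_id)
print("declared context:", config.max_position_embeddings)  # shows the (possibly wrong) value from config.json

# Override with the value confirmed above (4096), or 16384 if you go by the README.
config.max_position_embeddings = 4096

model = AutoModelForCausalLM.from_pretrained(repo_id, config=config)
```

This only changes what the loaded config reports; whether long inputs actually work well still depends on how the checkpoint was fine-tuned.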

The README says this is a model with 16k context, which corroborates turboderp's findings.

Code Llama is an auto-regressive language model that uses an optimized transformer architecture. It was fine-tuned with up to 16k tokens. This variant does not support long context of up to 100k tokens.

Although I guess it could be wrong too.

@yard1 Thanks.

It's a real shame that the instruct and python versions were nerfed like this, but I guess 4096 is a better starting point than 2048 at least. :(

4096 for a coding model is painfully small.

Without a 16k context length it is basically useless as a coding model.

I guess we need to wait for the instruct fine-tuned 16k versions created by others. Maybe Phind will make one, we'll see.

Go, Phind!