Context length?

#2 · opened by turboderp

Is this really a 2k sequence length? The base 70b seems to be 16k; is there something up with the config?

Same question here. The blog post shows that both the Instruct and Python models are long-context fine-tuned.

Actually it should be 4096; it seems the config.json is wrong (my guess is the conversion script needs to be updated). I confirmed this with a Meta engineer, and you can also see it in the reference implementation - https://github.com/facebookresearch/codellama/blob/1af62e1f43db1fa5140fa43cb828465a603a48f3/llama/model.py#L277 (self.params.max_seq_len * 2, where self.params.max_seq_len == 2048).
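
If you want to work around the incorrect value locally while the config is being fixed, a minimal sketch (assuming the Hugging Face `transformers` loader; the repo id below is only a placeholder for whichever checkpoint you are loading) is to override `max_position_embeddings` on the loaded config before instantiating the model:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Placeholder repo id for illustration; substitute the checkpoint you are actually using.
repo_id = "codellama/CodeLlama-34b-Instruct-hf"

config = AutoConfig.from_pretrained(repo_id)
print("declared context:", config.max_position_embeddings)  # shows the (possibly wrong) value from config.json

# Override with the value confirmed above (4096), or 16384 if you go by the README.
config.max_position_embeddings = 4096

model = AutoModelForCausalLM.from_pretrained(repo_id, config=config)
```

This only changes what the loaded config reports; whether long inputs actually work well still depends on how the checkpoint was fine-tuned.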

The README says this is a model with 16k context, which corroborates turboderp's findings.

Code Llama is an auto-regressive language model that uses an optimized transformer architecture. It was fine-tuned with up to 16k tokens. This variant does not support long context of up to 100k tokens.

Although I guess it could be wrong too.

@yard1 Thanks.

It's a real shame that the instruct and python versions were nerfed like this, but I guess 4096 is a better starting point than 2048 at least. :(

4096 for a coding model is painfully small.

Without a 16k context length it is basically useless as a coding model.

I guess we need to wait for the instruct fine-tuned 16k versions created by others. Maybe Phind will make one, we'll see.

Go, Phind!