Context Length Config

#12
by nicklikets - opened

In the README.md it states that the model has a context length of 128k, yet config.json sets "max_position_embeddings": 8192. How come the maximum position embeddings aren't configured to ~128k?

Cohere For AI org

This implementation is based on the Llama implementation, which materializes the huge causal-mask buffer shown below; that would not be feasible at 128k context. The model does support 128k context with a better implementation.

```python
causal_mask = torch.full(
    (config.max_position_embeddings, config.max_position_embeddings),
    fill_value=True,
    dtype=torch.bool,
)
```
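For a sense of scale, a quick back-of-the-envelope check of that buffer size (torch.bool uses 1 byte per element):

```python
# Size of a (n, n) torch.bool causal-mask buffer at the two context lengths in question.
#   8192^2   ≈ 0.06 GiB  -> fine as a default
#   131072^2 ≈ 16 GiB    -> clearly not something to allocate up front
for n in (8192, 131072):
    print(f"{n}: {n * n / 2**30:.2f} GiB")
```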

Keeping the default context length low enough that end users don't hit OOM right out of the box is generally accepted as an unwritten rule in the HF community.
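If you do want the longer context, you can override the conservative default at load time. A minimal sketch (the repo ID is a placeholder, and you still need enough memory and an attention implementation that avoids materializing the full mask):

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "your-org/your-model"  # placeholder -- substitute the actual repo ID

# Load the config, raise the position limit, then pass the modified config to the model.
config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 131072  # raise from the 8192 default in config.json
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```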

nicklikets changed discussion status to closed
