Context Length Config

#12
by nicklikets - opened

In the README.md it states that the model has a context length of 128k, yet config.json sets "max_position_embeddings": 8192. How come the maximum position embeddings aren't configured to ~128k?

Cohere For AI org

This implementation is based on the Llama implementation, which materializes the huge causal-mask buffer shown below; that would not be feasible at 128k context. The model does support 128k context with a better implementation.

```python
causal_mask = torch.full(
    (config.max_position_embeddings, config.max_position_embeddings),
    fill_value=True,
    dtype=torch.bool,
)
```
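For a sense of scale, a quick back-of-the-envelope check of that buffer size (torch.bool uses 1 byte per element):

```python
# Size of a (n, n) torch.bool causal-mask buffer at the two context lengths in question.
#   8192^2   ≈ 0.06 GiB  -> fine as a default
#   131072^2 ≈ 16 GiB    -> clearly not something to allocate up front
for n in (8192, 131072):
    print(f"{n}: {n * n / 2**30:.2f} GiB")
```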

Keeping the default context length low enough that end users don't hit OOM right out of the box is generally accepted as an unwritten rule in the HF community.
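If you do want the longer context, you can override the conservative default at load time. A minimal sketch (the repo ID is a placeholder, and you still need enough memory and an attention implementation that avoids materializing the full mask):

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "your-org/your-model"  # placeholder -- substitute the actual repo ID

# Load the config, raise the position limit, then pass the modified config to the model.
config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 131072  # raise from the 8192 default in config.json
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```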

nicklikets changed discussion status to closed
