TBA

This model is suitable for inference with 8-bit precision (both weight and KV Cache applied) on a 48GB GPU at 20K context or 6.5 BPW on an 80GB GPU at 100K context.

Memory	BPW (weights & KV Cache)	Context Length
48GB	8	20K tokens
48GB	6.5	32K tokens
80GB	6.5	100K tokens

Custom code is used, so ensure safety before using 'trust_remote_code=True'.

This model has undergone minimal human preference alignment. Proper system prompts and prompt engineering are necessary to ensure desirable responses. The model references the safety settings of the Gemini model series and has received minimal safety alignment towards system prompts to avoid generating severely inappropriate responses without specific instructions. By default, it can engage in discussions on relatively sensitive topics without violating human ethical and moral principles. Of course, you can always specify "uncensored" in your system prompt to obtain "raw" responses that have not been aligned with human values.

Modified from Cohere models (CohereForAI/c4ai-command-r-v01, CohereForAI/aya-23-35B), users should follow their AUPs.

Tokenizer is different from cohere - and chat template is ChatML.

For more information, please refer to the SFT version: https://huggingface.co/CausalLM/35b-beta-long

CausalLM
/

33b-e

You need to agree to share your contact information to access this model

TBA