
The tokenizer is different from Cohere's, and the chat template is ChatML. Fully fine-tuned at 128K+ context on a ~30M-entry synthetic dataset (web-crawl inputs, GPT-4-32k/GPT-3.5-16k outputs) for 2 epochs.
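A minimal usage sketch (not part of the original card) showing how the ChatML template can be applied through the standard `transformers` chat-template API; the prompt contents are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CausalLM/35b-beta2ep"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are shipped in BF16
    device_map="auto",
)

# apply_chat_template renders the ChatML markers
# (<|im_start|>role ... <|im_end|>) defined in the tokenizer config.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this model card in one sentence."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```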

For another candidate version trained for 1 epoch, see https://huggingface.co/CausalLM/35b-beta - somehow less overfitting?

No LoRAs, no quants, no tricks.

This one is not "very 128K": use https://huggingface.co/CausalLM/35b-beta-long for long-context work. It is, however, better at general tasks, knowledge, coding, and so on.

And feel free to merge them if you want!
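As a starting point, here is a minimal sketch (an assumption, not a recipe from this card) of a naive linear merge of the two sibling checkpoints by averaging their weights; the `alpha` value and output path are hypothetical, and a dedicated tool such as mergekit handles 35B-scale models far more memory-efficiently:

```python
import torch
from transformers import AutoModelForCausalLM

# Load both checkpoints; this naive approach keeps two full 35B models in
# memory at once, so it is only practical on a large-RAM machine.
a = AutoModelForCausalLM.from_pretrained("CausalLM/35b-beta2ep", torch_dtype=torch.bfloat16)
b = AutoModelForCausalLM.from_pretrained("CausalLM/35b-beta-long", torch_dtype=torch.bfloat16)

alpha = 0.5  # hypothetical interpolation weight; tune per task
state_a, state_b = a.state_dict(), b.state_dict()

# Linear interpolation, assuming both checkpoints share identical
# parameter names and shapes (they are the same architecture).
merged = {k: alpha * state_a[k] + (1 - alpha) * state_b[k] for k in state_a}

a.load_state_dict(merged)
a.save_pretrained("35b-beta2ep-long-merge")  # hypothetical output path
```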

Model size: 35B parameters (Safetensors, BF16)
