---
license: gpl-3.0
language:
- en
- zh
- ja
- de
datasets:
- JosephusCheung/GuanacoDataset
- meta-math/MetaMathQA
- jondurbin/airoboros-3.1
- WizardLM/WizardLM_evol_instruct_V2_196k
- RyokoAI/ShareGPT52K
- RyokoAI/Fandom23K
- milashkaarshif/MoeGirlPedia_wikitext_raw_archive
- wikipedia
- wiki_lingua
- garage-bAInd/Open-Platypus
- LDJnr/Puffin
- BAAI/COIG
- TigerResearch/tigerbot-zhihu-zh-10k
- liwu/MNBVC
- teknium/openhermes
- CausalLM/Refined-Anime-Text
- microsoft/orca-math-word-problems-200k
- m-a-p/CodeFeedback-Filtered-Instruction
---

The tokenizer is different from Cohere's, and the chat template is ChatML (see the usage sketch below). The model is fully fine-tuned at 128K+ context.

Trained for 1 epoch on a synthetic dataset of roughly 30M entries: web-crawl inputs paired with GPT-4-32k / GPT-3.5-16k outputs.

For another 1-epoch candidate version, see https://huggingface.co/CausalLM/35b-beta, which somehow overfits less.

No LoRAs, no quantization, no tricks.

This model is not truly "128K"; use https://huggingface.co/CausalLM/35b-beta-long for long context. It is better at general tasks, knowledge, coding, and so on. And merge the two if you want; a naive merge sketch follows the usage example below.
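
Since the chat template is ChatML, here is a minimal inference sketch using the standard `transformers` chat-template API. The repo ID and the prompt are placeholders (the ID shown is one of the candidate versions linked on this card); substitute whichever repo you actually want to load.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: placeholder repo ID, one of the candidates linked on this card.
model_id = "CausalLM/35b-beta"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# apply_chat_template renders the conversation with the ChatML markers
# (<|im_start|>role ... <|im_end|>) declared in the tokenizer config.
messages = [{"role": "user", "content": "Summarize rotary position embeddings."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # open an assistant turn for the reply
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```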
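
For the merge suggestion, a naive linear-interpolation sketch is below. It assumes both checkpoints share an identical architecture and parameter names, and it holds both models in memory at once, so it needs considerable RAM for 35B weights; dedicated tools such as mergekit do this more efficiently and offer more merge methods.

```python
import torch
from transformers import AutoModelForCausalLM

# Assumption: the two checkpoints linked on this card are merge-compatible,
# i.e. same architecture, same parameter names, same tokenizer.
beta = AutoModelForCausalLM.from_pretrained(
    "CausalLM/35b-beta", torch_dtype=torch.bfloat16
)
long_ctx = AutoModelForCausalLM.from_pretrained(
    "CausalLM/35b-beta-long", torch_dtype=torch.bfloat16
)

long_state = long_ctx.state_dict()
alpha = 0.5  # interpolation weight toward 35b-beta; tune to taste

# Linearly interpolate every parameter tensor between the two checkpoints.
merged = {
    name: alpha * tensor + (1.0 - alpha) * long_state[name]
    for name, tensor in beta.state_dict().items()
}

beta.load_state_dict(merged)
beta.save_pretrained("35b-beta-merged")
```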