---
datasets:
- cognitivecomputations/WizardLM_evol_instruct_V2_196k_unfiltered_merged_split
- cognitivecomputations/Code-74k-ShareGPT-Vicuna
- jondurbin/airoboros-3.1
- Norquinal/claude_multiround_chat_30k
- Doctor-Shotgun/no-robots-sharegpt
language:
- en
tags:
- llama
- llama 2
- smol_llama
---
# smol_llama-220M-GQA-32k-theta-sft

Experimental model intended to serve as a long-context speculative decoding (draft) model.

Created from [Doctor-Shotgun/smol_llama-220M-GQA-32k-theta](https://huggingface.co/Doctor-Shotgun/smol_llama-220M-GQA-32k-theta) and finetuned at 32768 context length on several instruction datasets.

This variant uses the RoPE theta (RoPE frequency base) method for context extension.

The trained instruction format is Alpaca:
```
### Instruction:
{{instruction}}

### Input:
{{user input}}

### Response:
{{model response}}
```
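
Below is a minimal sketch of how a draft model like this might be used with Hugging Face transformers' assisted generation, which implements speculative decoding via the `assistant_model` argument to `generate()`. The target model ID, prompt contents, and generation settings are placeholders, not recommendations; assisted decoding also assumes the draft and target models use a compatible tokenizer/vocabulary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder target model: substitute any larger model with a compatible tokenizer.
target_id = "meta-llama/Llama-2-7b-hf"
draft_id = "Doctor-Shotgun/smol_llama-220M-GQA-32k-theta-sft"

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.float16).to(device)
draft = AutoModelForCausalLM.from_pretrained(draft_id, torch_dtype=torch.float16).to(device)

# Prompt written in the trained Alpaca format shown above.
prompt = (
    "### Instruction:\n"
    "Summarize the passage below in two sentences.\n\n"
    "### Input:\n"
    "<long document text>\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(device)

# assistant_model enables assisted (speculative) decoding: the small draft
# model proposes candidate tokens that the target model then verifies.
output = target.generate(
    **inputs,
    assistant_model=draft,
    max_new_tokens=256,
)
print(tokenizer.decode(output[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```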