smol_llama-220M-GQA-32k-theta-sft

Experimental model intended to serve as a draft model for long-context speculative decoding.

Created by finetuning Doctor-Shotgun/smol_llama-220M-GQA-32k-theta at a context length of 32768 tokens on several instruction datasets.

This variant uses the RoPE theta (RoPE frequency base) method for context extension.
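As a rough illustration of why the frequency base matters, the sketch below computes the standard RoPE inverse frequencies and shows that a larger theta stretches the longest positional wavelength. The head dimension and both theta values are illustrative assumptions, not values taken from this model's config.

```python
import math

def rope_inv_freq(head_dim, theta):
    """Standard RoPE inverse frequencies: theta ** (-2i / head_dim) for each dimension pair i."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]

def longest_wavelength(head_dim, theta):
    """Wavelength, in tokens, of the slowest-rotating dimension pair."""
    return 2 * math.pi / rope_inv_freq(head_dim, theta)[-1]

HEAD_DIM = 64                # assumed head dimension, for illustration only
BASE_THETA = 10_000.0        # common default base in Llama-style models
LARGER_THETA = 1_000_000.0   # hypothetical larger base, not this model's actual value

base_span = longest_wavelength(HEAD_DIM, BASE_THETA)
extended_span = longest_wavelength(HEAD_DIM, LARGER_THETA)
print(f"longest wavelength at base theta:   {base_span:,.0f} tokens")
print(f"longest wavelength at larger theta: {extended_span:,.0f} tokens")
```

The actual base used by this model should be read from its config file rather than inferred from this sketch.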

The trained instruction format is Alpaca:

### Instruction:
{{instruction}}

### Input:
{{user input}}

### Response:
{{model response}}
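A minimal sketch of building a prompt in the format above. The helper name `build_alpaca_prompt` is hypothetical, and two behaviors are assumptions rather than something this card specifies: the Input block is omitted when no input is given, and the prompt ends with a newline after the Response header so the model's reply starts on its own line.

```python
def build_alpaca_prompt(instruction, user_input=""):
    """Assemble an Alpaca-style prompt ending at the Response header (hypothetical helper)."""
    parts = ["### Instruction:", instruction, ""]
    if user_input:
        # Assumption: the Input block is dropped entirely when there is no input.
        parts += ["### Input:", user_input, ""]
    # Assumption: the prompt stops right after the Response header plus a newline.
    parts += ["### Response:", ""]
    return "\n".join(parts)

prompt = build_alpaca_prompt(
    "Summarize the text in one sentence.",
    "Speculative decoding uses a small draft model to propose tokens for a larger model.",
)
print(prompt)
```

The returned string can be passed to any tokenizer or inference backend; generation should stop when the model emits its end-of-sequence token.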

