---
datasets:
  - cognitivecomputations/WizardLM_evol_instruct_V2_196k_unfiltered_merged_split
  - cognitivecomputations/Code-74k-ShareGPT-Vicuna
  - jondurbin/airoboros-3.1
  - Norquinal/claude_multiround_chat_30k
  - Doctor-Shotgun/no-robots-sharegpt
language:
  - en
tags:
  - llama
  - llama 2
  - smol_llama
---

# smol_llama-220M-GQA-32k-theta-sft

An experimental model intended to serve as a draft model for long-context speculative decoding.
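
A minimal sketch of that use case with Hugging Face transformers' assisted generation, where this model drafts tokens that a larger model verifies. The target model ID below is an illustrative placeholder, not part of this release; assisted generation requires the draft and target to share a tokenizer/vocabulary.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

draft_id = "Doctor-Shotgun/smol_llama-220M-GQA-32k-theta-sft"
target_id = "meta-llama/Llama-2-7b-hf"  # hypothetical target; must share the draft's vocabulary

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id)
draft = AutoModelForCausalLM.from_pretrained(draft_id)

inputs = tokenizer("### Instruction:\nSummarize RoPE scaling.\n\n### Response:\n", return_tensors="pt")
# assistant_model enables assisted (speculative) generation: the small draft
# proposes several tokens at a time, which the target verifies in one forward pass.
output = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```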

Created by finetuning Doctor-Shotgun/smol_llama-220M-GQA-32k-theta on several instruction datasets at 32768 context length.

This variant uses the RoPE theta (RoPE frequency base) method for context extension.
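
A sketch of the underlying idea, for reference: raising the RoPE frequency base lowers all rotary frequencies, stretching the positional wavelengths so positions well beyond the original training length still map to distinct rotations. The theta values below are illustrative, not this model's actual training configuration.

```python
import torch

def rotary_inv_freq(head_dim: int, theta: float) -> torch.Tensor:
    # Standard RoPE inverse frequencies: 1 / theta^(2i / d) for i = 0, 1, ...
    # A larger theta lowers every frequency, i.e. stretches the position
    # encoding to cover longer contexts.
    return 1.0 / (theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))

base = rotary_inv_freq(64, 10_000.0)          # conventional Llama default base
stretched = rotary_inv_freq(64, 1_000_000.0)  # illustrative raised base for long context
print(base[:4], stretched[:4])
```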

The trained instruction format is Alpaca:

```
### Instruction:
{{instruction}}

### Input:
{{user input}}

### Response:
{{model response}}
```
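
A small helper for assembling this prompt; the function name and the convention of dropping the `### Input:` section when there is no user input are illustrative assumptions, not part of the model's release.

```python
def build_alpaca_prompt(instruction: str, user_input: str = "") -> str:
    # Assemble the Alpaca-style prompt shown above; the Input section is
    # omitted when no user input is provided (assumed convention).
    prompt = f"### Instruction:\n{instruction}\n\n"
    if user_input:
        prompt += f"### Input:\n{user_input}\n\n"
    prompt += "### Response:\n"
    return prompt

print(build_alpaca_prompt("Summarize the text.", "RoPE theta scaling extends context length."))
```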