Doctor-Shotgun/smol_llama-220M-GQA-32k-theta
---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-1T-Sample
language:
- en
tags:
- llama
- llama 2
- smol_llama
---
# smol_llama-220M-GQA-32k-theta

An experimental model intended to serve as a long-context draft model for speculative decoding.
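As a rough illustration of the draft model's role, the sketch below implements the greedy variant of speculative decoding in plain Python. The "models" are hypothetical toy functions that return a next token id; real models return logits, and sampling-based speculative decoding uses a rejection-sampling acceptance rule that is not shown here. Because the target checks the draft's token ids directly, the draft and target must share a tokenizer. In practice, libraries handle this for you (for example, Hugging Face transformers accepts an `assistant_model` argument to `generate`); this is a sketch of the idea, not of any particular library's implementation.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to its greedy
# (argmax) next token. Real models return logits over a shared vocabulary.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Reference: plain greedy decoding with the target, one call per token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes up to k tokens, the
    target keeps the prefix it agrees with, then adds one token itself."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1. The cheap draft model proposes tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target verifies each position. A real implementation scores
        #    all of seq + proposal in a single batched forward pass.
        accepted: List[int] = []
        for tok in proposal:
            if tok != target(seq + accepted):
                break
            accepted.append(tok)
        seq.extend(accepted)
        # 3. The target always contributes one token: the correction at the
        #    first mismatch, or a bonus token if all proposals were accepted.
        if len(seq) < goal:
            seq.append(target(seq))
    return seq


# Toy models: the draft agrees with the target except on every third step.
def toy_target(seq: List[int]) -> int:
    return sum(seq) % 10


def toy_draft(seq: List[int]) -> int:
    return 0 if len(seq) % 3 == 0 else sum(seq) % 10


out = speculative_decode(toy_target, toy_draft, [1, 2], n_new=10)
assert out == greedy_decode(toy_target, [1, 2], n_new=10)
print(out)
```

The output always matches the target's own greedy decoding; the draft only changes how many target calls are needed. A long-context draft matters because, in this scheme, the draft must condition on the same (possibly very long) prompt as the target.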

Created by further pretraining [BEE-spoke-data/smol_llama-220M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-220M-GQA) at a context length of 32,768 tokens on [togethercomputer/RedPajama-Data-1T-Sample](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample).

This variant extends the context window by raising the RoPE theta (the rotary position embedding frequency base) rather than by linearly scaling positions.
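To show how the two context-extension methods in the perplexity results differ, the sketch below computes standard RoPE rotation frequencies, theta^(-2i/d) for each dimension pair i, under both methods. The head dimension of 64 is a hypothetical value chosen for illustration, and the base theta of 10,000 is the usual Llama default, assumed rather than confirmed for the base model.

```python
import math
from typing import List


def rope_inv_freq(head_dim: int, theta: float) -> List[float]:
    """Standard RoPE rotation frequency for each dimension pair i: theta^(-2i/d)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def linear_scaled_inv_freq(head_dim: int, theta: float, scale: float) -> List[float]:
    """Linear RoPE scaling divides positions by `scale`, which is the same as
    dividing every rotation frequency by `scale`."""
    return [f / scale for f in rope_inv_freq(head_dim, theta)]


def wavelength(inv_freq: float) -> float:
    """Number of positions for one full rotation of a dimension pair."""
    return 2 * math.pi / inv_freq


HEAD_DIM = 64          # hypothetical head dimension, for illustration only
BASE_THETA = 10_000.0  # Llama default; assumed (not confirmed) for the base model

base = rope_inv_freq(HEAD_DIM, BASE_THETA)
theta_ext = rope_inv_freq(HEAD_DIM, 1_000_000.0)                 # this variant
linear_ext = linear_scaled_inv_freq(HEAD_DIM, BASE_THETA, 16.0)  # linear variant

# How much longer each pair's wavelength is than in the base model (x1 = unchanged).
for i in (0, HEAD_DIM // 4, HEAD_DIM // 2 - 1):
    theta_stretch = wavelength(theta_ext[i]) / wavelength(base[i])
    linear_stretch = wavelength(linear_ext[i]) / wavelength(base[i])
    print(f"pair {i:2d}: x{theta_stretch:6.2f} (theta)  x{linear_stretch:5.2f} (linear)")
```

Under these assumptions, raising theta leaves the fastest-rotating pair unchanged and stretches the slowest pair by roughly 87x (100^(62/64)), while linear scaling stretches every pair uniformly by 16x. One plausible reading is that the theta method preserves fine-grained local position information, which may relate to its near-unchanged 2048-token perplexity.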

Wikitext perplexity (64 rows; lower is better) at each evaluation context length, as measured with [exllamav2](https://github.com/turboderp/exllamav2):
| Context length | Base model | 32k, linear RoPE scale 16.0 | 32k, RoPE theta 1000000.0 (this model) |
|---|---|---|---|
| 2048 | 20.2193 | 25.7148 | 20.2158 |
| 4096 | 102.6928 | 23.4461 | 18.3868 |
| 8192 | 235.5210 | 22.3326 | 17.5976 |
| 16384 | 390.7198 | 21.6744 | 17.1462 |
| 32768 | 515.8053 | 21.4317 | 16.6989 |
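For reference, perplexity is conventionally the exponential of the mean negative log-likelihood per token. The sketch below assumes exllamav2 follows this standard definition; how it chunks and aggregates the 64 Wikitext rows is not described here, so the simple token-level average is an assumption.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token).

    `token_logprobs` are the natural-log probabilities the model assigned
    to each actual next token in the evaluation text.
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that assigns probability 1/20 to every true token has perplexity 20:
# on average it is as uncertain as a uniform choice among 20 tokens.
print(perplexity([math.log(1 / 20)] * 64))
```

Read this way, a perplexity near 16.7 at 32,768 tokens means the model is, on average, about as uncertain as a uniform choice among roughly 17 candidate tokens.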