TinyLlama-1.1B-32k

NOTE: This is a fork of the original model at https://huggingface.co/Doctor-Shotgun/TinyLlama-1.1B-32k, with the safetensors metadata fixed using the following code:

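import safetensors
from safetensors.torch import save_file

# Path to the checkpoint to fix (placeholder; set this to your local file).
safetensors_path = "model.safetensors"

# Load every tensor from the existing file into memory.
tensors = dict()
with safetensors.safe_open(safetensors_path, framework="pt") as f:
    for key in f.keys():
        tensors[key] = f.get_tensor(key)

# Rewrite the file in place with the 'format' metadata set to 'pt'.
save_file(tensors, safetensors_path, metadata={'format': 'pt'})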

(from https://huggingface.co/SeaLLMs/SeaLLM-7B-Hybrid/discussions/2#65752144412ee70185d49ff5)
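
To confirm that the metadata was written, the file header can be inspected without loading any tensors. The following is a minimal sketch that relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header with an optional `__metadata__` entry). The helper name `read_safetensors_metadata` is hypothetical and not part of the safetensors library.

```python
import json
import struct


def read_safetensors_metadata(path):
    """Return the __metadata__ dict from a safetensors header, or {} if absent.

    Hypothetical helper for illustration; it parses only the file header.
    """
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 JSON describing tensors and metadata.
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header.get("__metadata__", {})
```

Calling `read_safetensors_metadata(safetensors_path)` on a file rewritten by the snippet above should return a dict containing `'format': 'pt'`.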

Original model card:

A 32k-context finetune of TinyLlama-1.1B that uses an increased rope theta (RoPE frequency base), meant to serve as a long-context draft model for speculative decoding.
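
As a rough illustration of why raising rope theta helps at long context, the sketch below computes the rotary inverse frequencies theta^(-2i/d) and the wavelength of the slowest-rotating dimension pair. It assumes a head dimension of 64 (2048 hidden size over 32 heads) and compares the common Llama default base of 10000 with a larger value chosen purely for illustration. The theta actually used by this model is not stated here; check `rope_theta` in the model's config.json.

```python
import math


def rope_inv_freq(head_dim, theta):
    """Inverse frequencies theta^(-2i/d), one per rotary dimension pair."""
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim, theta):
    """Number of positions for the slowest-rotating pair to complete one turn."""
    return 2.0 * math.pi / min(rope_inv_freq(head_dim, theta))


HEAD_DIM = 64                # assumption: 2048 hidden size / 32 attention heads
BASE_THETA = 10_000.0        # common Llama default
LARGER_THETA = 1_000_000.0   # illustrative only; not read from this model's config

for theta in (BASE_THETA, LARGER_THETA):
    print(f"theta={theta:g}: longest wavelength ~ "
          f"{longest_wavelength(HEAD_DIM, theta):.0f} positions")
```

A larger base stretches every wavelength, so positions far apart in a 32768-token window rotate less between neighbouring tokens than they would under the default base.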

Created by further pretraining TinyLlama-1.1B at a context length of 32768 on togethercomputer/RedPajama-Data-1T-Sample.

Note that the base checkpoint is from the commit "final model" (fad4f1a5cd0563ac41349b8fec2e6e51156568a0), which was subsequently reverted; it is not the 3T checkpoint currently on the main branch of TinyLlama-1.1B.

Wikitext (wikitext-2-raw-v1_train) perplexity (64 rows) at each context length, as evaluated via exllamav2:

| Model | 2048 | 4096 | 8192 | 16384 | 32768 |
|---|---|---|---|---|---|
| TinyLlama-1.1B | 8.5633 | 208.3586 | 863.7507 | 1600.5021 | 6981.9021 |
| TinyLlama-1.1B-32k | 8.6548 | 7.8339 | 7.4904 | 7.3674 | 7.1338 |
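
For reference, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below shows only that definition. How exllamav2 windows and batches the evaluation text is not described here, so the function name and input format are assumptions for illustration.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of each scored token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood per token, then exponentiate.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)
```

Under this definition, a model that assigns each token a probability of 1/8 has a perplexity of 8.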

Evaluation on HumanEval by turboderp:

| Model | Pass@1 | Pass@10 |
|---|---|---|
| TinyLlama-1.1B | 0.0841 | 0.1524 |
| TinyLlama-1.1B (NTK alpha=7.7) | 0.0598 | 0.1098 |
| TinyLlama-1.1B-32k-ckpt-554 | 0.0732 | 0.1402 |
| TinyLlama-1.1B-32k | 0.0829 | 0.1524 |
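
Pass@k scores on HumanEval are commonly computed per problem with the unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples were generated and c of them passed, and then averaged over problems. The sketch below implements that estimator. Whether the numbers above were produced exactly this way is an assumption, and the function name is illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem: n samples, c of which passed."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed, giving pass@k = 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```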