---
license: apache-2.0
---
|
This repository contains GGUF v2 quantizations of [princeton-nlp/Sheared-LLaMA-1.3B](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B).
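
These GGUF files can be loaded by any GGUF-compatible runtime such as [llama.cpp](https://github.com/ggerganov/llama.cpp). The snippet below is a minimal usage sketch with the `llama-cpp-python` bindings; the filename is an assumption, so substitute whichever quantization file you actually download from this repository.

```python
from llama_cpp import Llama

# Load one of the GGUF quantizations from this repo.
# The filename below is hypothetical -- use the file you downloaded.
llm = Llama(
    model_path="sheared-llama-1.3b.Q4_K_M.gguf",
    n_ctx=2048,   # context window
    n_threads=4,  # CPU threads
)

# Sheared-LLaMA-1.3B is a base (non-chat) model, so use plain text completion.
output = llm(
    "The RedPajama dataset is",
    max_tokens=48,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```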
|
|
|
|
|
Sheared-LLaMA-1.3B is a model pruned and further pre-trained from [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). The authors dynamically load data from the [RedPajama dataset](https://github.com/togethercomputer/RedPajama-Data), using 0.4B tokens for pruning and 50B tokens for continued pre-training of the pruned model.
|
|
|
- Smaller-scale (1.3B parameters)
- Same vocabulary as LLaMA1 and LLaMA2
- Derived with a budget of 50B tokens by utilizing existing strong LLMs
|
|
|
## Downstream Tasks |
|
|
|
The authors evaluate on an extensive set of downstream tasks including reasoning, reading comprehension, language modeling, and knowledge-intensive tasks. The Sheared-LLaMA models outperform existing open-source language models of comparable size.
|
|
|
| Model     | # Pre-training Tokens | Average Performance |
| --------- | --------------------- | ------------------- |
| LLaMA2-7B | 2T                    | 64.6                |
|
|
|
**1.3B** |
|
|
|
| Model                  | # Pre-training Tokens | Average Performance |
| ---------------------- | --------------------- | ------------------- |
| OPT-1.3B               | 300B                  | 48.2                |
| Pythia-1.4B            | 300B                  | 48.9                |
| **Sheared-LLaMA-1.3B** | **50B**               | **51.0**            |
|
|
|
**3B** |
|
|
|
| Model                  | # Pre-training Tokens | Average Performance |
| ---------------------- | --------------------- | ------------------- |
| OPT-2.7B               | 300B                  | 51.4                |
| Pythia-2.8B            | 300B                  | 52.5                |
| INCITE-Base-3B         | 800B                  | 54.7                |
| Open-LLaMA-3B-v1       | 1T                    | 55.1                |
| Open-LLaMA-3B-v2       | 1T                    | 55.7                |
| **Sheared-LLaMA-2.7B** | **50B**               | **56.7**            |
|
|