invalid-coder/TinyLlama-1.1B-intermediate-step-1431k-3T-laser-dpo

	---
	license: apache-2.0
	datasets:
	- cerebras/SlimPajama-627B
	- bigcode/starcoderdata
	language:
	- en
	---
	<div align="center">

	# TinyLlama-1.1B-intermediate-step-1431k-3T-laser-dpo

	This model follows the laserRMT implementation (https://github.com/cognitivecomputations/laserRMT)
	and its novel training technique: the model is partially frozen according to a laser-like
	analysis (official paper coming soon), which effectively prevents the model from forgetting
	previously acquired knowledge. This is particularly crucial when teaching the model
	specific skills, such as function calling.
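
The general idea of partial freezing can be sketched as follows. This is only an illustration, not the laserRMT implementation: the helper `freeze_except` and the keyword list are hypothetical, and the choice of which layers stay trainable would come from the laser-like analysis, which this card does not specify. The helper works on any object exposing `named_parameters()`, as PyTorch modules do.

```python
def freeze_except(model, trainable_keywords):
    """Disable gradients for every parameter whose name matches no keyword.

    Hypothetical helper: `trainable_keywords` stands in for the output of a
    layer-selection analysis that is not described in this card.
    Returns the names of the parameters left trainable.
    """
    trainable = []
    for name, param in model.named_parameters():
        # A parameter stays trainable only if its name contains a keyword.
        param.requires_grad = any(key in name for key in trainable_keywords)
        if param.requires_grad:
            trainable.append(name)
    return trainable
```

With a PyTorch model, a call such as `freeze_except(model, ["layers.20."])` would leave only parameters whose names contain `layers.20.` trainable; the layer index here is an arbitrary placeholder.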


	# TinyLlama-1.1B
	</div>

	https://github.com/jzhang38/TinyLlama

	The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. With proper optimization, this can be achieved in "just" 90 days using 16 A100-40G GPUs 🚀🚀. Training started on 2023-09-01.
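
As a rough sanity check of what that budget implies, the sustained throughput can be worked out from the three figures in the paragraph above. The sketch assumes uninterrupted training for the full 90 days, which is a simplification.

```python
# Figures taken from the paragraph above.
TOTAL_TOKENS = 3e12   # 3 trillion tokens
DAYS = 90
NUM_GPUS = 16

seconds = DAYS * 24 * 60 * 60
cluster_tokens_per_sec = TOTAL_TOKENS / seconds
per_gpu_tokens_per_sec = cluster_tokens_per_sec / NUM_GPUS

print(f"cluster: {cluster_tokens_per_sec:,.0f} tokens/s")
print(f"per GPU: {per_gpu_tokens_per_sec:,.0f} tokens/s")
```

This works out to roughly 24k tokens per second per GPU.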

	<div align="center">
	<img src="./TinyLlama_logo.png" width="300"/>
	</div>

	We adopted exactly the same architecture and tokenizer as Llama 2, so TinyLlama can be used as a drop-in replacement in many open-source projects built on Llama. TinyLlama is also compact, with only 1.1B parameters, which makes it suitable for applications with a restricted computation and memory footprint.
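
Because the architecture and tokenizer match Llama 2, the checkpoint should load through the generic Hugging Face `transformers` auto classes. The following is a minimal sketch, assuming `transformers` and PyTorch are installed and that the repository id matches this page; the helper names `load_model` and `generate` are illustrative only.

```python
REPO_ID = "invalid-coder/TinyLlama-1.1B-intermediate-step-1431k-3T-laser-dpo"


def load_model(repo_id=REPO_ID, revision="main"):
    """Load tokenizer and model; downloads weights on first use."""
    # Imported lazily so the module can be imported without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)
    return tokenizer, model


def generate(prompt, max_new_tokens=32):
    """Generate a short continuation for `prompt`."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The capital of France is")` would download the weights and return the prompt plus a continuation; no output is shown here because it depends on the model.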

	#### This Collection
	This collection contains all checkpoints after the 1T fix. Each branch name indicates the training step and the number of tokens seen.
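
To illustrate how the step and token count can be read off a name, here is a small parser. It assumes names of the shape `...step-<N>k-<M>B` or `...step-<N>k-<M>T`, as used for the checkpoint names in the Eval table; the exact branch naming in the upstream repositories may differ, and `parse_checkpoint_name` is a hypothetical helper.

```python
import re

# Matches e.g. "step-480k-1007B" or "step-715k-1.5T" (case-insensitive).
_PATTERN = re.compile(r"step-(\d+(?:\.\d+)?)k-(\d+(?:\.\d+)?)([bt])", re.IGNORECASE)
_UNITS = {"b": 10**9, "t": 10**12}


def parse_checkpoint_name(name):
    """Return (training_steps, tokens_seen) parsed from a checkpoint name."""
    match = _PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name!r}")
    steps = int(float(match.group(1)) * 1_000)
    tokens = int(float(match.group(2)) * _UNITS[match.group(3).lower()])
    return steps, tokens


print(parse_checkpoint_name("TinyLlama-1.1B-intermediate-step-715k-1.5T"))
```

In `transformers`, a specific branch can be selected by passing its name as the `revision` argument of `from_pretrained`.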

	#### Eval

	| Model | Pretrain Tokens | HellaSwag | Obqa | WinoGrande | ARC_c | ARC_e | boolq | piqa | avg |
	|---------------------------------------------|-----------------|-----------|-------|------------|-------|-------|-------|-------|-------|
	| Pythia-1.0B | 300B | 47.16 | 31.40 | 53.43 | 27.05 | 48.99 | 60.83 | 69.21 | 48.30 |
	| TinyLlama-1.1B-intermediate-step-50K-104b | 103B | 43.50 | 29.80 | 53.28 | 24.32 | 44.91 | 59.66 | 67.30 | 46.11 |
	| TinyLlama-1.1B-intermediate-step-240k-503b | 503B | 49.56 | 31.40 | 55.80 | 26.54 | 48.32 | 56.91 | 69.42 | 48.28 |
	| TinyLlama-1.1B-intermediate-step-480k-1007B | 1007B | 52.54 | 33.40 | 55.96 | 27.82 | 52.36 | 59.54 | 69.91 | 50.22 |
	| TinyLlama-1.1B-intermediate-step-715k-1.5T | 1.5T | 53.68 | 35.20 | 58.33 | 29.18 | 51.89 | 59.08 | 71.65 | 51.29 |
	| TinyLlama-1.1B-intermediate-step-955k-2T | 2T | 54.63 | 33.40 | 56.83 | 28.07 | 54.67 | 63.21 | 70.67 | 51.64 |
	| TinyLlama-1.1B-intermediate-step-1195k-2.5T | 2.5T | 58.96 | 34.40 | 58.72 | 31.91 | 56.78 | 63.21 | 73.07 | 53.86 |
	| TinyLlama-1.1B-intermediate-step-1431k-3T | 3T | 59.20 | 36.00 | 59.12 | 30.12 | 55.25 | 57.83 | 73.29 | 52.99 |
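
The `avg` column appears to be the unweighted mean of the seven benchmark scores in each row. That reading is an assumption, since the card does not state how `avg` is computed; the sketch below recomputes the mean from the table values and compares it with the listed figure. The short row labels are abbreviations introduced here.

```python
from statistics import mean

# (seven benchmark scores in table order, listed avg), copied from the table.
ROWS = {
    "Pythia-1.0B": ([47.16, 31.40, 53.43, 27.05, 48.99, 60.83, 69.21], 48.30),
    "step-50K":    ([43.50, 29.80, 53.28, 24.32, 44.91, 59.66, 67.30], 46.11),
    "step-240k":   ([49.56, 31.40, 55.80, 26.54, 48.32, 56.91, 69.42], 48.28),
    "step-480k":   ([52.54, 33.40, 55.96, 27.82, 52.36, 59.54, 69.91], 50.22),
    "step-715k":   ([53.68, 35.20, 58.33, 29.18, 51.89, 59.08, 71.65], 51.29),
    "step-955k":   ([54.63, 33.40, 56.83, 28.07, 54.67, 63.21, 70.67], 51.64),
    "step-1195k":  ([58.96, 34.40, 58.72, 31.91, 56.78, 63.21, 73.07], 53.86),
    "step-1431k":  ([59.20, 36.00, 59.12, 30.12, 55.25, 57.83, 73.29], 52.99),
}

# Absolute gap between the recomputed mean and the listed avg, per row.
gaps = {name: abs(mean(scores) - listed) for name, (scores, listed) in ROWS.items()}

for name, gap in gaps.items():
    print(f"{name:12s} recomputed-vs-listed gap: {gap:.3f}")
```

Under this assumption, every listed average agrees with the recomputed mean to within 0.05 points.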