|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- cerebras/SlimPajama-627B |
|
- bigcode/starcoderdata |
|
language: |
|
- en |
|
--- |
|
<div align="center"> |
|
|
|
# TinyLlama-1.1B |
|
</div> |
|
|
|
https://github.com/jzhang38/TinyLlama |
|
|
|
The TinyLlama project aims to **pretrain** a **1.1B Llama model on 3 trillion tokens**. With proper optimization, this can be done in a span of "just" 90 days using 16 A100-40G GPUs 🚀🚀. Training started on 2023-09-01.
|
|
|
<div align="center"> |
|
<img src="./TinyLlama_logo.png" width="300"/> |
|
</div> |
|
|
|
We adopted exactly the same architecture and tokenizer as Llama 2, so TinyLlama can be plugged into many open-source projects built on Llama. And with only 1.1B parameters, TinyLlama is compact enough for applications with tight compute and memory budgets.
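
Because the architecture and tokenizer match Llama 2, an intermediate checkpoint loads like any other Llama model. A minimal sketch using `transformers` (the repo id is taken from the release table below; the prompt and sampling settings are purely illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Intermediate checkpoint from the release schedule below.
model_id = "PY007/TinyLlama-1.1B-step-50K-105b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "The TinyLlama project aims to"
inputs = tokenizer(prompt, return_tensors="pt")
# Sampling settings here are illustrative, not the project's defaults.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```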
|
|
|
|
|
#### Release Schedule
|
We will be rolling out intermediate checkpoints according to the schedule below, along with some baseline models for comparison.
|
|
|
| Date | HF Checkpoint | Tokens | Step | HellaSwag Acc_norm | |
|
|------------|-------------------------------------------------|--------|------|---------------------| |
|
| Baseline | [StableLM-Alpha-3B](https://huggingface.co/stabilityai/stablelm-base-alpha-3b)| 800B | -- | 38.31 | |
|
| Baseline | [Pythia-1B-intermediate-step-50k-105b](https://huggingface.co/EleutherAI/pythia-1b/tree/step50000) | 105B | 50k | 42.04 | |
|
| Baseline | [Pythia-1B](https://huggingface.co/EleutherAI/pythia-1b) | 300B | 143k | 47.16 | |
|
| 2023-09-04 | [TinyLlama-1.1B-intermediate-step-50k-105b](https://huggingface.co/PY007/TinyLlama-1.1B-step-50K-105b) | 105B | 50k | 43.50 | |
|
| 2023-09-16 | -- | 500B | -- | -- | |
|
| 2023-10-01 | -- | 1T | -- | -- | |
|
| 2023-10-16 | -- | 1.5T | -- | -- | |
|
| 2023-10-31 | -- | 2T | -- | -- | |
|
| 2023-11-15 | -- | 2.5T | -- | -- | |
|
| 2023-12-01 | -- | 3T | -- | -- | |
|
|
|
|
|
|
TinyLlama is progressing well so far 🚀🚀.
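
The HellaSwag numbers in the table are zero-shot acc_norm. A hedged sketch of reproducing such a score with EleutherAI's `lm-evaluation-harness` (`pip install lm-eval`); the harness version and exact settings behind the table above are assumptions:

```python
# Assumes lm-eval >= 0.4; the settings used for the table above are not stated.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=PY007/TinyLlama-1.1B-step-50K-105b,dtype=bfloat16",
    tasks=["hellaswag"],
)
print(results["results"]["hellaswag"])  # reports acc and acc_norm
```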
|
|
|
Meanwhile, you can track the live cross-entropy training loss [here](https://wandb.ai/lance777/lightning_logs/reports/metric-train_loss-23-09-02-15-26-17---Vmlldzo1MjkzNzMw?accessToken=9843chbl7rfi1w03hxttpcnbo9z8t6088pw3ddn4h8teunaq0cy7j8hw9c5i02ve).
|
|
|
## Training Details |
|
Below are some details of our training setup: |
|
|
|
| Setting | Description | |
|
|---------------------------------|----------------------------------------------------------------| |
|
| Parameters | 1.1B | |
|
| Attention Variant | Grouped Query Attention | |
|
| Model Size | Layers: 22, Heads: 32, Query Groups: 4, Embedding Size: 2048, Intermediate Size (SwiGLU): 5632 |
|
| Sequence Length | 2048 | |
|
| Batch Size | 2 million tokens (2048 * 1024) | |
|
| Learning Rate | 4e-4 | |
|
| Learning Rate Schedule | Cosine with 2,000 warmup steps (sketched below the table) |
|
| Training Data | [SlimPajama](https://huggingface.co/datasets/cerebras/slimpajama-627b) & [StarCoderData](https://huggingface.co/datasets/bigcode/starcoderdata) |

| Data Preprocessing | Excluded the GitHub subset of SlimPajama; sampled all code from StarCoderData |
|
| Combined Dataset Size | 1 trillion tokens | |
|
| Total Tokens During Training | 3 trillion (3 epochs/1430k steps) | |
|
| Natural Language to Code Ratio | 7:3 | |
|
| Hardware | 16 A100-40G GPUs | |
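
For concreteness, a minimal sketch of the learning-rate schedule in the table: linear warmup to the 4e-4 peak over 2,000 steps, then cosine decay. The floor learning rate is an assumption (not stated above); the step count comes from the 1430k-steps figure:

```python
import math

PEAK_LR = 4e-4          # from the table
WARMUP_STEPS = 2_000    # from the table
TOTAL_STEPS = 1_430_000 # from the table
MIN_LR = PEAK_LR / 10   # assumed floor; not stated in the table

def lr_at(step: int) -> float:
    """Linear warmup followed by cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))

for s in (0, 1_000, 2_000, 715_000, 1_430_000):
    print(f"step {s:>9}: lr = {lr_at(s):.2e}")
```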