---
language:
- en
pipeline_tag: text-generation
tags:
- meta
- llama-3
license: llama3
---
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/pf4d6FA7DriRtVq5HCkxd.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/VcZWbW_eZkJAZZ5ricL4B.png)
# Llama-3-Giraffe-70B
Abacus.AI presents our longer-necked variant of Llama 3 70B!
This model has an effective context length of approximately 128k.
We have trained on ~1B tokens so far.
This is an initial release, and we hope to improve the heatmap below further as we continue training.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/_NVEuQ2ZT-sBtDBNjgmbt.png)
## Training Methodology
Training combines [PoSE](https://arxiv.org/abs/2309.10400) with dynamic-NTK interpolation.
### NTK-scaling
The scale factor for NTK is 4. Note that we also tried theta-scaling but this did not work as well as NTK scaling in our experiments.
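As a rough illustration of how an NTK scale factor of 4 can be applied, the snippet below sets the `rope_scaling` field when loading a Llama-style model with Hugging Face `transformers`. The checkpoint name and loading details are illustrative, not our exact training setup.

```python
# Illustrative sketch: dynamic-NTK RoPE scaling with factor 4 via the
# `rope_scaling` config field in Hugging Face transformers.
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "abacusai/Llama-3-Giraffe-70B"  # illustrative checkpoint name

config = AutoConfig.from_pretrained(model_name)
# Dynamic NTK rescales the RoPE base frequency as the input grows, preserving
# short-context behaviour while extending the usable context window.
config.rope_scaling = {"type": "dynamic", "factor": 4.0}

model = AutoModelForCausalLM.from_pretrained(model_name, config=config)
```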
### PoSE
We utilise Positional Skip-wise Training (PoSE) with the following parameters (a sketch of the position-ID assignment follows the list below):
- **Number of Chunks**: 5
- **Max position ID**: 32768
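The sketch below illustrates the skip-wise idea under these settings: an ~8K-token sample is split into 5 chunks, and random skips are inserted before each chunk so the position IDs span up to 32768. The function and its details are our illustration of the technique; see the PoSE paper for the exact formulation.

```python
# Illustrative sketch of PoSE-style skipped position IDs (not the exact recipe).
import random

def pose_position_ids(seq_len: int, num_chunks: int = 5, max_position: int = 32768):
    """Assign position IDs to a short sample so they span a much longer window.

    Positions are consecutive within each chunk, but a random skip is inserted
    before each chunk so the largest position ID can approach `max_position`.
    """
    assert seq_len <= max_position
    # Roughly equal chunk lengths.
    base = seq_len // num_chunks
    chunk_lens = [base] * num_chunks
    chunk_lens[-1] += seq_len - base * num_chunks

    # Randomly split the unused position budget into one skip per chunk.
    total_skip = max_position - seq_len
    cuts = sorted(random.randint(0, total_skip) for _ in range(num_chunks - 1))
    bounds = [0] + cuts + [total_skip]
    skips = [bounds[i + 1] - bounds[i] for i in range(num_chunks)]

    position_ids, pos = [], 0
    for length, skip in zip(chunk_lens, skips):
        pos += skip                      # jump ahead, simulating a longer document
        position_ids.extend(range(pos, pos + length))
        pos += length
    return position_ids

ids = pose_position_ids(seq_len=8192)
assert len(ids) == 8192 and max(ids) < 32768
```

In this way the model sees relative distances up to ~32K during training while each sample contains only ~8K real tokens; with the NTK scale factor of 4, this is consistent with the ~128K effective context reported above (32768 × 4 ≈ 131K).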
### Data
We train on samples from [RedPajama](https://github.com/togethercomputer/RedPajama-Data) that are on average ~8K tokens long.
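As a rough sketch of the sample selection, the snippet below keeps documents whose tokenized length falls in a band around 8K tokens. The tokenizer name and length window are assumptions for illustration, not our exact data pipeline.

```python
# Illustrative sketch: keep documents whose tokenized length is roughly 8K tokens.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B")  # illustrative

def filter_long_samples(corpus, low=6_000, high=10_000):
    """Yield documents from `corpus` (an iterable of strings) of roughly 8K tokens."""
    for doc in corpus:
        n_tokens = len(tokenizer(doc, add_special_tokens=False)["input_ids"])
        if low <= n_tokens <= high:
            yield doc
```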
### Hardware
We train on 8xH100 GPUs with DeepSpeed ZeRO Stage 3.
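For reference, a minimal ZeRO Stage 3 configuration of the kind used with the Hugging Face Trainer is sketched below; everything other than the ZeRO stage is an illustrative default rather than our exact settings.

```python
# Illustrative ZeRO Stage 3 DeepSpeed config, passed e.g. as
# TrainingArguments(deepspeed=ds_config, ...).
ds_config = {
    "zero_optimization": {
        "stage": 3,                                         # shard params, grads, and optimizer state
        "overlap_comm": True,                               # overlap communication with compute
        "stage3_gather_16bit_weights_on_model_save": True,  # materialise full weights on save
    },
    "bf16": {"enabled": True},                              # bfloat16 on H100s
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
```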
## Evaluation Methodology
We use the [EasyContext](https://github.com/abacusai/EasyContext/blob/eval_runs/eval_needle.py) implementation of Needle-in-a-Haystack to evaluate Llama-3-Giraffe-70B.
We evaluate with the following parameters (the resulting evaluation grid is sketched after the list):
- **Min context length**: 2000
- **Max context length**: 128000
- **Context interval**: 4000
- **Depth interval**: 0.1
- **Num samples**: 2
- **Random number digits**: 7
- **Haystack dir**: PaulGrahamEssays
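These parameters define the evaluation grid sketched below; the actual evaluation is run with the EasyContext script linked above, and the needle template here is an assumption.

```python
# Illustrative sketch of the needle-in-a-haystack grid implied by the parameters above.
import random

MIN_CONTEXT, MAX_CONTEXT, CONTEXT_INTERVAL = 2000, 128000, 4000
DEPTH_INTERVAL, NUM_SAMPLES, RND_DIGITS = 0.1, 2, 7

context_lengths = range(MIN_CONTEXT, MAX_CONTEXT + 1, CONTEXT_INTERVAL)
depths = [round(i * DEPTH_INTERVAL, 1) for i in range(round(1 / DEPTH_INTERVAL) + 1)]

def make_needle() -> str:
    """A random 7-digit magic number the model must retrieve from the haystack."""
    magic = random.randint(10 ** (RND_DIGITS - 1), 10 ** RND_DIGITS - 1)
    return f"The special magic number is: {magic}."

test_cases = [
    (ctx_len, depth, make_needle())
    for ctx_len in context_lengths
    for depth in depths
    for _ in range(NUM_SAMPLES)
]
# Each case inserts the needle at `depth` (as a fraction of the context) inside
# a haystack built from the PaulGrahamEssays corpus truncated to `ctx_len` tokens.
print(len(test_cases))  # 32 context lengths x 11 depths x 2 samples = 704 with these settings
```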