---
license: llama2
---
## Introducing GenZ Infinite
GenZ Infinite is a fine-tuned version of GenZ-13B-v2 with a 16K training context. The model architecture is updated with the Λ-shaped (lambda) attention from the LM-Infinite paper, which lets the model handle sequence lengths of 120K+ without degrading perplexity.
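As a rough illustration of the idea (not the repo's actual implementation), the Λ-shaped attention keeps two branches: a handful of global tokens at the start of the sequence and a sliding local window; everything else is masked out. The token counts below are illustrative.
```python
import torch

def lambda_attention_mask(seq_len: int, global_branch: int, local_branch: int) -> torch.Tensor:
    """Illustrative Λ-shaped mask: each query attends to the first
    `global_branch` tokens and the most recent `local_branch` tokens,
    on top of the usual causal constraint."""
    q = torch.arange(seq_len).unsqueeze(1)   # query positions (column)
    k = torch.arange(seq_len).unsqueeze(0)   # key positions (row)
    causal = k <= q                          # standard causal masking
    global_part = k < global_branch          # always attend to the leading tokens
    local_part = (q - k) < local_branch      # sliding local window
    return causal & (global_part | local_part)

# Small example: 16 tokens, 2 global tokens, local window of 4
print(lambda_attention_mask(seq_len=16, global_branch=2, local_branch=4).int())
```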
## Generate responses
Use the `generate.py` script from the [GitHub repo](https://github.com/BudEcosystem/genz-infinite):
```bash
python generate.py --base_model budecosystem/genz-13b-infinite
```
You can integrate the model into your own code by loading the `convert_llama_model` function:
```python
import torch
from transformers import GenerationConfig, AutoModelForCausalLM, AutoTokenizer

from model.llama import convert_llama_model  # provided by the genz-infinite repo

# Λ-attention settings: size of the local sliding window, number of global
# tokens at the start of the sequence, and the distance ceiling described
# in the LM-Infinite paper.
local_branch = 2048
global_branch = 10
limit_distance = 2048

tokenizer = AutoTokenizer.from_pretrained("budecosystem/genz-13b-infinite")
model = AutoModelForCausalLM.from_pretrained(
    "budecosystem/genz-13b-infinite",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = convert_llama_model(model, local_branch, global_branch)
```
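Continuing from the snippet above, a minimal generation call could look like this (the prompt template and sampling settings are illustrative, not prescribed by the repo):
```python
prompt = "### User:\nSummarise the benefits of long-context attention.\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

generation_config = GenerationConfig(
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.95,
    do_sample=True,
)

with torch.inference_mode():
    output_ids = model.generate(**inputs, generation_config=generation_config)

# Decode only the newly generated tokens
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```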
## Evaluation
Passkey retrieval accuracy (%) at different sequence lengths (columns are context lengths in tokens):
| Task | 4096 | 5120 | 8192 | 16384 |
| :----:|:---------:| :--------:| :--------:| :--------:|
| Passkey retrieval | 100 | 75 | 48 | 30 |
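For context, passkey retrieval hides a random passkey inside filler text of the target length and asks the model to repeat it back. Below is a hedged sketch of how such a prompt can be constructed; the filler sentences and wording are assumptions, not the exact prompts behind the numbers above.
```python
import random

def build_passkey_prompt(target_tokens: int, tokens_per_filler: int = 10) -> tuple[str, str]:
    """Hide a random 5-digit passkey inside repeated filler sentences."""
    passkey = str(random.randint(10000, 99999))
    filler = "The grass is green. The sky is blue. The sun is bright. "
    n_fillers = max(target_tokens // tokens_per_filler, 1)
    insert_at = random.randint(0, n_fillers)
    pieces = [filler] * n_fillers
    pieces.insert(insert_at, f"The pass key is {passkey}. Remember it. ")
    prompt = "".join(pieces) + "\nWhat is the pass key?"
    return prompt, passkey

prompt, passkey = build_passkey_prompt(target_tokens=8192)
```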
## Training details
The model was trained on 4× A100 80GB GPUs for approximately 55 hours.
| Hyperparameters | Value |
| :----------------------------| :-----: |
| per_device_train_batch_size | 1 |
| gradient_accumulation_steps | 1 |
| epoch | 3 |
| steps | 8550 |
| learning_rate | 2e-4 |
| lr scheduler type | cosine |
| warmup steps | 1000 |
| optimizer | adamw |
| fp16 | True |
| GPU | 4 A100 80GB |
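For reference, the hyperparameters above map onto a Hugging Face `TrainingArguments` configuration roughly as follows (the output directory and any option not listed in the table are assumptions):
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="genz-13b-infinite",      # assumed; not specified in the table
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    num_train_epochs=3,
    max_steps=8550,                      # max_steps takes precedence over epochs when both are set
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    optim="adamw_torch",
    fp16=True,
)
```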
### Acknowledgments
We'd like to thank the open-source community and the researchers whose foundational work made this model possible. Special shoutout to the authors of the [LM-Infinite paper](https://arxiv.org/abs/2308.16137) and the accompanying [GitHub repo](https://github.com/Glaciohound/LM-Infinite).