---
license: llama2
---

## Introducing GenZ Infinite

This model is a fine-tuned version of GenZ-13B-v2 with a context size of 16K. The model architecture is updated with the Λ-shaped (lambda) attention mask from the LM-Infinite paper, which extends the usable sequence length to 120K+ tokens without degrading perplexity.

## Generate responses

Use the `generate.py` file from the [GitHub repo](https://github.com/BudEcosystem/genz-infinite):

```
python generate.py --base_model budecosystem/genz-13b-infinite
```

You can integrate the model into your own code by loading the `convert_llama_model` function, which converts the loaded Llama model to the lambda attention pattern.

```python
import torch
from transformers import GenerationConfig, AutoModelForCausalLM, AutoTokenizer
from model.llama import convert_llama_model

local_branch = 2048    # size of the local attention window
global_branch = 10     # number of leading tokens that remain globally visible
limit_distance = 2048  # cap on the effective relative distance between tokens

model = AutoModelForCausalLM.from_pretrained(
    "budecosystem/genz-13b-infinite",
    torch_dtype=torch.float16,
    device_map="auto",
)
# Patch the attention layers with the lambda (local + global) pattern
model = convert_llama_model(model, local_branch, global_branch)
```

An end-to-end generation sketch appears at the end of this card.

## Evaluation

The table reports passkey-retrieval accuracy (%) at each evaluated context length.

| Task | 4096 | 5120 | 8192 | 16384 |
| :----: | :----: | :----: | :----: | :----: |
| Passkey retrieval | 100 | 75 | 48 | 30 |

## Training details

The model was trained on 4× A100 80GB GPUs for approximately 55 hours.

| Hyperparameter | Value |
| :---------------------------- | :-----: |
| per_device_train_batch_size | 1 |
| gradient_accumulation_steps | 1 |
| epochs | 3 |
| steps | 8550 |
| learning_rate | 2e-4 |
| lr scheduler type | cosine |
| warmup steps | 1000 |
| optimizer | adamw |
| fp16 | True |
| GPU | 4× A100 80GB |

### Acknowledgments

We'd like to thank the open-source community and the researchers whose foundational work made this model possible. Special thanks to the authors of the [LM-Infinite paper](https://arxiv.org/abs/2308.16137) and the accompanying [GitHub repo](https://github.com/Glaciohound/LM-Infinite).
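As referenced above, here is a minimal end-to-end generation sketch that builds on the integration snippet. The prompt and sampling settings are illustrative assumptions, not values from the repo.

```python
# Minimal generation sketch; `model` is the converted model from the
# integration snippet above. Prompt and sampling settings are illustrative.
from transformers import AutoTokenizer, GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("budecosystem/genz-13b-infinite")

prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

generation_config = GenerationConfig(
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

output_ids = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```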
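For intuition about the Λ-shaped attention described in the introduction, the toy sketch below constructs the mask directly: each query attends to the first `global_branch` tokens plus the most recent `local_branch` tokens, subject to causality. This illustrates the idea only; it is not the repo's implementation.

```python
import torch

def lambda_attention_mask(seq_len: int, local_branch: int, global_branch: int) -> torch.Tensor:
    """Boolean mask where True means the query may attend to the key."""
    q = torch.arange(seq_len).unsqueeze(1)  # query positions (rows)
    k = torch.arange(seq_len).unsqueeze(0)  # key positions (columns)
    causal = k <= q                      # no attending to future tokens
    global_part = k < global_branch      # leading tokens stay visible to every query
    local_part = (q - k) < local_branch  # sliding window of recent tokens
    return causal & (global_part | local_part)

# Tiny example: 10 tokens, window of 4, 2 globally visible tokens
print(lambda_attention_mask(10, local_branch=4, global_branch=2).int())
```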
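Finally, the hyperparameter table can be read as the following Hugging Face `TrainingArguments`. This is a hypothetical reconstruction for readability; `output_dir` and the exact AdamW variant are assumptions, and the actual training script is not part of this card.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the configuration from the table above.
training_args = TrainingArguments(
    output_dir="genz-13b-infinite-checkpoints",  # assumed placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    num_train_epochs=3,
    max_steps=8550,        # when set, max_steps takes precedence over epochs
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    optim="adamw_torch",   # "adamw" in the table; exact variant assumed
    fp16=True,
)
```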