Lee Junbum

beomi

AI & ML interests

Korean Open Access LLM

🚀 **InfiniTransformer, Gemma/Llama3 based Implementation!** 🌌

> Update @ 2024.04.19: It now supports Llama-3!

> Note: this implementation is unofficial

This implementation is designed to handle virtually infinite context lengths.

Here's the github repo: https://github.com/Beomi/InfiniTransformer

📄 **Read the original Paper:** https://arxiv.org/abs/2404.07143

## **Focus on Infini-Attention**

- **Two implementations available:** an attention-layer-only implementation, and a full model- and training-level implementation
- **Fixed (segment-dependent) memory usage:** enables training larger models on longer sequences without the memory overhead typical of standard Transformer implementations
- **Infinite context capability:** train with unprecedented sequence lengths, e.g. up to 1 million tokens on standard hardware
- For example, you can train Gemma-2B with a 1M-token sequence length and a 2K segment size on a single H100 GPU.
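The fixed memory usage comes from Infini-Attention's compressive memory: each segment attends locally with standard softmax attention, retrieves past context from a fixed-size associative memory, and then folds its own keys/values into that memory. Here is a simplified NumPy sketch of that idea, based on the paper's formulation rather than this repo's actual code; the function names and the fixed 0.5 gate are illustrative assumptions (the paper learns the gate):

```python
import numpy as np

def elu_plus_one(x):
    # ELU(x) + 1, the nonlinearity the paper applies to queries/keys
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_attention_segment(q, k, v, memory, z):
    """Process one segment: retrieve from the compressive memory, then update it.

    memory has shape (d, d_v) and z has shape (d,), so their size is
    constant no matter how many segments have been processed.
    """
    d = q.shape[-1]
    sq, sk = elu_plus_one(q), elu_plus_one(k)
    # Retrieve long-range context from the fixed-size memory
    a_mem = (sq @ memory) / (sq @ z[:, None] + 1e-8)
    # Standard local (within-segment) softmax attention
    scores = q @ k.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    a_local = (w @ v) / w.sum(axis=-1, keepdims=True)
    # A learned gate blends the two streams; fixed at 0.5 here for illustration
    out = 0.5 * a_mem + 0.5 * a_local
    # Fold this segment's keys/values into the memory and its normalizer
    memory = memory + sk.T @ v
    z = z + sk.sum(axis=0)
    return out, memory, z
```

Because `memory` and `z` never grow, the attention state stays the same size across arbitrarily many segments, which is what makes very long sequences feasible.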

## **Try InfiniTransformer**

1. **Clone the repository:**
   ```bash
   git clone https://github.com/Beomi/InfiniTransformer
   ```
2. **Install necessary tools:**
   ```bash
   pip install -r requirements.txt
   pip install -e git+https://github.com/huggingface/transformers.git@b109257f4f#egg=transformers
   ```
3. **Dive Deep into Custom Training:**
   - Train with extensive sequence lengths using scripts such as `./train.gemma.infini.noclm.1Mseq.sh`.
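To get a feel for the numbers behind a 1M-token training run, here is a tiny sketch of how a long sequence tiles into fixed-size segments (the helper name is hypothetical, not from the repo):

```python
def segment_spans(total_len, segment_len):
    """Yield (start, end) index spans that tile a sequence into fixed segments."""
    for start in range(0, total_len, segment_len):
        yield start, min(start + segment_len, total_len)

# A 1M-token sequence with a 2K segment size yields 489 segments,
# each processed with constant attention-state memory.
spans = list(segment_spans(1_000_000, 2048))
```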

For more detailed info, please visit the repo: https://github.com/Beomi/InfiniTransformer

Looking forward to your feedback! 😊

P.S. The training loss plot is below 😉