Lee Junbum

beomi

AI & ML interests

Korean Open Access LLM

Posts

#TPU #PyTorch #Jax

When you're trying to use PyTorch or JAX on a TPU, use the base image that matches your TPU generation:

- For v2/v3/v4: use `tpu-ubuntu2204-base`
- For v5p: use `v2-alpha-tpuv5`
- For v5e: use `v2-alpha-tpuv5-lite`

You must use these base images for the system to 'boot'.
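
For example, the base image is what you pass via `--version` when creating the TPU VM (a minimal sketch; the TPU name, zone, and accelerator type are placeholders for your own setup):

```bash
# Create a v4 TPU VM with the base image required for a proper boot.
# "my-tpu", the zone, and the accelerator type are placeholders.
gcloud compute tpus tpu-vm create my-tpu \
  --zone=us-central2-b \
  --accelerator-type=v4-8 \
  --version=tpu-ubuntu2204-base
```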

Previously used images like `tpu-vm-v4-pt-1.13` might seem to start the VM, but SSH connections do not work.

I thought it was a firewall issue and spent a lot of time on it before realizing it was a problem with the boot image 🥲

https://cloud.google.com/tpu/docs/runtimes#pytorch_and_jax
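
With the right base image, the VM boots and SSH works. A quick sanity check I'd do (a sketch; it assumes you install JAX for TPU yourself, since to my knowledge the base images don't ship with PyTorch/JAX preinstalled):

```bash
# SSH into the TPU VM (this is the step that hangs with the old PyTorch images)
gcloud compute tpus tpu-vm ssh my-tpu --zone=us-central2-b

# Inside the VM: install JAX with TPU support, then confirm the chips are visible
pip install "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
python3 -c "import jax; print(jax.devices())"
```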
🚀 **InfiniTransformer, a Gemma/Llama-3 based Implementation!** 🌌

> Update @ 2024.04.19: It now supports Llama-3!

> Note: this implementation is unofficial

This implementation is designed to handle virtually infinite context lengths.

Here's the GitHub repo: https://github.com/Beomi/InfiniTransformer

📄 **Read the original Paper:** https://arxiv.org/abs/2404.07143

## **Focus on Infini-Attention**

- **Two Types of Implementation Available:** an attention-layer-only implementation, and a full model & training implementation
- **Fixed (Segment-Dependent) Memory Usage:** enables training on larger models and longer sequences without the memory overhead typical of standard Transformer implementations (see the recurrence below)
- **Infinite Context Capability:** train with unprecedented sequence lengths; imagine handling a 1M-token sequence on standard hardware!
- For example, you could train Gemma-2B with a 1M sequence length and a 2K segment size on a single H100 GPU: the 1M tokens are processed as 500 segments of 2K tokens each, so attention memory stays bounded by the segment size.
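
For intuition on the fixed-memory claim, here is my paraphrase of the compressive-memory recurrence from the paper (the linear-update variant; $\sigma$ is ELU + 1, and $M_s$ is a per-head $d_{\text{key}} \times d_{\text{value}}$ matrix, so its size does not grow with the number of segments):

$$
A_{\text{mem}} = \frac{\sigma(Q)\,M_{s-1}}{\sigma(Q)\,z_{s-1}}, \qquad
M_s = M_{s-1} + \sigma(K)^{\top} V, \qquad
z_s = z_{s-1} + \sum_{t}\sigma(K_t)
$$

The final output mixes this memory read-out with standard local attention within each segment through a learned gate, $A = \operatorname{sigmoid}(\beta)\odot A_{\text{mem}} + (1-\operatorname{sigmoid}(\beta))\odot A_{\text{dot}}$.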

## **Try InfiniTransformer**

1. **Clone the repository:**
   ```bash
   git clone https://github.com/Beomi/InfiniTransformer
   ```
2. **Install necessary tools:**
   ```bash
   pip install -r requirements.txt
   pip install -e git+https://github.com/huggingface/transformers.git@b109257f4f#egg=transformers
   ```
3. **Dive Deep into Custom Training:**
   - Train with extensive sequence lengths using scripts such as `./train.gemma.infini.noclm.1Mseq.sh` (example below).
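
For example, step 3 boils down to running that script (I'm assuming the hyperparameters, such as the 2K segment size, are set inside the script itself):

```bash
# Launch the 1M-sequence Gemma training run shipped with the repo
./train.gemma.infini.noclm.1Mseq.sh
```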

For more detailed info, please visit the repo: https://github.com/Beomi/InfiniTransformer

Looking forward to your feedback! 😊

P.S. The training loss plot is here 😉