minGRU
This is the first Hugging Face integration of minGRU models from the paper "Were RNNs All We Needed?".
The model uses the GPT-2 tokenizer and was trained on the roneneldan/TinyStories dataset.
Note: This is an experimental model. Don't forget to train the model before using it!
Make sure you have installed the "minGRU-pytorch" library by running "pip install minGRU-pytorch".
For the modeling and configuration code, see: minGRU-hf
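As a rough sketch, loading the tokenizer and model could look like the snippet below. The repo id is a placeholder, and it assumes the repository's custom minGRU modeling code is registered for the transformers Auto classes via trust_remote_code:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder repo id -- replace with the actual model repository.
repo_id = "your-username/minGRU-tinystories"

# The tokenizer is the standard GPT-2 tokenizer saved alongside the model.
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# trust_remote_code=True lets transformers load the custom minGRU modeling
# and configuration code shipped with the repository (see minGRU-hf above).
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)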
Training code:
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import get_scheduler

def train_model(model, tokenizer, train_data, output_dir, epochs=3, batch_size=16, learning_rate=5e-5, block_size=128):
    # TinyStoriesDataset is assumed to be defined elsewhere: it tokenizes the
    # raw stories and yields fixed-length blocks of input ids.
    train_dataset = TinyStoriesDataset(train_data, tokenizer, block_size)
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
    scheduler = get_scheduler("linear", optimizer=optimizer, num_warmup_steps=0, num_training_steps=len(train_loader) * epochs)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.train()

    for epoch in range(epochs):
        print(f"Epoch {epoch + 1}/{epochs}")
        epoch_loss = 0
        progress_bar = tqdm(train_loader, desc="Training")
        for batch in progress_bar:
            batch = batch.to(device)
            # Causal language modeling: the inputs also serve as the labels.
            outputs = model(batch, labels=batch)
            loss = outputs.loss

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()

            epoch_loss += loss.item()
            progress_bar.set_postfix(loss=loss.item())
        print(f"Epoch {epoch + 1} Loss: {epoch_loss / len(train_loader)}")

    model.save_pretrained(output_dir, safe_serialization=False)
    tokenizer.save_pretrained(output_dir)
You can use this code snippet for fine-tuning!
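For example, a minimal fine-tuning driver might look like the following. It assumes the model and tokenizer are loaded as above and that TinyStoriesDataset is defined; the output directory name is a placeholder:

from datasets import load_dataset

# Load the raw TinyStories text; the exact format expected in train_data
# depends on how TinyStoriesDataset is implemented.
dataset = load_dataset("roneneldan/TinyStories", split="train")
train_data = dataset["text"]

train_model(
    model,
    tokenizer,
    train_data,
    output_dir="minGRU-tinystories-finetuned",  # placeholder output directory
    epochs=3,
    batch_size=16,
)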
Paper: https://arxiv.org/abs/2410.01201
I am thankful to Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio, and Hossein Hajimirsadeghi for their paper.