File size: 2,043 Bytes
b9626f8 88d4132 6cdfe69 7fbbf54 b9626f8 7fbbf54 88d4132 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 |
---
license: apache-2.0
tags:
- jamba
datasets:
- teknium/OpenHermes-2.5
pipeline_tag: text-generation
---
# This is highly experimental and should be viewed as purely testing right now. Jamba has been very hard to train but I wanted to see how it did on one of the best datasets we have access to. I believe in transparent development so all *best* working iterations, even if they are a bit wonky, will be pushed here
---
## Training
### Open-Hermes-2.0 (Only first 1500 examples): **[ 1530/125193 4:46:45 < 386:48:08, 0.09 it/s, Epoch 0.01/1]**
```py
from trl import SFTTrainer
import torch
from peft import LoraConfig
from transformers import AutoTokenizer, TrainingArguments
from transformers import BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
# Initialize or load your tokenizer and model here
tokenizer = AutoTokenizer.from_pretrained("ai21labs/Jamba-v0.1")
tokenizer.padding_side = 'right'
tokenizer.padding_side = 'left'
max_seq_length = 4096
lora_config = LoraConfig(
r=8,
lora_alpha=16,
target_modules=["embed_tokens", "x_proj", "in_proj", "out_proj"],
lora_dropout=0.2,
task_type="CAUSAL_LM",
bias="none"
)
trainer = SFTTrainer(
model=model,
train_dataset=train_dataset,
dataset_text_field="text",
max_seq_length=max_seq_length,
tokenizer=tokenizer,
args=TrainingArguments(
num_train_epochs=1,
lr_scheduler_type='linear',
learning_rate=2e-5,
per_device_train_batch_size=1,
gradient_accumulation_steps=8,
gradient_checkpointing=True,
warmup_steps=10,
weight_decay=0.2,
fp16=not torch.cuda.is_bf16_supported(),
bf16=torch.cuda.is_bf16_supported(),
logging_steps=1,
save_steps=100,
output_dir="outputs",
optim="paged_adamw_8bit",
seed=42,
),
)
# Set environment variables for PyTorch memory management
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128,expandable_segments:True"
```
|