---
license: apache-2.0
tags:
- jamba
datasets:
- teknium/OpenHermes-2.5
base_model: ai21labs/Jamba-v0.1
pipeline_tag: text-generation
---

# Jamba-Open-Hermes

This is highly experimental and should be viewed as purely a test run for now. Jamba has been very hard to train, but I wanted to see how it does on one of the best datasets we have access to. I believe in transparent development, so all *best* working iterations, even if they are a bit wonky, will be pushed here.

*There's been limited testing so far, so no example outputs yet.*

---

## Training

### OpenHermes-2.5 (only the first ~1,500 steps):

**[ 1530/125193 4:46:45 < 386:48:08, 0.09 it/s, Epoch 0.01/1]**

```
Step    Training Loss
1483    5.986700
1484    5.764100
1485    5.887200
1486    5.445200
1487    6.086300
1488    5.718300
1489    5.670300
1490    5.440900
1491    4.945900
1492    6.154700
1493    5.624800
1494    6.868100
1495    5.627100
1496    5.192700
1497    5.826800
1498    5.512200
1499    5.869900
1500    5.852300
1501    5.574800
1502    5.299200
1503    5.631200
1504    5.535600
1505    5.626000
1506    5.093300
1507    5.278000
1508    5.585400
1509    5.318600
1510    5.319200
1511    5.513900
1512    5.375400
1513    5.460600
1514    5.045300
1515    6.013600
1516    5.812300
1517    5.707400
1518    5.109800
1519    5.212900
1520    5.317200
1521    5.935400
1522    5.733900
1523    5.866000
1524    5.675400
1525    5.580800
1526    4.996900
1527    5.666700
1528    4.979900
```

### Hyperparameters

```py
import os

# Set PyTorch CUDA allocator options before torch initializes CUDA,
# otherwise they have no effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128,expandable_segments:True"

import torch
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

# Tokenizer for the base model; pad on the left
tokenizer = AutoTokenizer.from_pretrained("ai21labs/Jamba-v0.1")
tokenizer.padding_side = 'left'

# NOTE: the exact model-loading call was not included in the original script.
# An 8-bit load with the Mamba blocks excluded from quantization is assumed here,
# based on the BitsAndBytesConfig import and the paged_adamw_8bit optimizer below.
model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/Jamba-v0.1",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True, llm_int8_skip_modules=["mamba"]),
)

max_seq_length = 4096

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["embed_tokens", "x_proj", "in_proj", "out_proj"],
    lora_dropout=0.2,
    task_type="CAUSAL_LM",
    bias="none"
)

# train_dataset: a datasets.Dataset with a single "text" column
# (see the preprocessing sketch below the code block)
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    peft_config=lora_config,  # apply the LoRA adapters defined above
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=TrainingArguments(
        num_train_epochs=1,
        lr_scheduler_type='linear',
        learning_rate=2e-5,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        gradient_checkpointing=True,
        warmup_steps=10,
        weight_decay=0.2,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        save_steps=100,
        output_dir="outputs",
        optim="paged_adamw_8bit",
        seed=42,
    ),
)

trainer.train()
```
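
The script above expects a `train_dataset` that already has a single `"text"` column, but the preprocessing step isn't shown. Below is a minimal sketch of one way it could be built from `teknium/OpenHermes-2.5`; the role tags and the flattening into a single string are illustrative assumptions, not the exact template used for this run.

```py
# Hypothetical preprocessing sketch -- not the exact pipeline used for this run.
# OpenHermes-2.5 stores each example as a "conversations" list of
# {"from": ..., "value": ...} turns; this flattens each conversation into one
# "text" string for SFTTrainer. The role tags below are assumptions.
from datasets import load_dataset

ROLE_TAGS = {"system": "<|system|>", "human": "<|user|>", "gpt": "<|assistant|>"}

def to_text(example):
    turns = [
        f"{ROLE_TAGS.get(turn['from'], turn['from'])}\n{turn['value']}"
        for turn in example["conversations"]
    ]
    return {"text": "\n".join(turns)}

dataset = load_dataset("teknium/OpenHermes-2.5", split="train")
train_dataset = dataset.map(to_text, remove_columns=dataset.column_names)
```

With a `"text"` column prepared along these lines, the `dataset_text_field="text"` argument in the trainer call above matches the column `SFTTrainer` expects.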