---
license: apache-2.0
tags:
- jamba
datasets:
- teknium/OpenHermes-2.5
base_model: ai21labs/Jamba-v0.1
pipeline_tag: text-generation
---

# Jamba-Open-Hermes

<img src="https://cdn-uploads.huggingface.co/production/uploads/64740cf7485a7c8e1bd51ac9/Ph6ZvxwF7a0m_B5Su_EK7.webp" width="500" height="500">

# This is highly experimental and should be viewed as purely a test run for now. Jamba has been very hard to train, but I wanted to see how it did on one of the best datasets we have access to. I believe in transparent development, so all *best*-working iterations, even the slightly wonky ones, will be pushed here.

*There's been limited testing, so no example outputs yet.*

---
## Training


### OpenHermes-2.5 (only the first ~1,500 steps): **[ 1530/125193 4:46:45 < 386:48:08, 0.09 it/s, Epoch 0.01/1]**

```
1483	5.986700
1484	5.764100
1485	5.887200
1486	5.445200
1487	6.086300
1488	5.718300
1489	5.670300
1490	5.440900
1491	4.945900
1492	6.154700
1493	5.624800
1494	6.868100
1495	5.627100
1496	5.192700
1497	5.826800
1498	5.512200
1499	5.869900
1500	5.852300
1501	5.574800
1502	5.299200
1503	5.631200
1504	5.535600
1505	5.626000
1506	5.093300
1507	5.278000
1508	5.585400
1509	5.318600
1510	5.319200
1511	5.513900
1512	5.375400
1513	5.460600
1514	5.045300
1515	6.013600
1516	5.812300
1517	5.707400
1518	5.109800
1519	5.212900
1520	5.317200
1521	5.935400
1522	5.733900
1523	5.866000
1524	5.675400
1525	5.580800
1526	4.996900
1527	5.666700
1528	4.979900
```
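The progress-bar numbers above are internally consistent with the hyperparameters below. With `per_device_train_batch_size=1` and `gradient_accumulation_steps=8`, each optimizer step consumes 8 examples (assuming a single GPU, which the card does not state explicitly), so the 125,193 total steps correspond to roughly 1.0M training examples, and stopping at step 1530 means only about 12K examples were seen. A quick sanity check:

```python
# Sanity-check the progress bar "1530/125193 ... Epoch 0.01/1" against
# the training hyperparameters listed below.
per_device_batch = 1
grad_accum = 8
total_steps = 125_193   # planned steps for one epoch, from the progress bar
stopped_at = 1_530      # step at which this run was cut short

effective_batch = per_device_batch * grad_accum   # examples per optimizer step
dataset_size = total_steps * effective_batch      # implied number of training rows
examples_seen = stopped_at * effective_batch      # rows consumed before stopping

print(effective_batch)  # 8
print(dataset_size)     # 1001544 -- roughly the ~1M rows of OpenHermes-2.5
print(examples_seen)    # 12240
```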

### Hyperparameters

```py
import os
import torch
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

# Set PyTorch memory management before any CUDA allocations are made
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128,expandable_segments:True"

tokenizer = AutoTokenizer.from_pretrained("ai21labs/Jamba-v0.1")
tokenizer.padding_side = 'right'

# Base model loaded in 4-bit to fit in memory (exact quantization settings
# are illustrative; paged_adamw_8bit below assumes a bitsandbytes setup)
model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/Jamba-v0.1",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)

# train_dataset must be a datasets.Dataset exposing a "text" column
# (e.g. OpenHermes-2.5 conversations flattened to single strings)

max_seq_length = 4096

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["embed_tokens", "x_proj", "in_proj", "out_proj"],
    lora_dropout=0.2,
    task_type="CAUSAL_LM",
    bias="none",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    peft_config=lora_config,  # without this, the LoRA config is never applied
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=TrainingArguments(
        num_train_epochs=1,
        lr_scheduler_type='linear',
        learning_rate=2e-5,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        gradient_checkpointing=True,
        warmup_steps=10,
        weight_decay=0.2,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        save_steps=100,
        output_dir="outputs",
        optim="paged_adamw_8bit",
        seed=42,
    ),
)
```
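The trainer reads a `text` column from the dataset, but the card does not say how the OpenHermes conversations were flattened into it. A minimal sketch, assuming ChatML-style formatting (the convention Open-Hermes models commonly use; the `to_chatml` helper and the exact format are assumptions, not confirmed by this card):

```python
def to_chatml(conversations):
    """Flatten an OpenHermes-style conversation (list of {"from", "value"}
    turns, as in teknium/OpenHermes-2.5) into one ChatML training string."""
    role_map = {"system": "system", "human": "user", "gpt": "assistant"}
    parts = []
    for turn in conversations:
        role = role_map.get(turn["from"], turn["from"])
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>")
    return "\n".join(parts)

example = [
    {"from": "human", "value": "What is Jamba?"},
    {"from": "gpt", "value": "Jamba is a hybrid SSM-Transformer model."},
]
print(to_chatml(example))
```

With `datasets`, this could be mapped over the raw rows (e.g. `ds.map(lambda r: {"text": to_chatml(r["conversations"])})`) to produce the `text` column the trainer expects.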