Severian committed
Commit 88d4132
1 Parent(s): b60c78a

Update README.md

Files changed (1)
  1. README.md +63 -2
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- license: mit
+ license: apache-2.0
  tags:
  - jamba
  datasets:
@@ -7,4 +7,65 @@ datasets:
  pipeline_tag: text-generation
  ---

- # PLACEHOLDER - Currently training. This is highly experimental and should be viewed as purely testing right now. Jamba has been very hard to train but I wanted to see how it did on one of the best datasets we have access to. I believe in transparent development so all *best* working iterations, even if they are a bit wonky, will be pushed here
+ # This is highly experimental and should be viewed as pure testing right now. Jamba has been very hard to train, but I wanted to see how it does on one of the best datasets we have access to. I believe in transparent development, so all *best* working iterations, even if they are a bit wonky, will be pushed here.
+
+ ---
+ ## Training
+
+ ### Open-Hermes-2.0 (only the first 1500 examples): **[ 1530/125193 4:46:45 < 386:48:08, 0.09 it/s, Epoch 0.01/1]**
+
+ ```py
+ import os
+
+ # Set PyTorch CUDA allocator options for memory management, before torch initializes CUDA
+ os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128,expandable_segments:True"
+
+ import torch
+ from peft import LoraConfig
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
+ from trl import SFTTrainer
+
+ # Load the tokenizer; `model` and `train_dataset` are assumed to be prepared beforehand
+ tokenizer = AutoTokenizer.from_pretrained("ai21labs/Jamba-v0.1")
+ tokenizer.padding_side = 'right'  # right-padding is the usual choice for causal-LM fine-tuning
+
+ max_seq_length = 4096
+
+ # LoRA on the Mamba projection layers and the embeddings
+ lora_config = LoraConfig(
+     r=8,
+     lora_alpha=16,
+     target_modules=["embed_tokens", "x_proj", "in_proj", "out_proj"],
+     lora_dropout=0.2,
+     task_type="CAUSAL_LM",
+     bias="none"
+ )
+
+ trainer = SFTTrainer(
+     model=model,
+     train_dataset=train_dataset,
+     peft_config=lora_config,
+     dataset_text_field="text",
+     max_seq_length=max_seq_length,
+     tokenizer=tokenizer,
+     args=TrainingArguments(
+         num_train_epochs=1,
+         lr_scheduler_type='linear',
+         learning_rate=2e-5,
+         per_device_train_batch_size=1,
+         gradient_accumulation_steps=8,
+         gradient_checkpointing=True,
+         warmup_steps=10,
+         weight_decay=0.2,
+         fp16=not torch.cuda.is_bf16_supported(),
+         bf16=torch.cuda.is_bf16_supported(),
+         logging_steps=1,
+         save_steps=100,
+         output_dir="outputs",
+         optim="paged_adamw_8bit",
+         seed=42,
+     ),
+ )
+
+ # Launch fine-tuning
+ trainer.train()
+ ```
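
The committed script assumes that `model` and `train_dataset` already exist in scope. The sketch below shows one plausible way to prepare them: an instruction-style dataset flattened into the single `text` column that `SFTTrainer` reads, and a 4-bit quantized load of the base checkpoint. The dataset ID, its column names, the 1500-example slice, and the quantization settings are illustrative assumptions, not a record of what this run actually used.

```py
# Hypothetical setup for the `model` and `train_dataset` objects the training script expects.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed dataset: an OpenHermes-style instruction dataset. The ID and column names are
# placeholders; substitute the dataset actually listed in the model card metadata.
raw = load_dataset("teknium/openhermes", split="train[:1500]")

def to_text(example):
    # Flatten instruction/response pairs into the single "text" field SFTTrainer reads.
    return {"text": f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['output']}"}

train_dataset = raw.map(to_text)

# Assumed 4-bit quantized load of the base checkpoint; this mirrors the BitsAndBytesConfig
# import in the training script but is not confirmed by the commit itself.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/Jamba-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```

With those two objects defined, the script above runs as written and saves LoRA checkpoints to `outputs/` every 100 steps.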