---
license: apache-2.0
---

# Model Card for decruz07/kellemar-DPO-7B-e

<!-- Provide a quick summary of what the model is/does. -->
Trained with a learning rate of 5e-5 for 300 steps.

## Model Details

Created with beta = 0.05.

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** @decruz
- **Funded by [optional]:** my full-time job
- **Finetuned from model [optional]:** teknium/OpenHermes-2.5-Mistral-7B

## Uses

You can use this model for basic inference, or as a base for further fine-tuning.

## How to Get Started with the Model

You can create a Hugging Face Space around this model, or load it directly in Python and run inference.

[More Information Needed]
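
As a rough sketch (assuming the weights are published under the repo id in the card title and that the tokenizer keeps the ChatML-style chat template of OpenHermes-2.5), inference with `transformers` might look like:

```python
# Illustrative example only; the repo id is taken from the card title and the
# generation settings are placeholders, not the author's recommended values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "decruz07/kellemar-DPO-7B-e"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```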

## Training Details

The following was used:

```python
from transformers import TrainingArguments
from trl import DPOTrainer

# model, ref_model, dataset, tokenizer, peft_config and new_model are defined
# earlier in the training script.
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
```

### Training Data

This model was trained on https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs
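
A minimal sketch of pulling the same preference data with the `datasets` library (the split name follows the dataset as published by Argilla; any mapping into DPO's prompt/chosen/rejected format is omitted and left as an assumption):

```python
# Illustrative only: load the DPO preference pairs used for training and inspect them.
from datasets import load_dataset

dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
print(dataset)      # number of rows and available columns
print(dataset[0])   # one preference pair (question plus candidate answers)
```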

### Training Procedure

Trained with Maxime Labonne's Google Colab notebook on fine-tuning Mistral 7B with DPO.

## Model Card Authors [optional]

@decruz

## Model Card Contact

@decruz on X/Twitter