aashish1904 committed on
Commit 6308045
1 Parent(s): 40108a5

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +85 -0
README.md ADDED
@@ -0,0 +1,85 @@
---
license: gemma
base_model: HuggingFaceH4/zephyr-7b-gemma-sft-v0.1
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- argilla/dpo-mix-7k
model-index:
- name: DiscoPOP-zephyr-7b-gemma
  results: []
---

![](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)

# QuantFactory/DiscoPOP-zephyr-7b-gemma-GGUF

This is a quantized version of [SakanaAI/DiscoPOP-zephyr-7b-gemma](https://huggingface.co/SakanaAI/DiscoPOP-zephyr-7b-gemma) created using llama.cpp.

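If you want to try a quantized file locally, here is a minimal, hedged sketch using the `llama-cpp-python` bindings (an assumption on our part; the original card does not prescribe a runtime). The `.gguf` filename below is a placeholder, so substitute whichever quantization you actually download from this repo.

```python
# Hedged sketch: run a GGUF quantization of DiscoPOP-zephyr-7b-gemma with llama-cpp-python.
# The filename is hypothetical; use the actual .gguf file you downloaded from this repo.
from llama_cpp import Llama

llm = Llama(
    model_path="DiscoPOP-zephyr-7b-gemma.Q4_K_M.gguf",  # placeholder filename
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize preference optimization in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```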

# Original Model Card

# DiscoPOP-zephyr-7b-gemma

This model is a fine-tuned version of [HuggingFaceH4/zephyr-7b-gemma-sft-v0.1](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-sft-v0.1) on the argilla/dpo-mix-7k dataset.

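For reference, a minimal loading sketch for this full-precision checkpoint (our addition, assuming the standard `transformers` chat-template workflow; the generation settings are illustrative):

```python
# Hedged sketch: load SakanaAI/DiscoPOP-zephyr-7b-gemma with transformers and chat with it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SakanaAI/DiscoPOP-zephyr-7b-gemma"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What does preference optimization change about a model?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```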

This model is from the paper ["Discovering Preference Optimization Algorithms with and for Large Language Models"](https://arxiv.org/abs/2406.08414).

Read the [blog post on it here!](https://sakana.ai/llm-squared)

See the codebase used to generate it here: [https://github.com/SakanaAI/DiscoPOP](https://github.com/SakanaAI/DiscoPOP)

## Model description

This model is identical in training to [HuggingFaceH4/zephyr-7b-gemma-v0.1](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-v0.1), except that instead of using Direct Preference Optimization (DPO), it uses DiscoPOP.

DiscoPOP is our Discovered Preference Optimization algorithm, which is defined as follows:

```python
def log_ratio_modulated_loss(
    self,
    policy_chosen_logps: torch.FloatTensor,
    policy_rejected_logps: torch.FloatTensor,
    reference_chosen_logps: torch.FloatTensor,
    reference_rejected_logps: torch.FloatTensor,
) -> torch.FloatTensor:
    # Method of the trainer class: `torch`, `torch.nn.functional as F`, and
    # `self.beta` come from the surrounding trainer code.
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = reference_chosen_logps - reference_rejected_logps
    logits = pi_logratios - ref_logratios
    # Modulate the mixing coefficient based on the log ratio magnitudes
    log_ratio_modulation = torch.sigmoid(logits)
    logistic_component = -F.logsigmoid(self.beta * logits)
    exp_component = torch.exp(-self.beta * logits)
    # Blend between logistic and exponential component based on log ratio modulation
    losses = (
        logistic_component * (1 - log_ratio_modulation)
        + exp_component * log_ratio_modulation
    )
    return losses
```

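To make the behaviour easier to see, here is the same loss rewritten as a standalone function with a toy evaluation (our addition, not from the DiscoPOP codebase; the name `discopop_loss`, the `beta=0.05` default, and the inputs are illustrative):

```python
# Standalone sketch of the loss above: the sigmoid gate blends a DPO-style
# logistic loss with an exponential loss, depending on the log-ratio difference.
import torch
import torch.nn.functional as F

def discopop_loss(policy_chosen_logps, policy_rejected_logps,
                  reference_chosen_logps, reference_rejected_logps, beta=0.05):
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = reference_chosen_logps - reference_rejected_logps
    logits = pi_logratios - ref_logratios
    gate = torch.sigmoid(logits)             # ~0 for very negative logits, ~1 for very positive
    logistic = -F.logsigmoid(beta * logits)  # DPO-style component
    exponential = torch.exp(-beta * logits)  # exponential component
    return logistic * (1 - gate) + exponential * gate

# Toy inputs: zero reference log-ratios, so `logits` equals the policy log-ratio margin.
policy_margin = torch.tensor([-4.0, 0.0, 4.0])
zeros = torch.zeros_like(policy_margin)
print(discopop_loss(policy_margin, zeros, zeros, zeros))
```

When the policy's chosen-vs-rejected margin sits well below the reference's, the gate is near zero and the logistic term dominates; when it sits well above, the gate approaches one and the exponential term takes over.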

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128 (per-device batch size × devices × accumulation steps; see the sketch after this list)
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2

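A quick arithmetic check of the batch-size bookkeeping above (our note, not part of the generated card):

```python
# total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
train_batch_size, num_devices, gradient_accumulation_steps = 2, 8, 8
print(train_batch_size * num_devices * gradient_accumulation_steps)  # 128
```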

### Framework versions

- Transformers 4.40.1
- Pytorch 2.1.2+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1