Mel-Iza0 committed on
Commit 1d1c49b
1 Parent(s): 192315b

Upload folder using huggingface_hub

Files changed (1):
  1. README.md +52 -40

README.md CHANGED
@@ -1,69 +1,81 @@
  ---
- library_name: peft
  tags:
- - trl
- - dpo
- - generated_from_trainer
  base_model: Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT
  model-index:
- - name: WeniGPT-2.6.3-Zephyr-7B-zephyr-prompt-LLM_Base_2.0.3_DPO_reduction_variation
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # WeniGPT-2.6.3-Zephyr-7B-zephyr-prompt-LLM_Base_2.0.3_DPO_reduction_variation

- This model is a fine-tuned version of [Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT](https://huggingface.co/Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.6931
- - Rewards/chosen: 0.0
- - Rewards/rejected: 0.0
- - Rewards/accuracies: 0.0
- - Rewards/margins: 0.0
- - Logps/rejected: -206.1858
- - Logps/chosen: -64.0427
- - Logits/rejected: -2.0290
- - Logits/chosen: -1.6491

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
  - learning_rate: 0.0002
- - train_batch_size: 8
- - eval_batch_size: 2
- - seed: 42
  - gradient_accumulation_steps: 2
  - total_train_batch_size: 16
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_ratio: 0.1
- - training_steps: 1
- - mixed_precision_training: Native AMP

  ### Training results

-
-
  ### Framework versions

- - PEFT 0.8.2
- - Transformers 4.39.0.dev0
- - Pytorch 2.1.0+cu118
- - Datasets 2.17.1
- - Tokenizers 0.15.1

  ---
+ license: mit
+ library_name: "trl"
  tags:
+ - DPO
  base_model: Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT
  model-index:
+ - name: Weni/WeniGPT-2.6.3-Zephyr-7B-zephyr-prompt-LLM_Base_2.0.3_DPO_reduction_variation
  results: []
+ language: ['pt']
  ---

+ # Weni/WeniGPT-2.6.3-Zephyr-7B-zephyr-prompt-LLM_Base_2.0.3_DPO_reduction_variation

+ This model is a fine-tuned version of [Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT](https://huggingface.co/Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT), trained on the dataset Weni/LLM_Base_2.0.3_DPO with the DPO trainer. It is part of the DPO project for [Weni](https://weni.ai/).

  It achieves the following results on the evaluation set:
+ - Loss: 0.6931
+ - Runtime: 175.1355 s
+ - Samples per second: 2.804
+ - Steps per second: 1.405
+ - Rewards/chosen: 0.0
+ - Rewards/rejected: 0.0
+ - Rewards/accuracies: 0.0
+ - Rewards/margins: 0.0
+ - Logps/rejected: -206.1858
+ - Logps/chosen: -64.0427
+ - Logits/rejected: -2.0290
+ - Logits/chosen: -1.6491
+ - Epoch: 0.0
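A quick sanity check on the evaluation numbers above: the DPO loss is -log σ(β·Δ), where Δ is the reward margin between the chosen and rejected completions. Since rewards/margins is 0.0 (after a single training step the policy still matches the reference), the loss collapses to ln 2 ≈ 0.6931, exactly as reported. A minimal sketch of that calculation:

```python
import math

# DPO loss for one preference pair: -log(sigmoid(beta * reward_margin)).
def dpo_loss(reward_margin: float) -> float:
    return -math.log(1.0 / (1.0 + math.exp(-reward_margin)))

# With rewards/margins = 0.0, sigmoid(0) = 0.5, so the loss is ln(2).
loss = dpo_loss(0.0)
print(round(loss, 4))  # 0.6931
```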
 
 
 
 
 
 
 
 

+ ## Intended uses & limitations

+ This model has not been trained to avoid specific instructions.

+ ## Training procedure

+ Finetuning was done on the model Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT with the following prompt:

+ ```
+ Question:
+ <|user|>{question}</s>
+
+ Chosen:
+ <|assistant|>{correct_ans}</s>
+
+ Rejected:
+ <|assistant|>{rejected_ans}</s>
+ ```
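For reference, a preference-dataset row can be rendered into the template above with a small helper. This is an illustrative sketch, not part of the actual training code; the function and field names are assumptions:

```python
# Illustrative helper (hypothetical, not from the training repo): renders one
# preference example into the DPO prompt template shown above.
def format_dpo_example(question: str, correct_ans: str, rejected_ans: str) -> str:
    return (
        f"Question:\n<|user|>{question}</s>\n\n"
        f"Chosen:\n<|assistant|>{correct_ans}</s>\n\n"
        f"Rejected:\n<|assistant|>{rejected_ans}</s>"
    )

# Portuguese example, matching the card's declared language.
example = format_dpo_example(
    "Qual é a capital do Brasil?", "Brasília.", "Rio de Janeiro."
)
print(example)
```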

  ### Training hyperparameters

  The following hyperparameters were used during training:
  - learning_rate: 0.0002
+ - per_device_train_batch_size: 8
+ - per_device_eval_batch_size: 2
  - gradient_accumulation_steps: 2
+ - num_gpus: 1
  - total_train_batch_size: 16
+ - optimizer: AdamW
+ - lr_scheduler_type: cosine
+ - num_steps: 1
+ - quantization_type: bitsandbytes
+ - LoRA:
+   - bits: 4
+   - use_exllama: True
+   - device_map: auto
+   - use_cache: False
+   - lora_r: 8
+   - lora_alpha: 16
+   - lora_dropout: 0.1
+   - bias: none
+   - target_modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj']
+   - task_type: CAUSAL_LM

  ### Training results

  ### Framework versions

+ - git+https://github.com/huggingface/transformers@main
+ - datasets==2.17.1
+ - peft==0.8.2
+ - safetensors==0.4.2
+ - evaluate==0.4.1
+ - bitsandbytes==0.42
+ - huggingface_hub==0.20.3
+ - seqeval==1.2.2
+ - optimum==1.17.1
+ - auto-gptq==0.7.0
+ - gpustat==1.1.1
+ - deepspeed==0.13.2
+ - wandb==0.16.3
+ - git+https://github.com/huggingface/trl.git@main
+ - git+https://github.com/huggingface/accelerate.git@main
+ - coloredlogs==15.0.1
+ - traitlets==5.14.1
+ - autoawq@https://github.com/casper-hansen/AutoAWQ/releases/download/v0.2.0/autoawq-0.2.0+cu118-cp310-cp310-linux_x86_64.whl

+ ### Hardware
+ - Cloud provider: runpod.io