---
license: mit
library_name: "trl"
tags:
- KTO
- WeniGPT
base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
model-index:
- name: Weni/WeniGPT-QA-Zephyr-7B-5.0.1-KTO
  results: []
language: ['pt']
---

# Weni/WeniGPT-QA-Zephyr-7B-5.0.1-KTO

This model is a fine-tuned version of [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) on the dataset Weni/WeniGPT-QA-Binarized-1.2.0 with the KTO trainer. It is part of the WeniGPT project for [Weni](https://weni.ai/).
Description: WeniGPT experiment using the KTO trainer with no collator, the Mixtral model, and no system prompt.

It achieves the following results on the evaluation set:
- eval_loss: 0.014605735428631306
- eval_runtime: 1025.937
- eval_samples_per_second: 0.476
- eval_steps_per_second: 0.119
- eval/rewards/chosen: 6.546164512634277
- eval/rewards/rejected: -30.777591705322266
- eval/kl: 0.25049710273742676
- eval/logps/chosen: -129.4441375732422
- eval/logps/rejected: -508.0271301269531
- eval/rewards/margins: 37.32375621795654
- epoch: 1.99

## Intended uses & limitations

This model has not been trained to avoid or refuse specific instructions.
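
As a rough usage sketch (an assumption, not an official example from this card): if this repository hosts a LoRA adapter on top of the 4-bit quantized base model, it can be loaded with `transformers` and `peft` roughly as follows; the generation settings are illustrative only.

```python
# Hedged usage sketch: assumes this repo contains a PEFT LoRA adapter for the
# Mixtral base model listed above. Adjust if the weights are merged instead.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
adapter_id = "Weni/WeniGPT-QA-Zephyr-7B-5.0.1-KTO"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

# Prompt format follows the template shown in "Training procedure" below.
prompt = (
    "<|user|>\n"
    "Contexto: A Weni é uma plataforma de comunicação.\n\n"
    "Questão: O que é a Weni?</s>\n"
    "<|assistant|>\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```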

## Training procedure

Fine-tuning was performed on the model mistralai/Mixtral-8x7B-Instruct-v0.1 with the following prompt format:

```
---------------------
Question:
<|user|>
Contexto: {context}

Questão: {question}</s>


---------------------
Response:
<|assistant|>
{response}</s>


---------------------

```
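
For illustration only, a small helper that fills this template in Python; the helper name and the exact whitespace handling are assumptions, not taken from the training code:

```python
# Hypothetical helper that reproduces the QA prompt template above.
# The "---------------------" separators in the card are treated as layout,
# not as part of the prompt actually fed to the model (an assumption).
PROMPT_TEMPLATE = (
    "<|user|>\n"
    "Contexto: {context}\n\n"
    "Questão: {question}</s>\n"
    "<|assistant|>\n"
)

def build_prompt(context: str, question: str) -> str:
    """Return the prompt; the model is expected to complete it with the answer."""
    return PROMPT_TEMPLATE.format(context=context, question=question)

print(build_prompt("A Weni é uma plataforma de comunicação.", "O que é a Weni?"))
```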

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- gradient_accumulation_steps: 8
- num_gpus: 1
- total_train_batch_size: 32
- optimizer: AdamW
- lr_scheduler_type: cosine
- num_steps: 262
- quantization_type: bitsandbytes
- LoRA:
  - bits: 4
  - use_exllama: True
  - device_map: auto
  - use_cache: False
  - lora_r: 16
  - lora_alpha: 32
  - lora_dropout: 0.05
  - bias: none
  - target_modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj']
  - task_type: CAUSAL_LM
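
Below is a rough reconstruction of the training setup implied by these hyperparameters, assuming the `KTOConfig`/`KTOTrainer` API of trl 0.8.x; the output directory, dataset split and column names, and any parameter not listed above are assumptions.

```python
# Sketch of the KTO training setup under the assumptions stated above.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import KTOConfig, KTOTrainer

base_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
    device_map="auto",
    use_cache=False,
)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

args = KTOConfig(
    output_dir="wenigpt-qa-kto",  # assumption, not from the card
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",
    max_steps=262,
)

# KTO expects "prompt"/"completion"/"label" columns; split names are assumptions.
dataset = load_dataset("Weni/WeniGPT-QA-Binarized-1.2.0")
trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset.get("test"),
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```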

### Training results

### Framework versions

- transformers==4.39.1
- datasets==2.18.0
- peft==0.10.0
- safetensors==0.4.2
- evaluate==0.4.1
- bitsandbytes==0.43
- huggingface_hub==0.20.3
- seqeval==1.2.2
- optimum==1.17.1
- auto-gptq==0.7.1
- gpustat==1.1.1
- deepspeed==0.14.0
- wandb==0.16.3
- trl @ git+https://github.com/claralp/trl.git@fix_nans#egg=trl (used in place of trl==0.8.1)
- accelerate==0.28.0
- coloredlogs==15.0.1
- traitlets==5.14.1
- autoawq@https://github.com/casper-hansen/AutoAWQ/releases/download/v0.2.0/autoawq-0.2.0+cu118-cp310-cp310-linux_x86_64.whl

### Hardware
- Cloud provider: runpod.io