---
license: mit
library_name: "trl"
tags:
- KTO
- WeniGPT
base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
model-index:
- name: Weni/WeniGPT-QA-Zephyr-7B-5.0.1-KTO
results: []
language: ['pt']
---
# Weni/WeniGPT-QA-Zephyr-7B-5.0.1-KTO
This model is a fine-tuned version of [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) on the Weni/WeniGPT-QA-Binarized-1.2.0 dataset with the KTO trainer. It is part of the WeniGPT project for [Weni](https://weni.ai/).
Description: WeniGPT experiment using the KTO trainer with no collator, the Mixtral model, and no system prompt.
It achieves the following results on the evaluation set:
- eval_loss: 0.014605735428631306
- eval_runtime: 1025.937
- eval_samples_per_second: 0.476
- eval_steps_per_second: 0.119
- eval/rewards/chosen: 6.546164512634277
- eval/rewards/rejected: -30.777591705322266
- eval/kl: 0.25049710273742676
- eval/logps/chosen: -129.4441375732422
- eval/logps/rejected: -508.0271301269531
- eval/rewards/margins: 37.32375621795654
- epoch: 1.99
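As a quick sanity check (not part of the original card), the reported reward margin should equal the chosen reward minus the rejected reward:

```python
import math

# Values copied from the evaluation results above.
rewards_chosen = 6.546164512634277
rewards_rejected = -30.777591705322266
reported_margin = 37.32375621795654

# eval/rewards/margins = eval/rewards/chosen - eval/rewards/rejected
margin = rewards_chosen - rewards_rejected
assert math.isclose(margin, reported_margin, rel_tol=1e-9)
```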
## Intended uses & limitations
This model has not been trained to avoid specific instructions.
## Training procedure
Finetuning was done on the model mistralai/Mixtral-8x7B-Instruct-v0.1 with the following prompt:
```
---------------------
Question:
<|user|>
Contexto: {context}
Questão: {question}</s>
---------------------
Response:
<|assistant|>
{response}</s>
---------------------
```
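The template above can be reproduced with plain string formatting. This is an illustrative sketch, not the actual training code; the `build_prompt` helper is an assumption:

```python
# Hypothetical helper reproducing the prompt template above.
PROMPT_TEMPLATE = (
    "<|user|>\n"
    "Contexto: {context}\n"
    "Questão: {question}</s>\n"
    "<|assistant|>\n"
)

def build_prompt(context: str, question: str) -> str:
    """Format one QA example with the template used for finetuning."""
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt("Weni é uma empresa brasileira.", "O que é a Weni?")
assert prompt.startswith("<|user|>")
assert prompt.endswith("<|assistant|>\n")
```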
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- gradient_accumulation_steps: 8
- num_gpus: 1
- total_train_batch_size: 32
- optimizer: AdamW
- lr_scheduler_type: cosine
- num_steps: 262
- quantization_type: bitsandbytes
- LoRA:
  - bits: 4
  - use_exllama: True
  - device_map: auto
  - use_cache: False
  - lora_r: 16
  - lora_alpha: 32
  - lora_dropout: 0.05
  - bias: none
  - target_modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj']
  - task_type: CAUSAL_LM
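The listed total train batch size follows from the per-device settings; a quick arithmetic check (not from the card):

```python
# Hyperparameters copied from the list above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
num_gpus = 1

# total = per-device batch size * accumulation steps * number of GPUs
total_train_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
assert total_train_batch_size == 32  # matches total_train_batch_size above
```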
### Training results
### Framework versions
- transformers==4.39.1
- datasets==2.18.0
- peft==0.10.0
- safetensors==0.4.2
- evaluate==0.4.1
- bitsandbytes==0.43
- huggingface_hub==0.20.3
- seqeval==1.2.2
- optimum==1.17.1
- auto-gptq==0.7.1
- gpustat==1.1.1
- deepspeed==0.14.0
- wandb==0.16.3
- trl==0.8.1 (installed from git+https://github.com/claralp/trl.git@fix_nans#egg=trl)
- accelerate==0.28.0
- coloredlogs==15.0.1
- traitlets==5.14.1
- autoawq@https://github.com/casper-hansen/AutoAWQ/releases/download/v0.2.0/autoawq-0.2.0+cu118-cp310-cp310-linux_x86_64.whl
### Hardware
- Cloud provider: runpod.io