---
library_name: transformers
tags:
- unsloth
license: llama3
datasets:
- mii-community/ultrafeedback-preferences-translated-ita
- efederici/alpaca-vs-alpaca-orpo-dpo
---
# Model Card
This is a Llama-3-8B ORPO fine-tune for the Italian language, trained on the concatenation of two datasets:
- [mii-community/ultrafeedback-preferences-translated-ita](https://huggingface.co/datasets/mii-community/ultrafeedback-preferences-translated-ita)
- [efederici/alpaca-vs-alpaca-orpo-dpo](https://huggingface.co/datasets/efederici/alpaca-vs-alpaca-orpo-dpo)
The differences from `diegobit/llama-3-8b-Instruct-bnb-4bit-ita-orpo` are:
- the starting model is a base model rather than an instruct model: `astronomer/Llama-3-8B-Special-Tokens-Adjusted` instead of `unsloth/llama-3-8b-Instruct-bnb-4bit`
- the model is not loaded in 4-bit
- given the increased GPU memory requirements, the maximum sequence length used for fine-tuning is 4096
## Model Details
### Model Description
- **Developed by:** Diego Giorgini
- **Funded by:** AI Technologies SRL - www.aitechnologies.it
- **Language(s) (NLP):** Italian
- **License:** llama3
- **Finetuned from model:** astronomer/Llama-3-8B-Special-Tokens-Adjusted
## Training Details
### Environment
- unsloth: 2024.5
- torch: 2.2
### Training Data
- `mii-community/ultrafeedback-preferences-translated-ita` is a selection of 55k rows of the UltraFeedback dataset, translated into Italian with Argos Translate.
- `efederici/alpaca-vs-alpaca-orpo-dpo`: the Alpaca vs. Alpaca dataset is a curated blend of the Alpaca dataset and the Alpaca GPT-4 dataset, both available on Hugging Face Datasets. It uses the standard GPT answer as the 'rejected' one, steering the model towards the GPT-4 answer, which is used as the 'chosen' one.
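As a rough illustration (not the actual training script), the concatenation could be done with the `datasets` library as in the sketch below; the `"train"` split name and the column handling are assumptions:
```python
# Hypothetical sketch: load both preference datasets and concatenate them.
from datasets import load_dataset, concatenate_datasets

ultrafeedback_ita = load_dataset(
    "mii-community/ultrafeedback-preferences-translated-ita", split="train"
)
alpaca_orpo = load_dataset(
    "efederici/alpaca-vs-alpaca-orpo-dpo", split="train"
)

# Keep only the columns the two datasets share (e.g. prompt/chosen/rejected),
# so that concatenation does not fail on mismatched schemas (assumption).
shared = [c for c in ultrafeedback_ita.column_names if c in alpaca_orpo.column_names]
ultrafeedback_ita = ultrafeedback_ita.select_columns(shared)
alpaca_orpo = alpaca_orpo.select_columns(shared)

train_dataset = concatenate_datasets([ultrafeedback_ita, alpaca_orpo]).shuffle(seed=3407)
```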
### Training Procedure
#### Preprocessing
- No preprocessing was performed, except for formatting with the llama-3 chat template from unsloth:
```python
from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(tokenizer, chat_template="llama-3")
```
#### Training Hyperparameters
- **Training regime:** bf16
- **Model loading parameters:**
```python
max_seq_length = 4096
dtype = None
load_in_4bit = False
```
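For illustration, these parameters map onto unsloth's `FastLanguageModel.from_pretrained` roughly as in the sketch below; this is an assumption about the training script, not an excerpt from it:
```python
# Hypothetical sketch of the model loading step with unsloth.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="astronomer/Llama-3-8B-Special-Tokens-Adjusted",
    max_seq_length=4096,
    dtype=None,          # auto-detected; bf16 on an A100
    load_in_4bit=False,  # full-precision weights, unlike the 4-bit variant
)
```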
- **PEFT parameters:**
```python
r = 64
lora_alpha = 64
lora_dropout = 0
bias = "none"
random_state = 3407
use_rslora = False
loftq_config = None
```
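Plugged into unsloth's `FastLanguageModel.get_peft_model`, one plausible call looks like the sketch below; `target_modules` is not reported in this card, so the value shown is a common choice for Llama models (an assumption):
```python
# Hypothetical sketch of attaching the LoRA adapters.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    lora_dropout=0,
    bias="none",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
    # Not documented in this card; a usual choice for Llama models:
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```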
- **ORPOConfig parameters:**
```python
max_length = 4096
max_prompt_length = max_seq_length // 2
max_completion_length = max_seq_length // 2
warmup_ratio = 0.1
weight_decay = 0.01
per_device_train_batch_size = 1
gradient_accumulation_steps = 16
learning_rate = 8e-6
beta = 0.1
optim = "paged_adamw_8bit"
lr_scheduler_type = "linear"
num_train_epochs = 1
```
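These values correspond to trl's `ORPOConfig`; a minimal sketch of wiring them into an `ORPOTrainer` follows, where `output_dir` and the dataset variable are placeholders rather than details from the card:
```python
# Hypothetical sketch of the ORPO training run with trl.
from trl import ORPOConfig, ORPOTrainer

max_seq_length = 4096

orpo_config = ORPOConfig(
    output_dir="outputs",  # placeholder
    max_length=max_seq_length,
    max_prompt_length=max_seq_length // 2,
    max_completion_length=max_seq_length // 2,
    warmup_ratio=0.1,
    weight_decay=0.01,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=8e-6,
    beta=0.1,
    optim="paged_adamw_8bit",
    lr_scheduler_type="linear",
    num_train_epochs=1,
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=orpo_config,
    train_dataset=train_dataset,  # the concatenated dataset from above
    tokenizer=tokenizer,
)
trainer.train()
```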
#### Speeds, Sizes, Times
Training took 19 hours on a single A100-40GB GPU.
## Model Card Contact
diego.giorgini@icloud.com