---
library_name: transformers
tags:
- unsloth
license: llama3
datasets:
- mii-community/ultrafeedback-preferences-translated-ita
- efederici/alpaca-vs-alpaca-orpo-dpo
---

# Model Card

This is an ORPO fine-tune of Llama-3-8B for the Italian language, trained on the concatenation of two datasets:

- [mii-community/ultrafeedback-preferences-translated-ita](https://huggingface.co/datasets/mii-community/ultrafeedback-preferences-translated-ita)
- [efederici/alpaca-vs-alpaca-orpo-dpo](https://huggingface.co/datasets/efederici/alpaca-vs-alpaca-orpo-dpo)

The other differences from `diegobit/llama-3-8b-Instruct-bnb-4bit-ita-orpo` are:

- the starting model: the non-instruct `astronomer/Llama-3-8B-Special-Tokens-Adjusted` instead of `unsloth/llama-3-8b-Instruct-bnb-4bit`
- no 4-bit loading
- given the increased GPU memory requirements, the maximum sequence length used for fine-tuning is 4096

## Model Details

### Model Description

- **Developed by:** Diego Giorgini
- **Funded by:** AI Technologies SRL - www.aitechnologies.it
- **Language(s) (NLP):** Italian
- **License:** llama3
- **Finetuned from model:** astronomer/Llama-3-8B-Special-Tokens-Adjusted

## Training Details

### Environment

- unsloth: 2024.5
- torch: 2.2

### Training Data

- `mii-community/ultrafeedback-preferences-translated-ita`: a selection of 55k rows of the UltraFeedback dataset, translated into Italian with Argos Translate.
- `efederici/alpaca-vs-alpaca-orpo-dpo`: a curated blend of the Alpaca dataset and the Alpaca GPT-4 dataset, both available on Hugging Face Datasets. It uses the standard GPT answer as the 'rejected' one, steering the model towards the GPT-4 answer, which is considered the 'chosen' one.
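For reference, a single preference row in such datasets can be sketched as follows (the field names follow the usual TRL `prompt`/`chosen`/`rejected` layout; the Italian strings are illustrative assumptions, not actual dataset rows):

```python
# Hypothetical example row, assuming the standard TRL-style
# prompt/chosen/rejected layout used for ORPO preference training.
record = {
    "prompt": "Qual è la capitale d'Italia?",
    "chosen": "La capitale d'Italia è Roma.",  # preferred (e.g. GPT-4-style) answer
    "rejected": "Non lo so.",                  # dispreferred answer
}

# ORPO consumes both completions for the same prompt.
assert set(record) == {"prompt", "chosen", "rejected"}
```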

### Training Procedure

#### Preprocessing

No preprocessing was performed, except for formatting with the llama-3 chat template from unsloth:

```python
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(tokenizer, chat_template = "llama-3")
```
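For reference, a minimal sketch of the prompt layout the "llama-3" chat template produces (special tokens as in the Llama 3 format; this is an illustrative re-implementation, not unsloth's actual code):

```python
# Illustrative re-implementation of the Llama-3 chat layout:
# each turn is wrapped in header tokens and terminated by <|eot_id|>.
def format_llama3(messages):
    text = "<|begin_of_text|>"
    for m in messages:
        text += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    return text
```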

#### Training Hyperparameters

- **Training regime:** bf16

- **Model loading parameters:**

```python
max_seq_length = 4096
dtype = None
load_in_4bit = False
```
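These parameters correspond to unsloth's model-loading call; a sketch of how they are typically passed (not runnable without a GPU and the model weights):

```python
from unsloth import FastLanguageModel

# Sketch: load the full-precision base model with a 4096-token context.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "astronomer/Llama-3-8B-Special-Tokens-Adjusted",
    max_seq_length = 4096,
    dtype = None,          # None = auto-detect (bf16 on Ampere and newer)
    load_in_4bit = False,  # full-precision weights, unlike the 4-bit sibling model
)
```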

- **PEFT parameters:**

```python
r = 64
lora_alpha = 64
lora_dropout = 0
bias = "none"
random_state = 3407
use_rslora = False
loftq_config = None
```
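These plug into unsloth's `get_peft_model`; a sketch, where `target_modules` is an assumption (the usual Llama attention and MLP projections) since the card does not list it:

```python
# Sketch: wrap the loaded model with LoRA adapters using the parameters
# above; target_modules is assumed, not taken from the card.
model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    lora_alpha = 64,
    lora_dropout = 0,
    bias = "none",
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)
```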

- **ORPOConfig parameters:**

```python
max_length = 4096
max_prompt_length = max_seq_length // 2
max_completion_length = max_seq_length // 2
warmup_ratio = 0.1
weight_decay = 0.01
per_device_train_batch_size = 1
gradient_accumulation_steps = 16
learning_rate = 8e-6
beta = 0.1
optim = "paged_adamw_8bit"
lr_scheduler_type = "linear"
num_train_epochs = 1
```
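Put together, this configuration corresponds to a TRL `ORPOTrainer` run roughly like the following sketch (`output_dir` and the `dataset` variable are assumptions; depending on the trl version, the tokenizer is passed as `tokenizer=` or `processing_class=`). Note the effective batch size is 1 × 16 = 16.

```python
from trl import ORPOConfig, ORPOTrainer

max_seq_length = 4096
args = ORPOConfig(
    max_length = max_seq_length,
    max_prompt_length = max_seq_length // 2,
    max_completion_length = max_seq_length // 2,
    warmup_ratio = 0.1,
    weight_decay = 0.01,
    per_device_train_batch_size = 1,    # effective batch = 1 * 16 = 16
    gradient_accumulation_steps = 16,
    learning_rate = 8e-6,
    beta = 0.1,                         # weight of the odds-ratio term in the ORPO loss
    optim = "paged_adamw_8bit",
    lr_scheduler_type = "linear",
    num_train_epochs = 1,
    bf16 = True,
    output_dir = "outputs",             # assumption: not stated in the card
)

trainer = ORPOTrainer(
    model = model,
    tokenizer = tokenizer,              # `processing_class=` in newer trl versions
    args = args,
    train_dataset = dataset,            # the concatenated preference dataset
)
trainer.train()
```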

#### Speeds, Sizes, Times

Training took ~19 hours on a single A100-40GB.

## Model Card Contact

diego.giorgini@icloud.com