|
--- |
|
tags: |
|
- axolotl
|
- code |
|
- coding |
|
- Aguila |
|
|
model-index: |
|
- name: edumunozsala/aguila-7b-instructft-bactrian-x |
|
results: [] |
|
license: apache-2.0 |
|
language: |
|
- es
|
datasets: |
|
- MBZUAI/Bactrian-X |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
|
|
# Aguila 7B SFT model for Spanish 👩‍💻
|
|
|
**Aguila 7B**, supervised instruction fine-tuned on the [Bactrian-X dataset](https://github.com/mbzuai-nlp/Bactrian-X) using the **Axolotl** library, with the model loaded in 4-bit (QLoRA) via the [PEFT](https://github.com/huggingface/peft) library.
|
|
|
## Pretrained model description
|
|
|
[Aguila-7B](https://huggingface.co/projecte-aina/aguila-7b)
|
|
|
Ǎguila-7B is a transformer-based causal language model for Catalan, Spanish, and English. It is based on the Falcon-7B model and has been trained on a 26B token trilingual corpus collected from publicly available corpora and crawlers. |
|
|
|
More information available in the following post from Medium.com: [Introducing Ǎguila, a new open-source LLM for Spanish and Catalan](https://medium.com/@mpamies247/introducing-a%CC%8Cguila-a-new-open-source-llm-for-spanish-and-catalan-ee1ebc70bc79) |
|
|
|
|
|
## Training data |
|
|
|
[MBZUAI/Bactrian-X](https://huggingface.co/datasets/MBZUAI/Bactrian-X) |
|
|
|
The Bactrian-X dataset is a collection of 3.4M instruction-response pairs in 52 languages, obtained by translating 67K English instructions (alpaca-52k + dolly-15k) into 51 languages using the Google Translate API. The translated instructions are then fed to ChatGPT (gpt-3.5-turbo) to obtain natural responses, resulting in 3.4M instruction-response pairs (52 languages x 67k instances = 3.4M instances).
|
|
|
Here we only use the Spanish split of the dataset (the 50k-sample subset `edumunozsala/Bactrian-X-es-50k` referenced in the training configuration below).
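To inspect the source data, here is a minimal sketch using the `datasets` library (assuming `es` is the per-language configuration name on the Hub):

```py
from datasets import load_dataset

# Load the Spanish portion of Bactrian-X; "es" is assumed to be the
# per-language configuration name of the dataset repository.
dataset = load_dataset("MBZUAI/Bactrian-X", "es", split="train")

# Each record follows the Alpaca-style schema: id, instruction, input, output
print(dataset[0])
```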
|
|
|
### Training hyperparameters |
|
|
|
The following `axolotl` configuration was used during training:
|
```yaml
|
base_model: projecte-aina/aguila-7b |
|
# required by falcon custom model code: https://huggingface.co/tiiuae/falcon-7b/tree/main |
|
trust_remote_code: true |
|
model_type: AutoModelForCausalLM |
|
tokenizer_type: AutoTokenizer |
|
is_falcon_derived_model: true |
|
load_in_8bit: false |
|
# enable 4bit for QLoRA |
|
load_in_4bit: true |
|
gptq: false |
|
strict: false |
|
|
|
push_dataset_to_hub: |
|
datasets: |
|
- path: edumunozsala/Bactrian-X-es-50k |
|
type: alpaca |
|
dataset_prepared_path: |
|
val_set_size: 0.05 |
|
# enable QLoRA |
|
adapter: qlora |
|
lora_model_dir: |
|
sequence_len: 2048 |
|
max_packed_sequence_len: |
|
|
|
# hyperparameters from QLoRA paper Appendix B.2 |
|
# "We find hyperparameters to be largely robust across datasets" |
|
lora_r: 64 |
|
lora_alpha: 16 |
|
# 0.1 for models up to 13B |
|
# 0.05 for 33B and 65B models |
|
lora_dropout: 0.05 |
|
# add LoRA modules on all linear layers of the base model |
|
lora_target_modules: |
|
lora_target_linear: true |
|
lora_fan_in_fan_out: |
|
|
|
wandb_project: |
|
wandb_entity: |
|
wandb_watch: |
|
wandb_name: |
|
wandb_log_model: |
|
|
|
output_dir: ./qlora-out |
|
|
|
# QLoRA paper Table 9 |
|
# - 16 for 7b & 13b |
|
# - 32 for 33b, 64 for 64b |
|
# Max size tested on A6000 |
|
# - 7b: 40 |
|
# - 40b: 4 |
|
# decrease if OOM, increase for max VRAM utilization |
|
micro_batch_size: 4 |
|
gradient_accumulation_steps: 2 |
|
num_epochs: 2 |
|
# Optimizer for QLoRA |
|
optimizer: paged_adamw_32bit |
|
torchdistx_path: |
|
lr_scheduler: cosine |
|
# QLoRA paper Table 9 |
|
# - 2e-4 for 7b & 13b |
|
# - 1e-4 for 33b & 64b |
|
learning_rate: 0.0002 |
|
train_on_inputs: false |
|
group_by_length: false |
|
bf16: auto |
|
fp16: |
|
tf32: true |
|
gradient_checkpointing: true |
|
# stop training after this many evaluation losses have increased in a row |
|
# https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback |
|
# early_stopping_patience: 3 |
|
resume_from_checkpoint: |
|
auto_resume_from_checkpoints: true |
|
local_rank: |
|
logging_steps: 200 |
|
xformers_attention: true |
|
flash_attention: |
|
gptq_groupsize: |
|
gptq_model_v1: |
|
warmup_steps: 10 |
|
evals_per_epoch: 1 |
|
saves_per_epoch: 1 |
|
debug: |
|
deepspeed: |
|
weight_decay: 0.000001 |
|
fsdp: |
|
fsdp_config: |
|
special_tokens: |
|
pad_token: "<|endoftext|>" |
|
bos_token: "<|endoftext|>" |
|
eos_token: "<|endoftext|>" |
|
``` |
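With `micro_batch_size: 4` and `gradient_accumulation_steps: 2`, the effective batch size is 8 per device. For reference, the LoRA block above maps roughly onto the following PEFT configuration; this is a hedged sketch of the equivalence (the `"all-linear"` shorthand assumes PEFT >= 0.8), not code emitted by axolotl itself:

```py
from peft import LoraConfig

# Approximate PEFT equivalent of the axolotl LoRA settings above
lora_config = LoraConfig(
    r=64,                         # lora_r
    lora_alpha=16,                # lora_alpha
    lora_dropout=0.05,            # lora_dropout (0.1 for <=13B, 0.05 for 33B/65B)
    target_modules="all-linear",  # lora_target_linear: true -> all linear layers
    task_type="CAUSAL_LM",
)
```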
|
|
|
### Framework versions |
|
- torch==2.1.2

- flash-attn==2.5.0

- deepspeed==0.13.1

- axolotl==0.4.0
|
|
|
### Usage example
|
|
|
```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "edumunozsala/aguila-7b-instructft-bactrian-x"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model quantized to 4-bit and map it onto the available devices
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True, torch_dtype=torch.float16,
                                             device_map="auto", trust_remote_code=True)

instruction = "Piense en una solución para reducir la congestión del tráfico."
input_text = ""

# Alpaca-style prompt in Spanish, matching the format used during fine-tuning
prompt = f"""### Instrucción:
{instruction}

### Entrada:
{input_text}

### Respuesta:
"""

input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()

with torch.inference_mode():
    outputs = model.generate(input_ids=input_ids, max_new_tokens=256, do_sample=True, top_p=0.9, temperature=0.3)

print(f"Prompt:\n{prompt}\n")
print(f"Generated response:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}")
```
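Note that recent `transformers` releases deprecate passing `load_in_4bit` directly to `from_pretrained`; a hedged alternative using an explicit quantization config:

```py
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Explicit 4-bit quantization config, preferred over the bare load_in_4bit
# flag in newer transformers versions
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

model = AutoModelForCausalLM.from_pretrained(
    "edumunozsala/aguila-7b-instructft-bactrian-x",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```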
|
|
|
### Citation |
|
|
|
```bibtex
|
@misc {edumunozsala_2023, |
|
author = { {Eduardo Muñoz} }, |
|
title = { aguila-7b-instructft-bactrian-x }, |
|
year = 2024, |
|
url = { https://huggingface.co/edumunozsala/aguila-7b-instructft-bactrian-x }, |
|
publisher = { Hugging Face } |
|
} |
|
``` |