|
--- |
|
library_name: transformers |
|
tags: |
|
- function calling |
|
- laser |
|
license: apache-2.0 |
|
datasets: |
|
- jtatman/glaive_function_calling_v2_filtered_10k |
|
--- |
|
|
|
# Model Card |
|
|
|
This is a LASER fine-tune of Aloobun's [great 1.8B-param Reyna Mini model](https://huggingface.co/aloobun/Reyna-Mini-1.8B-v0.2).
|
|
|
### Model Description |
|
|
|
This model is quite conversational, and even a bit more so after LASER tuning, despite training with PEFT. The function calling is mediocre, but will be improved in future versions.
|
|
|
## Uses |
|
|
|
As Aloobun's model performs well and is impressive on its own, I decided to add some function calling while practicing the LaserRMT technique.
|
|
|
### Direct Use |
|
|
|
- Chat |
|
- Conversational |
|
- Text Generation |
|
- Function Calling |
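For the function-calling use case, the model's output has to be parsed back into a structured call. The Glaive-style datasets typically mark calls with a `<functioncall>` token followed by JSON; a minimal sketch of extracting that (the exact marker and argument encoding may vary from what this model emits):

```python
import json
import re


def extract_function_call(text: str):
    """Return (name, arguments) if the text contains a Glaive-style
    function call, else None. Assumes the call is marked with
    '<functioncall>' followed by a JSON object."""
    m = re.search(r"<functioncall>\s*(\{.*\})", text, re.DOTALL)
    if not m:
        return None
    call = json.loads(m.group(1))
    return call.get("name"), call.get("arguments")


out = 'Sure! <functioncall> {"name": "get_weather", "arguments": {"city": "Oslo"}}'
print(extract_function_call(out))  # ('get_weather', {'city': 'Oslo'})
```

In practice you would wrap the parse in a try/except, since a small model will sometimes emit malformed JSON.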
|
|
|
## Bias, Risks, and Limitations |
|
|
|
This model will take over your house, borrow your car, talk badly to your family, and generally make everything incrementally worse, but only if you use it for nefarious purposes.
|
|
|
### Recommendations |
|
|
|
Use at your own risk. It's a great small model, owing largely to the quality of the base model before tuning.
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
|
|
Final training-run metrics:

- train/epoch: 2.98
- train/global_step: 918
- train/loss: 2.2062
- train/train_loss: 2.515587423102269
- eval/loss: 2.1797242164611816
- train/learning_rate: 0
- train/grad_norm: 0.2638521194458008
- train/train_runtime: 20945.6359 s
- train/train_samples_per_second: 1.403
- train/train_steps_per_second: 0.044
- eval/runtime: 41.0972 s
- eval/samples_per_second: 4.867
- eval/steps_per_second: 4.867
- train/total_flos: 141790931224363000
|
|
|
|
|
### Training Procedure |
|
|
|
[LaserRMT](https://github.com/cognitivecomputations/laserRMT) was used to refine the weights, targeting the 16 weight matrices scored highest by signal-to-noise ratio (SNR) analysis.
|
|
|
Rather than continuing to train noisy, low-signal weight matrices that can degrade into garbage, this technique replaces them with low-rank approximations, which also decreases the model size slightly.
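The core low-rank reduction behind LASER can be sketched with a truncated SVD: keep only the largest singular values of a weight matrix and discard the rest, which tends to throw away mostly noise. This is an illustrative sketch with NumPy, not the LaserRMT implementation itself:

```python
import numpy as np


def low_rank_approx(W: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Replace a weight matrix with a truncated-SVD approximation,
    keeping only the largest singular values."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    k = max(1, int(len(S) * keep_ratio))
    return (U[:, :k] * S[:k]) @ Vt[:k, :]


# A matrix that is nearly rank-1 plus small noise: truncating to a
# low rank removes mostly noise, so the approximation stays close.
rng = np.random.default_rng(0)
base = np.outer(rng.normal(size=64), rng.normal(size=64))
W = base + 0.01 * rng.normal(size=(64, 64))
W_approx = low_rank_approx(W, keep_ratio=0.1)
print(np.linalg.norm(W - W_approx) / np.linalg.norm(W))
```

LaserRMT additionally decides *which* matrices to reduce (via the SNR scoring above) and how far to truncate each one.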
|
|
|
![axolotl](https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/image/axolotl-badge-web.png?raw=true) |
|
|
|
Axolotl was used for training and dataset tokenization. |
|
|
|
#### Preprocessing |
|
|
|
The dataset was converted to the ShareGPT conversational format for tokenization with Axolotl.
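For reference, a single exchange in the ShareGPT schema that Axolotl consumes looks roughly like the following. The field names of the source dataset may differ; this only illustrates the target structure:

```python
def to_sharegpt(system: str, user: str, assistant: str) -> dict:
    """Wrap one exchange in the ShareGPT conversation schema."""
    return {
        "conversations": [
            {"from": "system", "value": system},
            {"from": "human", "value": user},
            {"from": "gpt", "value": assistant},
        ]
    }


record = to_sharegpt(
    "You have access to a get_weather(city) function.",
    "What's the weather in Oslo?",
    '<functioncall> {"name": "get_weather", "arguments": {"city": "Oslo"}}',
)
print(record["conversations"][1]["from"])  # human
```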
|
|
|
#### Training Hyperparameters |
|
|
|
- lora_r: 64 |
|
- lora_alpha: 16 |
|
- lora_dropout: 0.05 |
|
- gradient_accumulation_steps: 4 |
|
- micro_batch_size: 1 |
|
- num_epochs: 3 |
|
- optimizer: adamw_bnb_8bit |
|
- lr_scheduler: cosine |
|
- learning_rate: 0.00025 |
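In an Axolotl config these hyperparameters map to top-level YAML keys, roughly as sketched below (other required fields such as `base_model` and the `datasets` section are omitted):

```yaml
adapter: lora
lora_r: 64
lora_alpha: 16
lora_dropout: 0.05
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00025
```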
|
|
|
#### Evaluation |
|
|
|
| Groups |Version| Filter |n-shot| Metric | Value | |Stderr| |
|
|--------------------|-------|----------------|-----:|-----------|------:|---|-----:| |
|
|Open LLM Leaderboard|N/A |none | 5|rouge2_acc | 0.1920|± |0.0176| |
|
| | |none | 5|bleu_max |15.2292|± |0.6714| |
|
| | |flexible-extract| 5|exact_match| 0.0220|± |0.0066| |
|
| - truthfulqa_mc1 | 2|none | 0|acc | 0.2440|± |0.0192| |
|
| - truthfulqa_mc2 | 2|none | 0|acc | 0.4430|± |0.0195| |
|
| - winogrande | 1|none | 5|acc | 0.5120|± |0.0224| |
|
| - arc_challenge | 1|none | 25|acc | 0.1760|± |0.0170| |
|
| | |none | 25|acc_norm | 0.2320|± |0.0189| |
|
| - gsm8k | 3|strict-match | 5|exact_match| 0.0060|± |0.0035| |
|
| | |flexible-extract| 5|exact_match| 0.0220|± |0.0066| |
|
| - hellaswag | 1|none | 10|acc | 0.3520|± |0.0214| |
|
| | |none | 10|acc_norm | 0.4040|± |0.0220| |
|
| | |none | 5|rouge2_diff|-3.3178|± |0.9477| |
|
| | |none | 5|rougeL_acc | 0.3860|± |0.0218| |
|
| | |none | 5|acc_norm | 0.3180|± |0.0145| |
|
| | |none | 5|rouge1_diff|-1.5564|± |1.0223| |
|
| | |none | 5|bleu_diff |-0.6500|± |0.6421| |
|
| | |none | 5|rouge2_max |16.4873|± |1.0172| |
|
| | |none | 5|rougeL_diff|-0.7765|± |1.0034| |
|
| | |strict-match | 5|exact_match| 0.0060|± |0.0035| |
|
| | |none | 5|bleu_acc | 0.4360|± |0.0222| |
|
| | |none | 5|rougeL_max |33.8798|± |0.9367| |
|
| | |none | 5|rouge1_max |36.3550|± |0.9462| |
|
| | |none | 5|rouge1_acc | 0.3700|± |0.0216| |
|
| | |none | 5|acc | 0.2664|± |0.0036| |
|
| - mmlu |N/A |none | 0|acc | 0.2533|± |0.0039| |
|
| - humanities |N/A |none | 5|acc | 0.2408|± |0.0075| |
|
| - other |N/A |none | 5|acc | 0.2443|± |0.0080| |
|
| - social_sciences |N/A |none | 5|acc | 0.2538|± |0.0081| |
|
| - stem |N/A |none | 5|acc | 0.2740|± |0.0079| |
|
| - truthfulqa |N/A |none | 0|rouge2_acc | 0.1920|± |0.0176| |
|
| | |none | 0|rougeL_diff|-0.7765|± |1.0034| |
|
| | |none | 0|bleu_max |15.2292|± |0.6714| |
|
| | |none | 0|rouge2_diff|-3.3178|± |0.9477| |
|
| | |none | 0|rougeL_acc | 0.3860|± |0.0218| |
|
| | |none | 0|bleu_diff |-0.6500|± |0.6421| |
|
| | |none | 0|rouge2_max |16.4873|± |1.0172| |
|
| | |none | 0|rouge1_diff|-1.5564|± |1.0223| |
|
| | |none | 0|acc | 0.3435|± |0.0137| |
|
| | |none | 0|bleu_acc | 0.4360|± |0.0222| |
|
| | |none | 0|rougeL_max |33.8798|± |0.9367| |
|
| | |none | 0|rouge1_max |36.3550|± |0.9462| |
|
| | |none | 0|rouge1_acc | 0.3700|± |0.0216| |