---
library_name: transformers
tags:
- function calling
- laser
license: apache-2.0
datasets:
- jtatman/glaive_function_calling_v2_filtered_10k
---
# Model Card
This is a LASER fine-tune of Aloobun's excellent [Reyna-Mini-1.8B-v0.2 model](https://huggingface.co/aloobun/Reyna-Mini-1.8B-v0.2).
### Model Description
This model is quite conversational, even a bit more so after LASER tuning, despite training with PEFT adapters. Function calling is mediocre in this release but will be improved in future versions.
## Uses
Since Aloobun's model performs well and is impressive on its own, I decided to add function calling while practicing the LaserRMT technique.
### Direct Use
- Chat
- Conversational
- Text Generation
- Function Calling
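A minimal inference sketch with `transformers` follows. The model id below points at the base model as a placeholder; substitute this repository's id. The chat template is resolved from the tokenizer, so no assumptions are made about the underlying prompt format.

```python
def build_messages(user_prompt, system_prompt="You are a helpful assistant."):
    """Assemble a chat in the messages format expected by apply_chat_template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def generate(prompt: str, model_id: str = "aloobun/Reyna-Mini-1.8B-v0.2") -> str:
    # Placeholder model id: replace with this fine-tune's repo id.
    from transformers import AutoModelForCausalLM, AutoTokenizer  # deferred import

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("What is the capital of France?"))
```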
## Bias, Risks, and Limitations
Used for nefarious purposes, this model will take over your house, borrow your car, talk badly to your family, and generally make everything incrementally worse.
### Recommendations
Use at your own risk. It's a great small model, thanks largely to the strength of the base model before tuning.
## Training Details
### Training Data
Fine-tuned on [jtatman/glaive_function_calling_v2_filtered_10k](https://huggingface.co/datasets/jtatman/glaive_function_calling_v2_filtered_10k), a filtered 10k-example subset of the Glaive function-calling v2 dataset.

Final metrics from the training run:
- train/loss: 2.2062
- train/train_loss: 2.5156
- eval/loss: 2.1797
- train/epoch: 2.98
- train/global_step: 918
- train/learning_rate: 0 (fully decayed)
- train/grad_norm: 0.2639
- train/train_runtime: 20,945.6 s (~5.8 h)
- train/train_samples_per_second: 1.403
- train/train_steps_per_second: 0.044
- eval/runtime: 41.10 s
- eval/samples_per_second: 4.867
- eval/steps_per_second: 4.867
- train/total_flos: 1.418e17
### Training Procedure
[LaserRMT](https://github.com/cognitivecomputations/laserRMT) was used to refine the weights, targeting the 16 weight matrices scored highest by signal-to-noise ratio analysis.
This technique avoids training unnecessarily on low-performing weights that can degrade into noise, and pruning them decreases the model size slightly.
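LaserRMT itself scores layers using random-matrix-theory estimates; the following is a simplified, hypothetical NumPy sketch of the core idea (SVD truncation at the theoretical noise edge of a random matrix), not the repository's actual implementation:

```python
import numpy as np

def snr_rank_reduce(W: np.ndarray, sigma: float) -> np.ndarray:
    """Keep only singular components above the noise edge sigma*(sqrt(m)+sqrt(n)).

    Components below that edge are statistically indistinguishable from
    i.i.d. Gaussian noise of standard deviation sigma, so they are dropped.
    """
    m, n = W.shape
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    threshold = sigma * (np.sqrt(m) + np.sqrt(n))
    k = int(np.sum(s > threshold))  # retained rank
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Demo: a rank-2 signal buried in Gaussian noise.
rng = np.random.default_rng(0)
m, n, sigma = 128, 64, 0.05
signal = rng.normal(size=(m, 2)) @ rng.normal(size=(2, n))
W = signal + sigma * rng.normal(size=(m, n))
W_clean = snr_rank_reduce(W, sigma)
```

Truncating at the noise edge reconstructs a matrix closer to the underlying signal than the noisy original, which is the intuition behind "denoising" a weight matrix.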
![axolotl](https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/image/axolotl-badge-web.png?raw=true)
Axolotl was used for training and dataset tokenization.
#### Preprocessing
The dataset was formatted in the ShareGPT conversational format for use with Axolotl.
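For reference, a single ShareGPT-style record looks like the following. The record contents are hypothetical; the field names (`conversations`, `from`, `value`) follow the ShareGPT convention that Axolotl's `sharegpt` dataset type expects:

```python
import json

# Hypothetical example of one ShareGPT-style record; one JSON object per line (JSONL).
record = {
    "conversations": [
        {"from": "system", "value": "You have access to the following functions: ..."},
        {"from": "human", "value": "What's the weather in Paris?"},
        {"from": "gpt", "value": "I'll call the weather function for that."},
    ]
}

line = json.dumps(record)
```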
#### Training Hyperparameters
- lora_r: 64
- lora_alpha: 16
- lora_dropout: 0.05
- gradient_accumulation_steps: 4
- micro_batch_size: 1
- num_epochs: 3
- optimizer: adamw_bnb_8bit
- lr_scheduler: cosine
- learning_rate: 0.00025
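The hyperparameters above correspond to an Axolotl config along these lines. This is a reconstructed fragment, not the actual config used: `base_model` and the dataset entry are filled in from elsewhere in this card, and `adapter: lora` is assumed from the PEFT-based training described above.

```yaml
# Hypothetical Axolotl config fragment mirroring the listed hyperparameters.
base_model: aloobun/Reyna-Mini-1.8B-v0.2
adapter: lora
datasets:
  - path: jtatman/glaive_function_calling_v2_filtered_10k
    type: sharegpt
lora_r: 64
lora_alpha: 16
lora_dropout: 0.05
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00025
```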
#### Evaluation
| Groups |Version| Filter |n-shot| Metric | Value | |Stderr|
|--------------------|-------|----------------|-----:|-----------|------:|---|-----:|
|Open LLM Leaderboard|N/A |none | 5|rouge2_acc | 0.1920|± |0.0176|
| | |none | 5|bleu_max |15.2292|± |0.6714|
| | |flexible-extract| 5|exact_match| 0.0220|± |0.0066|
| - truthfulqa_mc1 | 2|none | 0|acc | 0.2440|± |0.0192|
| - truthfulqa_mc2 | 2|none | 0|acc | 0.4430|± |0.0195|
| - winogrande | 1|none | 5|acc | 0.5120|± |0.0224|
| - arc_challenge | 1|none | 25|acc | 0.1760|± |0.0170|
| | |none | 25|acc_norm | 0.2320|± |0.0189|
| - gsm8k | 3|strict-match | 5|exact_match| 0.0060|± |0.0035|
| | |flexible-extract| 5|exact_match| 0.0220|± |0.0066|
| - hellaswag | 1|none | 10|acc | 0.3520|± |0.0214|
| | |none | 10|acc_norm | 0.4040|± |0.0220|
| | |none | 5|rouge2_diff|-3.3178|± |0.9477|
| | |none | 5|rougeL_acc | 0.3860|± |0.0218|
| | |none | 5|acc_norm | 0.3180|± |0.0145|
| | |none | 5|rouge1_diff|-1.5564|± |1.0223|
| | |none | 5|bleu_diff |-0.6500|± |0.6421|
| | |none | 5|rouge2_max |16.4873|± |1.0172|
| | |none | 5|rougeL_diff|-0.7765|± |1.0034|
| | |strict-match | 5|exact_match| 0.0060|± |0.0035|
| | |none | 5|bleu_acc | 0.4360|± |0.0222|
| | |none | 5|rougeL_max |33.8798|± |0.9367|
| | |none | 5|rouge1_max |36.3550|± |0.9462|
| | |none | 5|rouge1_acc | 0.3700|± |0.0216|
| | |none | 5|acc | 0.2664|± |0.0036|
| - mmlu |N/A |none | 0|acc | 0.2533|± |0.0039|
| - humanities |N/A |none | 5|acc | 0.2408|± |0.0075|
| - other |N/A |none | 5|acc | 0.2443|± |0.0080|
| - social_sciences |N/A |none | 5|acc | 0.2538|± |0.0081|
| - stem |N/A |none | 5|acc | 0.2740|± |0.0079|
| - truthfulqa |N/A |none | 0|rouge2_acc | 0.1920|± |0.0176|
| | |none | 0|rougeL_diff|-0.7765|± |1.0034|
| | |none | 0|bleu_max |15.2292|± |0.6714|
| | |none | 0|rouge2_diff|-3.3178|± |0.9477|
| | |none | 0|rougeL_acc | 0.3860|± |0.0218|
| | |none | 0|bleu_diff |-0.6500|± |0.6421|
| | |none | 0|rouge2_max |16.4873|± |1.0172|
| | |none | 0|rouge1_diff|-1.5564|± |1.0223|
| | |none | 0|acc | 0.3435|± |0.0137|
| | |none | 0|bleu_acc | 0.4360|± |0.0222|
| | |none | 0|rougeL_max |33.8798|± |0.9367|
| | |none | 0|rouge1_max |36.3550|± |0.9462|
| | |none | 0|rouge1_acc | 0.3700|± |0.0216|