Update README.md
![FinPlan-1 logo](finplan1.png)

Created with ChatGPT-4o using a link to this model repository and a brief prompt.

# FinPlan-1

FinPlan-1 is an LLM trained to assist with the creation of basic personal financial plans for individuals. This model is built on top of the
Fino-1 model, which is itself a version of Llama-3.1-8B-Instruct that was CoT fine-tuned to improve its financial reasoning ability.

FinPlan-1 adapts a model already trained on financial reasoning tasks to assist individuals with two key aspects of financial planning.

1. Assist with the creation of a budget spreadsheet to enable individuals to keep track of their finances and understand where their money is going.
2. Aid with planning for short, medium and long term goals, including breaking those goals down into monthly savings targets and suggesting broad investment vehicles to fit each goal's timeframe.

While current LLMs can perform these tasks to an extent, they are often inconsistent in their response structure, can sometimes struggle with basic mathematics,
and frequently go beyond the basic tasks at hand, recommending inappropriate savings and investment vehicles for individual savings goals. The Fino-1 8B model is certainly well
trained for corporate financial reasoning tasks, but its recommendations for savings and investment vehicles were often too aggressive for short term goals, and it may recommend
long term savings vehicles which carry tax penalties if not used appropriately. This model uses LoRA on a procedurally generated budgeting dataset, as well as few shot prompting using
a separate dataset based around short, medium and long term goals, to enhance the ability of Fino-1 8B to accomplish these tasks.

The results of this training and prompting method are encouraging: the model consistently produces budget spreadsheets (through the generation of executable python code)
as well as somewhat reliable savings plan assistance with the use of few shot prompting. These training methods do have an impact on this model's performance on standard
benchmarks like GSM8K and MMLU, resulting in drops in performance on both compared with the base model; however, this loss in generalization is made up for by the model's
improved ability to accomplish the tasks of assisting individuals with budgeting and fixed term savings goals.

- **Developed by:** Timothy Austin Rodriguez

### Training Data

This model is trained on a procedurally generated synthetic dataset that provides structured prompts and responses to assist the underlying Fino-1 8B model
with creating executable python code that builds a budget spreadsheet and exports it to Microsoft Excel (.xlsx) format. This dataset (attached to this repository) consists
of 3000 examples, which were divided into a train/validation split of 2500 for training and 500 for validation. The code used to create and randomize this dataset, including
the seeds (42 for randomization, 60 for creation), can be located in the ipynb files attached to this repository. This dataset is called budget_dataset.csv.
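
The exact generation code and prompt/response templates are in the attached ipynb files; the sketch below is only meant to illustrate the general shape of the procedure, with made-up templates and column names.

```python
import random

import pandas as pd

random.seed(60)  # creation seed noted above; 42 is used for the shuffle/split

PROMPT = ("Q: I have an income of about {income} a year and my monthly expenses include "
          "{rent} a month in rent and utilities, a {car} car payment, ${food} in food, and "
          "about {other} a month in other expenses. Using python, can you create for me a "
          "budget spreadsheet and export it to excel?")

rows = []
for _ in range(3000):
    values = {
        "income": random.randint(30000, 150000),
        "rent": random.randint(800, 3500),
        "car": random.randint(0, 900),
        "food": random.randint(200, 900),
        "other": random.randint(50, 600),
    }
    rows.append({
        "prompt": PROMPT.format(**values),
        "response": "# python script that builds the budget DataFrame and exports it to .xlsx (abridged)",
    })

# Shuffle with the randomization seed, then take a 2500/500 train/validation split.
df = pd.DataFrame(rows).sample(frac=1, random_state=42).reset_index(drop=True)
train, val = df.iloc[:2500], df.iloc[2500:]
```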

While not used for training this model, a secondary dataset for the purposes of improving the model's performance on short, medium and long term goal planning was developed
via procedural generation. This dataset was generated much like the first, through random procedural generation of 3000 examples of prompts and responses. The random seeds and
train/validation split code can be located in the same ipynb file as the budget dataset. This dataset is called goals_dataset.csv. It was not used to train the final
model due to poor performance encountered when leveraging LoRA for additional training: the model actually performed worse when prompted with an example from the validation dataset
after training than before training. A deeper exploration of why this occurred is warranted, and other training/tuning methods beyond LoRA should be considered for future enhancement
of this model.

## Training Method

The method of training/tuning for this model is the Parameter-Efficient Fine-Tuning (PEFT) method called Low-Rank Adaptation, or LoRA. LoRA is a fine tuning approach that is well
suited to tuning a model for domain specific tasks such as creating personal financial plans. LoRA is significantly more efficient than full fine tuning, requiring fewer compute
resources, and is much more memory efficient because only a small set of adapter weights is updated while the base model weights stay frozen. In many cases a LoRA implementation yields results very similar to full fine tuning without the
heavy computational expense inherent in that approach. This method was chosen given the time allocated for training this model, the limited compute resources due to competing
requests for GPU time on the University of Virginia's Rivanna High Performance Computing cluster, and the desire to achieve results similar to full fine tuning without the
compute resources it requires. The LoRA tuning hyperparameter values were selected through experimentation and can be found in one of the ipynb files attached to this repository and in
the summary below.

Hyperparameters

- epochs = 5
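
For reference, a LoRA setup of this kind with the Hugging Face peft library looks roughly like the sketch below. The rank, alpha, dropout and target modules shown are illustrative stand-ins rather than the recorded values, and the Fino-1 8B repo id is assumed.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Assumed repo id for the Fino-1 8B base model.
base = AutoModelForCausalLM.from_pretrained("TheFinAI/Fino1-8B")

lora_config = LoraConfig(
    r=16,                                  # illustrative rank
    lora_alpha=32,                         # illustrative scaling factor
    lora_dropout=0.05,                     # illustrative dropout
    target_modules=["q_proj", "v_proj"],   # attention projections, a common choice
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # confirms only a small fraction of weights are trainable
```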

Secondarily, this model makes use of Few Shot Prompting due to the aforementioned poor performance of LoRA when training on the goals dataset. It was found that few shot
prompting improves the ability of the model to provide the desired response structure without degrading the model's performance, as was noted with the LoRA implementation regardless
of the hyperparameters that were selected. Example code for how to implement the appropriate few shot prompting is available in one of the provided ipynb files in this repository.

## Evaluation

The benchmarks chosen, GSM8K, MMLU and the two synthetic dataset examples, were selected to provide a view of the performance of the model both in terms of its generalization
ability as well as its ability to perform the tasks it is trained to accomplish. As the underlying model that FinPlan-1 is based on, Fino-1 8B is a natural comparison model
to evaluate for benchmarking. Further, the Llama 3.2-3B Instruct model is a newer, albeit smaller, version of the model family that underlies Fino-1 8B. Given
that model's rather decent performance on the financial planning tasks, it serves as a good comparison for FinPlan-1. Finally, the Ministral 8B Instruct-2410 model is of comparable
parameter size to FinPlan-1 and was originally considered as a potential base model for FinPlan-1, making it a good model for comparison. Since the tasks this model is tuned to accomplish are non standard and domain specific, the
benchmark for these tasks comes from the validation/hold-out split of the training dataset and its evaluation is somewhat subjective. For each of these models, the Budget and Goals examples were
presented to the model in either a zero shot prompt (budget) or a three shot prompt (goals). Only the trained FinPlan-1 model was able to provide the desired format for the excel file
for the budget task, while both Fino-1 8B and FinPlan-1 performed well on the goals dataset. For measurement of generalizability and retention of reasoning skill, all four models
were benchmarked on GSM8K (grade school mathematics reasoning) as well as MMLU (general reasoning). While the domain specific LoRA tuning certainly led to a degradation in FinPlan-1's
benchmark scores with respect to its underlying model Fino-1 8B, the drop in performance is rather small for MMLU, and GSM8K performance remains above Llama 3.2-3B Instruct.
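
These scores can be reproduced in a number of ways; one common route (not necessarily the exact harness used here) is EleutherAI's lm-evaluation-harness, sketched below with a placeholder repo id.

```python
import lm_eval

# "tar3kh/FinPlan-1" is a placeholder; point this at the model under test.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tar3kh/FinPlan-1,dtype=bfloat16",
    tasks=["gsm8k", "mmlu"],
    batch_size=8,
)
print(results["results"])
```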

## Intended Usage

As described above, this model is intended to be used to assist with the creation of simple financial plans for individuals, specifically for assistance with the creation of a budget
spreadsheet for tracking expenses as well as planning for short, medium and long term savings goals. While this model can be prompted on a wide range of other tasks, it is
not recommended to use this model for those purposes, as it has been specifically fine-tuned for these two tasks and performance on tasks outside that scope could be diminished.

See below for the basic code required in order to import the model from Hugging Face using torch. Note the tokenizer is pulled from the Fino-1 8B repository, as it was not changed
from the base Fino-1 8B model.
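
A minimal loading sketch follows; the repo ids are placeholders for this repository and the Fino-1 8B repository, so substitute the actual ids.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

FINPLAN_REPO = "tar3kh/FinPlan-1"   # placeholder id for this repository
FINO1_REPO = "TheFinAI/Fino1-8B"    # assumed id for the Fino-1 8B base repository

# Tokenizer comes from the base Fino-1 8B repository; it was not changed during tuning.
tokenizer = AutoTokenizer.from_pretrained(FINO1_REPO)
model = AutoModelForCausalLM.from_pretrained(
    FINPLAN_REPO,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Q: Using python, can you create for me a budget spreadsheet and export it to excel?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```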

The prompt format varies between the budget task and the goals task.

For the budget task, the following prompt method is recommended.

```{python}
Q: I have an income of about 53255 a year and my monthly expenses include 2208 a month in rent and utilities, a 700 car payment, $300 in food, and about 205 a month in other expenses. Using python, can you create for me a budget spreadsheet and export it to excel?
```
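
The model answers this style of prompt with runnable python. A sketch of the kind of script it is expected to produce (the figures mirror the prompt above; the file name is illustrative):

```python
import pandas as pd

monthly_income = 53255 / 12
expenses = {
    "Rent and Utilities": 2208,
    "Car Payment": 700,
    "Food": 300,
    "Other": 205,
}

budget = pd.DataFrame({
    "Category": list(expenses) + ["Total Expenses", "Monthly Income", "Remaining"],
    "Amount": list(expenses.values()) + [
        sum(expenses.values()),
        round(monthly_income, 2),
        round(monthly_income - sum(expenses.values()), 2),
    ],
})

# Writing .xlsx requires an Excel writer such as openpyxl.
budget.to_excel("budget.xlsx", index=False)
```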

For the goals task, I recommend using Few Shot Prompting, making use of the goals_dataset.csv file as your base and then adding your preferred prompt in the following format
to the few shot examples derived from the goals dataset.

```{python}
Q: My short term goal is to save for a $3357 vacation in the next year, my mediu
```

I recommend the following approach to set up few shot prompting for the goals task (the full example code is in the attached ipynb files).
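
A minimal sketch of that setup, assuming goals_dataset.csv has `prompt` and `response` columns (check the file for the actual headers):

```python
import pandas as pd

goals = pd.read_csv("goals_dataset.csv")

# Three-shot prompt: prepend three worked examples from the goals dataset, then the new question.
shots = goals.sample(n=3, random_state=42)
few_shot_block = "\n\n".join(
    f"Q: {row['prompt']}\nA: {row['response']}" for _, row in shots.iterrows()
)

# Replace the placeholder below with your own short, medium and long term goals.
user_question = "Q: My short term goal is ..., my medium term goal is ..., and my long term goal is ..."

full_prompt = f"{few_shot_block}\n\n{user_question}\nA:"
```

The resulting `full_prompt` string is then passed to the tokenizer and `model.generate` exactly as in the loading sketch above.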

There are several risks and limitations of this model that are worth mentioning. First, in a handful of cases this model produced responses in which the math inherent in the
savings goals responses was not correct, sometimes failing to add numbers up correctly or having slight rounding errors when dividing long term goals into monthly targets.
It is well known that LLMs can struggle with mathematics, given that their knowledge is language based rather than numerically based, and this is a particular problem for a finance focused LLM.
I strongly recommend double checking the figures presented by this model. While this issue is sidestepped in the budget task through the use of python code to prevent math errors,
that safeguard is not implemented for the goals task. Further, this model should be limited in its use for out of scope tasks, as the generalization benchmarks demonstrated that,
compared to its base model, this model exhibits decreased reasoning ability outside its domain specific task.
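
A quick, model-independent way to double check a monthly savings target in a goals response:

```python
# Recompute the monthly target for a goal and compare it with the model's figure.
goal_amount = 3357        # e.g. the short term vacation goal above
months = 12               # timeframe of the goal
expected_monthly = round(goal_amount / months, 2)

model_monthly = 279.75    # figure quoted in the model's response
assert abs(expected_monthly - model_monthly) < 0.01, "model's monthly target is off"
```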

In order to improve this model, I would recommend that future model trainers and tuners focus on adjusting this model to default to producing python code for all mathematics based
prompts. Sticking with python for mathematics processing should allow the model to perform more highly on the goals task while retaining performance on the budget task.

[More Information Needed]

## Model Card Contact

tar3kh@virginia.edu