Bitext committed on
Commit 96dcfd2
1 Parent(s): 18ebf54

Create README.md

Update README.md detailing the fine-tuned model's training data, architecture, and intended use.

Files changed (1)
  1. README.md +90 -0
README.md ADDED
@@ -0,0 +1,90 @@
---
license: apache-2.0
tags:
- axolotl
- generated_from_trainer
- text-generation-inference
base_model: mistralai/Mistral-7B-Instruct-v0.2
model_type: mistral
pipeline_tag: text-generation
model-index:
- name: Mistral-7B-Mortgage-Loans-v2
  results: []
---

# Mistral-7B-Mortgage-Loans-v2

## Model Description

This model, "Mistral-7B-Mortgage-Loans-v2", is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) developed specifically to address queries related to mortgages and loans. It provides answers that help users navigate complex loan processes and mortgage applications.

## Intended Use

- **Recommended applications**: This model is particularly useful for financial institutions, mortgage brokers, and loan providers. It is designed to integrate into customer support systems and help users understand their loan options, mortgage details, and payment plans.
- **Out-of-scope**: This model is not designed for non-financial inquiries and should not be used to provide legal, medical, or other advice outside its area of financial expertise.

## Usage Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("bitext-llm/Mistral-7B-Mortgage-Loans-v2")
tokenizer = AutoTokenizer.from_pretrained("bitext-llm/Mistral-7B-Mortgage-Loans-v2")

# Wrap the question in Mistral's [INST] ... [/INST] instruction format; the
# tokenizer adds the <s> BOS token itself, so it is not repeated in the string.
inputs = tokenizer("[INST] What are the requirements for a home loan? [/INST]", return_tensors="pt")

# Cap the length of the newly generated answer rather than the total sequence,
# so the prompt does not eat into the response budget.
outputs = model.generate(inputs["input_ids"], max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

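The prompt can also be built with the tokenizer's chat template instead of hand-writing the `[INST]` markers; a minimal sketch, assuming the checkpoint inherits the Mistral-Instruct chat template and a `transformers` version recent enough to provide `apply_chat_template` (the 4.40 environment listed below qualifies):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("bitext-llm/Mistral-7B-Mortgage-Loans-v2")
tokenizer = AutoTokenizer.from_pretrained("bitext-llm/Mistral-7B-Mortgage-Loans-v2")

# The chat template wraps the user turn in the [INST] ... [/INST] format for us.
messages = [{"role": "user", "content": "What are the requirements for a home loan?"}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This keeps longer, multi-turn conversations consistent with the format the base instruct model was tuned on.
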
## Model Architecture

The model uses the `MistralForCausalLM` architecture together with a `LlamaTokenizer`. It retains the fundamental characteristics of the base model while being optimized to understand and generate responses about mortgages and loans.

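As a quick illustration of those classes, the checkpoint can also be loaded through them directly rather than via the `Auto*` helpers; a minimal sketch, assuming the repository ships the SentencePiece tokenizer files that `LlamaTokenizer` expects:

```python
from transformers import LlamaTokenizer, MistralForCausalLM

# Equivalent to the Auto* calls in the usage example, but spelled with the
# concrete classes named above.
model = MistralForCausalLM.from_pretrained("bitext-llm/Mistral-7B-Mortgage-Loans-v2")
tokenizer = LlamaTokenizer.from_pretrained("bitext-llm/Mistral-7B-Mortgage-Loans-v2")

print(type(model).__name__, type(tokenizer).__name__)
```
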
## Training Data

The model was trained on a dataset built specifically for the mortgage and loan sector, covering 39 intents such as `apply_for_loan`, `check_loan_terms`, `refinance_loan`, and `customer_service`, each with nearly 1,000 examples. This breadth helps the model address a wide spectrum of inquiries within the domain.

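To make that intent structure concrete, the snippet below shows what a single example for one intent might look like; the field names and wording are invented for illustration and are not taken from the actual dataset:

```python
# Hypothetical illustration only; the real dataset's schema and text may differ.
example = {
    "intent": "check_loan_terms",
    "instruction": "Can you explain the terms of my current mortgage?",
    "response": "Of course. Your loan terms include the interest rate, the repayment "
                "period, and any associated fees; let me walk you through each one.",
}

# One plausible way to fold such a pair into the Mistral instruction format
# for fine-tuning (an assumption, not the documented training pipeline):
text = f"[INST] {example['instruction']} [/INST] {example['response']}"
print(text)
```
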
## Training Procedure

### Hyperparameters

- **Optimizer**: AdamW
- **Learning Rate**: 0.0002
- **Epochs**: 1
- **Batch Size**: 8
- **Gradient Accumulation Steps**: 4
- **Maximum Sequence Length**: 1024 tokens

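The `axolotl` tag above indicates the run was driven by an Axolotl config rather than a hand-written training loop. For readers who want to approximate the same settings with the Hugging Face `Trainer`, here is a minimal sketch of how these hyperparameters map onto `TrainingArguments`; anything not listed above (output directory, precision) is an assumption:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; values not listed there
# (output_dir, bf16) are illustrative assumptions.
training_args = TrainingArguments(
    output_dir="mistral-7b-mortgage-loans-v2",  # assumed name
    optim="adamw_torch",                        # AdamW
    learning_rate=2e-4,                         # 0.0002
    num_train_epochs=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,              # effective batch size of 32
    bf16=True,                                  # assumed precision for a 7B fine-tune
)

# The 1024-token maximum sequence length is applied when tokenizing the data,
# e.g. tokenizer(..., truncation=True, max_length=1024), not via TrainingArguments.
```
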
### Environment

- **Transformers Version**: 4.40.0.dev0
- **Framework**: PyTorch 2.2.1+cu121
- **Tokenizers Version**: 0.15.0

## Limitations and Bias

- The model is fine-tuned on a domain-specific dataset and may not perform well outside the scope of financial advice.
- Users should be aware of potential biases in the training data, as the model's responses may inadvertently reflect them. Because the training dataset answers general mortgage and loan questions, additional biases may surface in more specific use cases.

## Ethical Considerations

This model should be used responsibly, with the ethical implications of automated financial advice in mind. Because it serves as a foundation for assistants in this financial field, it is crucial to ensure that its advice complements human expertise and adheres to relevant financial regulations.

## Acknowledgments

This model was developed by Bitext and trained on infrastructure provided by Bitext.

## License

This model, "Mistral-7B-Mortgage-Loans-v2", is licensed under the Apache License 2.0 by Bitext Innovations International, Inc. This open-source license allows for free use, modification, and distribution of the model but requires that proper credit be given to Bitext.

### Key Points of the Apache 2.0 License

- **Permissibility**: Users are allowed to use, modify, and distribute this software freely.
- **Attribution**: You must provide proper credit to Bitext Innovations International, Inc. when using this model, in accordance with the original copyright notices and the license.
- **Patent Grant**: The license includes a grant of patent rights from the contributors of the model.
- **No Warranty**: The model is provided "as is" without warranties of any kind.

You may view the full license text at [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0).

This licensing ensures the model can be used widely and freely while respecting the intellectual contributions of Bitext. For more detailed information or specific legal questions about using this license, please refer to the official license documentation linked above.