Update README.md
README.md
CHANGED
@@ -3,197 +3,126 @@ library_name: transformers
---
library_name: transformers
tags: []
---

# Model Card: Falconsai/florence-2-invoice

- **Developed by:** Michael Stattelman for Falcons.ai
- **Funded by:** Falcons.ai

### Model Sources

- **Repository:** https://github.com/Falcons-ai/florence2_invoice_finetuning

## Model Overview

`Falconsai/florence-2-invoice` is a fine-tuned version of the `microsoft/Florence-2-base-ft` model, trained specifically to identify and extract key fields from invoice images. Fine-tuning used a curated dataset of invoices annotated with the following fields:

- Billing address
- Discount percentage
- Due date
- Email client
- Header
- Invoice date
- Invoice number
- Name client
- Products
- Remise
- Shipping address
- Subtotal
- Tax
- Tax percentage
- Tel client
- Total

## Model Details

### Base Model

The base model used for fine-tuning is `microsoft/Florence-2-base-ft`, a vision foundation model developed by Microsoft.

### Fine-tuning Configuration

Fine-tuning was carried out using a Low-Rank Adaptation (LoRA) configuration with the following parameters:

```python
LoraConfig(
    r=8,
    lora_alpha=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "linear", "Conv2d", "lm_head", "fc2"],
    task_type="CAUSAL_LM",
    lora_dropout=0.05,
    bias="none",
    inference_mode=False,
    use_rslora=True,
    init_lora_weights="gaussian",
    revision=REVISION,
)
```
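
For context, a minimal sketch of how such a configuration might be attached to the base model with the `peft` library (this is not the repository's training script; variable names are illustrative, and the `revision` pin is omitted because its value is not given here):

```python
# Illustrative only: wrap the base Florence-2 model with the LoRA config above.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base-ft", trust_remote_code=True
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "linear", "Conv2d", "lm_head", "fc2"],
    task_type="CAUSAL_LM",
    lora_dropout=0.05,
    bias="none",
    inference_mode=False,
    use_rslora=True,
    init_lora_weights="gaussian",
)

# Only the injected LoRA adapter weights remain trainable.
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()
```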

### Hardware Used

Fine-tuning was performed on an Alienware system.

## Dataset

The model was trained on a curated dataset of invoice images, each annotated with the fields listed above, so that the model learns to detect and extract key information across a variety of invoice formats.

## Usage

### Inference

To use this model for inference, you can load it via the Hugging Face Transformers library:

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor


def run_florence_invoice(img, task_prompt, text_input=None):
    image = Image.open(img)

    # Ensure the image is in RGB format
    if image.mode != "RGB":
        image = image.convert("RGB")

    # Load the fine-tuned model and its processor
    model_id = "Falconsai/florence-2-invoice"
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).eval().cuda()
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    with torch.no_grad():
        if text_input is None:
            prompt = task_prompt
        else:
            prompt = task_prompt + text_input
        inputs = processor(text=prompt, images=image, return_tensors="pt")
        generated_ids = model.generate(
            input_ids=inputs["input_ids"].cuda(),
            pixel_values=inputs["pixel_values"].cuda(),
            max_new_tokens=1024,
            num_beams=3,
        )
        generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
        parsed_answer = processor.post_process_generation(
            generated_text, task=task_prompt, image_size=(image.width, image.height)
        )

    # Release the model and processor once generation is done
    del model
    del processor

    return parsed_answer
```
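
A call might then look like the following. The image path and the `<OD>` task prompt are placeholders rather than values confirmed by this card; substitute the prompt the checkpoint was actually fine-tuned with:

```python
# Hypothetical invocation: "invoice.png" and "<OD>" are placeholders.
result = run_florence_invoice("invoice.png", task_prompt="<OD>")
print(result)
```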

### Applications

This model is ideal for automating the extraction of key information from invoices in various business and financial applications. It can significantly reduce the manual effort required for data entry and validation in accounting and bookkeeping processes.

## Evaluation

The model has been evaluated on a held-out set of annotated invoice images. Evaluation metrics included precision, recall, and F1-score for each of the identified fields. Detailed evaluation results and visualizations are available in the `results` directory of the repository.
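
As a rough illustration only (not the repository's evaluation code), per-field precision, recall, and F1 can be computed from predicted and reference field values roughly as follows; the `{field: value}` record format and the exact-match rule are assumptions:

```python
from collections import defaultdict


def field_level_scores(predictions, references):
    """Per-field precision/recall/F1 from parallel lists of {field: value} dicts.

    A prediction counts as a true positive only when its value exactly matches
    the reference value for that field (a simplified, assumed matching rule).
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for pred, ref in zip(predictions, references):
        for field, gold in ref.items():
            if field in pred and pred[field] == gold:
                tp[field] += 1
            else:
                fn[field] += 1
        for field, value in pred.items():
            if field not in ref or ref[field] != value:
                fp[field] += 1

    scores = {}
    for field in set(tp) | set(fp) | set(fn):
        p = tp[field] / (tp[field] + fp[field]) if tp[field] + fp[field] else 0.0
        r = tp[field] / (tp[field] + fn[field]) if tp[field] + fn[field] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[field] = {"precision": p, "recall": r, "f1": f1}
    return scores
```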

## Limitations

- The model's performance depends on the quality and variability of the training dataset; it may not perform as well on invoices that differ significantly from those seen during training.
- Fine-tuning was conducted with a specific LoRA configuration, which may need to be adjusted for different use cases or datasets.

## Contact

For more information or questions about this model, please contact the developers at [your-email@example.com].

## License

This model is licensed under the MIT License. See the `LICENSE` file for more details.

## Acknowledgments

We would like to thank Microsoft for the development of the Florence-2 vision model and the broader machine learning community for their contributions and support.