jjmcarrascosa committed on
Commit
2a70f18
1 Parent(s): 6a3f653

Update README.md

Files changed (1)
  1. README.md +13 -6
README.md CHANGED
@@ -10,9 +10,6 @@ model-index:
10
  results: []
11
  ---
12
 
13
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
14
- should probably proofread and complete it, then remove this comment. -->
15
-
16
  # vit_tickers_binaryclf
17
 
18
  This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the cord dataset.
@@ -22,18 +19,28 @@ It achieves the following results on the evaluation set:
22
 
23
  ## Model description
24
 
25
- More information needed
26
 
27
  ## Intended uses & limitations
28
 
29
- More information needed
30
 
31
  ## Training and evaluation data
32
 
33
- More information needed
 
 
 
 
 
 
34
 
35
  ## Training procedure
36
 
 
 
 
 
37
  ### Training hyperparameters
38
 
39
  The following hyperparameters were used during training:
 
10
  results: []
11
  ---
12
 
 
 
 
13
  # vit_tickers_binaryclf
14
 
15
  This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the cord dataset.
 
19
 
20
  ## Model description
21
 
22
+ This model is a binary classifier, a fine-tuned version of ViT, that predicts whether an input image is a picture or scan of one or more tickets or something else.
23
 
24
  ## Intended uses & limitations
25
 
26
+ Use this model to classify your images as tickets or not tickets. With the images classified as tickets, you can use Multimodal Information Extraction, such as Visual Named Entity Recognition, to extract the ticket items, amounts, total, etc. Check the CORD dataset for more information.
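As a rough illustration of the intended use, the sketch below runs an image through the Transformers `image-classification` pipeline and keeps the top label. The repository id `jjmcarrascosa/vit_tickers_binaryclf` is an assumption pieced together from the commit author and the model name, and the label names `ticket` and `no_ticket` are taken from the class names in this card; the labels stored in the checkpoint's config may differ.

```python
def top_label(predictions):
    """Return the highest-scoring label from image-classification output.

    `predictions` is a list of {"label": str, "score": float} dicts,
    which is the shape the Transformers pipeline returns for one image.
    """
    return max(predictions, key=lambda p: p["score"])["label"]


def classify(image_path, model_id="jjmcarrascosa/vit_tickers_binaryclf"):
    """Classify one image; the default model_id is an assumed repository id."""
    # Imported lazily so the helper above can be used without transformers.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)
    return top_label(classifier(image_path))


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(classify("receipt.jpg"))
```

Images labelled as tickets could then be passed on to an information-extraction step, while the rest are skipped.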
27
 
28
  ## Training and evaluation data
29
 
30
+ Two datasets were used for the positive class (`ticket`):
31
+ - `cord`
32
+ - `https://expressexpense.com/blog/free-receipt-images-ocr-machine-learning-dataset/`
33
+
34
+ For the negative class (`no_ticket`), the following datasets were used:
35
+ - A subset of `RVL-CDIP`
36
+ - A subset of `visual-genome`
37
 
38
  ## Training procedure
39
 
40
+ The datasets were loaded with different data distributions for the positive and negative classes. The images were then normalized and resized to match the input that ViT expects.
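The resizing and normalization step can be sketched as follows. This is a minimal NumPy illustration, not the code used for training: it assumes a 224x224 input and a per-channel mean and standard deviation of 0.5 (the usual settings for `google/vit-base-patch16-224-in21k`), and it uses nearest-neighbour resizing for brevity, whereas the library's image processor interpolates.

```python
import numpy as np


def preprocess(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Turn an HxWx3 uint8 image into a 3 x size x size float32 array.

    Assumed settings: size=224, mean=0.5, std=0.5 per channel.
    """
    height, width, _ = image.shape
    # Nearest-neighbour resize: pick one source row/column per target pixel.
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    resized = image[rows][:, cols]
    # Rescale to [0, 1], then normalize to [-1, 1].
    scaled = resized.astype(np.float32) / 255.0
    normalized = (scaled - 0.5) / 0.5
    # Channels-first layout, as expected by the PyTorch ViT model.
    return normalized.transpose(2, 0, 1)
```

In practice the image processor saved alongside the checkpoint would normally perform these steps, so a hand-written version like this is mainly useful for understanding what the model receives.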
41
+
42
+ Several runs were carried out, varying the data distribution and the hyperparameters, to maximize F1.
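For reference, F1 is the harmonic mean of precision and recall. The sketch below computes it for the positive class; treating `ticket` as the positive label is an assumption based on the class names in this card, and the function is illustrative rather than the evaluation code used for the runs.

```python
def f1_score(y_true, y_pred, positive="ticket"):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because F1 ignores true negatives, it is less flattering than accuracy when the negative class dominates, which is one reason to prefer it when comparing runs with different class balances.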
43
+
44
  ### Training hyperparameters
45
 
46
  The following hyperparameters were used during training: