vkuropiatnyk committed on
Commit 82d5545 · verified · 1 Parent(s): 493c102

Upload folder using huggingface_hub

Files changed (4):
  1. README.md +264 -0
  2. config.json +145 -0
  3. model.onnx +3 -0
  4. model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,264 @@
---
license: mit
base_model:
- google/efficientnet-b0
datasets:
- docling-project/HF-CC-v0-00001-00010-images-filtered-new-class
tags:
- image-classification
- document-analysis
- figure-classification
---

# EfficientNet-B0 Document Figure Classifier v2.5

This is an image classification model based on **Google EfficientNet-B0**, fine-tuned on a [filtered subset of the HuggingFace finepdfs dataset](https://huggingface.co/datasets/docling-project/HF-CC-v0-00001-00010-images-filtered-new-class) to classify document figures into one of the following 26 categories:

1. **logo**
2. **photograph**
3. **icon**
4. **engineering_drawing**
5. **line_chart**
6. **bar_chart**
7. **other**
8. **table**
9. **flow_chart**
10. **screenshot_from_computer**
11. **signature**
12. **screenshot_from_manual**
13. **geographical_map**
14. **pie_chart**
15. **page_thumbnail**
16. **stamp**
17. **music**
18. **calendar**
19. **qr_code**
20. **bar_code**
21. **full_page_image**
22. **scatter_plot**
23. **chemistry_structure**
24. **topographical_map**
25. **crossword_puzzle**
26. **box_plot**

## Model Performance

The model was evaluated on a held-out test set from the finepdfs dataset with the following metrics:

| Metric | Score |
|--------|-------|
| **Accuracy** | 0.90703 |
| **Balanced Accuracy** | 0.68836 |
| **Macro F1** | 0.68942 |
| **Weighted F1** | 0.90716 |
| **Cohen's Kappa** | 0.87449 |

### Per-Label Performance

| Label | Precision | Recall |
|-------|-----------|--------|
| **logo** | 0.92807 | 0.91816 |
| **photograph** | 0.90966 | 0.96029 |
| **icon** | 0.83605 | 0.82678 |
| **engineering_drawing** | 0.71689 | 0.81172 |
| **line_chart** | 0.73055 | 0.92117 |
| **bar_chart** | 0.88599 | 0.92720 |
| **other** | 0.41893 | 0.38213 |
| **table** | 0.98636 | 0.96765 |
| **flow_chart** | 0.75926 | 0.82425 |
| **screenshot_from_computer** | 0.85952 | 0.71980 |
| **signature** | 0.89020 | 0.85971 |
| **screenshot_from_manual** | 0.48559 | 0.34543 |
| **geographical_map** | 0.86780 | 0.85219 |
| **pie_chart** | 0.96880 | 0.94220 |
| **page_thumbnail** | 0.52008 | 0.35188 |
| **stamp** | 0.71269 | 0.41794 |
| **music** | 0.48037 | 0.57778 |
| **calendar** | 0.52880 | 0.28775 |
| **qr_code** | 0.95694 | 0.93240 |
| **bar_code** | 0.34244 | 0.84305 |
| **full_page_image** | 0.40323 | 0.65789 |
| **scatter_plot** | 0.66848 | 0.67213 |
| **chemistry_structure** | 0.72781 | 0.65426 |
| **topographical_map** | 0.83333 | 0.38462 |
| **crossword_puzzle** | 0.57143 | 0.21622 |
| **box_plot** | 0.85714 | 0.64286 |

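The gap between accuracy (0.907) and balanced accuracy (0.688) reflects the class imbalance: frequent classes like `photograph` dominate plain accuracy, while rare classes like `crossword_puzzle` pull the macro averages down. As a self-contained illustration of how these metrics are derived from per-label precision and recall, the sketch below computes them from scratch on a tiny invented label set (the data is made up, not from the model's test set):

```python
from collections import Counter

def classification_metrics(y_true, y_pred):
    """Per-label precision/recall/F1, plus accuracy, balanced accuracy, macro F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but true label was t
            fn[t] += 1  # true label t was missed
    stats = {}
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        stats[lab] = (prec, rec, f1)
    accuracy = sum(tp.values()) / len(y_true)
    balanced_accuracy = sum(s[1] for s in stats.values()) / len(labels)  # mean recall
    macro_f1 = sum(s[2] for s in stats.values()) / len(labels)
    return stats, accuracy, balanced_accuracy, macro_f1

# Invented, imbalanced two-class data: one missed "other" costs little
# accuracy but drags balanced accuracy well below it.
y_true = ["logo", "logo", "logo", "logo", "other", "other"]
y_pred = ["logo", "logo", "logo", "logo", "logo", "other"]
stats, acc, bal_acc, macro_f1 = classification_metrics(y_true, y_pred)
print(acc, bal_acc, macro_f1)  # 0.833..., 0.75, 0.777...
```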
## How to use - Transformers

Example of how to classify an image into one of the 26 classes using transformers:

```python
import torch
import torchvision.transforms as transforms

from transformers import EfficientNetForImageClassification
from PIL import Image
import requests


urls = [
    'http://images.cocodataset.org/val2017/000000039769.jpg',
    'http://images.cocodataset.org/test-stuff2017/000000001750.jpg',
    'http://images.cocodataset.org/test-stuff2017/000000000001.jpg'
]

image_processor = transforms.Compose(
    [
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.47853944, 0.4732864, 0.47434163],
        ),
    ]
)

images = []
for url in urls:
    image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
    image = image_processor(image)
    images.append(image)


model_id = 'docling-project/DocumentFigureClassifier-v2.5'

model = EfficientNetForImageClassification.from_pretrained(model_id)

labels = model.config.id2label

device = torch.device("cpu")

torch_images = torch.stack(images).to(device)

with torch.no_grad():
    logits = model(torch_images).logits  # (batch_size, num_classes)
    probs_batch = logits.softmax(dim=1)  # (batch_size, num_classes)
    probs_batch = probs_batch.cpu().numpy().tolist()

for idx, probs_image in enumerate(probs_batch):
    preds = [(labels[i], prob) for i, prob in enumerate(probs_image)]
    preds.sort(key=lambda t: t[1], reverse=True)
    print(f"{idx}: {preds}")
```

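The `transforms.Compose` pipeline above defines the model's preprocessing contract: a 224×224 RGB image, scaled to [0, 1], then normalized per channel with the mean/std shown. As a sketch of what `ToTensor` + `Normalize` do, the same steps can be mirrored in plain NumPy (this assumes the image is already resized to 224×224; the dummy array below is invented for illustration):

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.47853944, 0.4732864, 0.47434163], dtype=np.float32)

def preprocess(rgb_uint8: np.ndarray) -> np.ndarray:
    """Mimic ToTensor + Normalize: HWC uint8 -> normalized CHW float32."""
    x = rgb_uint8.astype(np.float32) / 255.0  # ToTensor: scale to [0, 1]
    x = (x - MEAN) / STD                      # Normalize, broadcast over channels
    return x.transpose(2, 0, 1)               # HWC -> CHW, as the model expects

# A dummy mid-gray 224x224 image stands in for a real document figure.
dummy = np.full((224, 224, 3), 128, dtype=np.uint8)
out = preprocess(dummy)
print(out.shape)  # (3, 224, 224)
```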
## How to use - ONNX

Example of how to classify an image into one of the 26 classes using ONNX Runtime:

```python
import onnxruntime

import numpy as np
import torchvision.transforms as transforms

from PIL import Image
import requests

LABELS = [
    "logo",
    "photograph",
    "icon",
    "engineering_drawing",
    "line_chart",
    "bar_chart",
    "other",
    "table",
    "flow_chart",
    "screenshot_from_computer",
    "signature",
    "screenshot_from_manual",
    "geographical_map",
    "pie_chart",
    "page_thumbnail",
    "stamp",
    "music",
    "calendar",
    "qr_code",
    "bar_code",
    "full_page_image",
    "scatter_plot",
    "chemistry_structure",
    "topographical_map",
    "crossword_puzzle",
    "box_plot"
]


urls = [
    'http://images.cocodataset.org/val2017/000000039769.jpg',
    'http://images.cocodataset.org/test-stuff2017/000000001750.jpg',
    'http://images.cocodataset.org/test-stuff2017/000000000001.jpg'
]

images = []
for url in urls:
    image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
    images.append(image)


image_processor = transforms.Compose(
    [
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.47853944, 0.4732864, 0.47434163],
        ),
    ]
)


processed_images_onnx = [image_processor(image).unsqueeze(0) for image in images]

# ONNX Runtime needs NumPy arrays as input
onnx_inputs = [item.numpy(force=True) for item in processed_images_onnx]

# pack into a batch
onnx_inputs = np.concatenate(onnx_inputs, axis=0)

ort_session = onnxruntime.InferenceSession(
    "./DocumentFigureClassifier-v2_5-onnx/model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)


for item in ort_session.run(None, {'input': onnx_inputs}):
    for x in iter(item):
        pred = x.argmax()
        print(LABELS[pred])
```

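The ONNX example above prints only the argmax label. If per-class scores are wanted, as in the Transformers example, the raw logits can be turned into probabilities with a numerically stable softmax; a small self-contained sketch (the logit values below are invented for illustration):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoid exp overflow
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

# Invented logits for a batch of 2 images over 4 of the 26 classes.
logits = np.array([[2.0, 1.0, 0.1, -1.0],
                   [0.0, 3.0, 0.0, 0.0]])
probs = softmax(logits)
print(probs.argmax(axis=1))  # [0 1]  -- same winners as argmax on the logits
print(probs.sum(axis=1))     # each row sums to 1
```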
## Training Data

This model was trained on a [filtered subset of the HuggingFace finepdfs dataset](https://huggingface.co/datasets/docling-project/HF-CC-v0-00001-00010-images-filtered-new-class), a large-scale dataset for document understanding tasks.

## Citation

If you use this model in your work, please cite the following papers:

```bibtex
@article{Tan2019EfficientNetRM,
  title   = {EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
  author  = {Mingxing Tan and Quoc V. Le},
  journal = {ArXiv},
  year    = {2019},
  volume  = {abs/1905.11946}
}

@techreport{Docling,
  author  = {Deep Search Team},
  month   = {8},
  title   = {{Docling Technical Report}},
  url     = {https://arxiv.org/abs/2408.09869},
  eprint  = {2408.09869},
  doi     = {10.48550/arXiv.2408.09869},
  version = {1.0.0},
  year    = {2024}
}
```
config.json ADDED
@@ -0,0 +1,145 @@
{
  "architectures": [
    "EfficientNetForImageClassification"
  ],
  "batch_norm_eps": 0.001,
  "batch_norm_momentum": 0.99,
  "depth_coefficient": 1.0,
  "depth_divisor": 8,
  "depthwise_padding": [],
  "drop_connect_rate": 0.2,
  "dropout_rate": 0.2,
  "dtype": "float32",
  "expand_ratios": [
    1,
    6,
    6,
    6,
    6,
    6,
    6
  ],
  "hidden_act": "swish",
  "hidden_dim": 1280,
  "id2label": {
    "0": "logo",
    "1": "photograph",
    "10": "signature",
    "11": "screenshot_from_manual",
    "12": "geographical_map",
    "13": "pie_chart",
    "14": "page_thumbnail",
    "15": "stamp",
    "16": "music",
    "17": "calendar",
    "18": "qr_code",
    "19": "bar_code",
    "2": "icon",
    "20": "full_page_image",
    "21": "scatter_plot",
    "22": "chemistry_structure",
    "23": "topographical_map",
    "24": "crossword_puzzle",
    "25": "box_plot",
    "3": "engineering_drawing",
    "4": "line_chart",
    "5": "bar_chart",
    "6": "other",
    "7": "table",
    "8": "flow_chart",
    "9": "screenshot_from_computer"
  },
  "image_size": 224,
  "in_channels": [
    32,
    16,
    24,
    40,
    80,
    112,
    192
  ],
  "initializer_range": 0.02,
  "kernel_sizes": [
    3,
    3,
    5,
    3,
    5,
    5,
    3
  ],
  "label2id": {
    "bar_chart": "5",
    "bar_code": "19",
    "box_plot": "25",
    "calendar": "17",
    "chemistry_structure": "22",
    "crossword_puzzle": "24",
    "engineering_drawing": "3",
    "flow_chart": "8",
    "full_page_image": "20",
    "geographical_map": "12",
    "icon": "2",
    "line_chart": "4",
    "logo": "0",
    "music": "16",
    "other": "6",
    "page_thumbnail": "14",
    "photograph": "1",
    "pie_chart": "13",
    "qr_code": "18",
    "scatter_plot": "21",
    "screenshot_from_computer": "9",
    "screenshot_from_manual": "11",
    "signature": "10",
    "stamp": "15",
    "table": "7",
    "topographical_map": "23"
  },
  "model_type": "efficientnet",
  "num_block_repeats": [
    1,
    2,
    2,
    3,
    3,
    4,
    1
  ],
  "num_channels": 3,
  "num_hidden_layers": 64,
  "out_channels": [
    16,
    24,
    40,
    80,
    112,
    192,
    320
  ],
  "out_features": null,
  "pooling_type": "mean",
  "squeeze_expansion_ratio": 0.25,
  "stage_names": [
    "stem",
    "stage1",
    "stage2",
    "stage3",
    "stage4",
    "stage5",
    "stage6",
    "stage7"
  ],
  "strides": [
    1,
    2,
    2,
    2,
    1,
    2,
    1
  ],
  "transformers_version": "4.57.3",
  "width_coefficient": 1.0
}
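Note that the `id2label` keys in the config are JSON strings serialized in lexicographic order ("0", "1", "10", "11", …), so the file cannot be read in document order to recover class indices. A quick sketch for sanity-checking a loaded config: verify that `id2label` and `label2id` are mutual inverses and recover the numeric class order (the three-class excerpt below is illustrative; in practice the dicts come from `json.load(open("config.json"))`):

```python
import json

# Illustrative excerpt of the config above, inlined for self-containment.
config = json.loads("""
{
  "id2label": {"0": "logo", "1": "photograph", "2": "icon"},
  "label2id": {"icon": "2", "logo": "0", "photograph": "1"}
}
""")

id2label = config["id2label"]
label2id = config["label2id"]

# Every id maps to a label that maps back to the same id, and vice versa.
assert all(label2id[lab] == idx for idx, lab in id2label.items())
assert all(id2label[idx] == lab for lab, idx in label2id.items())

# Sort numerically, not lexicographically, to recover class order 0..N-1.
ordered = [id2label[str(i)] for i in range(len(id2label))]
print(ordered)  # ['logo', 'photograph', 'icon']
```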
model.onnx ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:27ffc48c27ae4e12c99b6f6de0dd730005245e47b70dd0c1339e62cbac3ec4c0
size 16940439
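The weight files are stored as Git LFS pointer files: plain text with one `key value` pair per line, as shown above, where `oid` carries the hash algorithm and content hash and `size` is the real file's byte count. A minimal parser sketch (the pointer text is inlined from the file above):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into a key -> value dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # split on the first space only
        fields[key] = value
    return fields

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:27ffc48c27ae4e12c99b6f6de0dd730005245e47b70dd0c1339e62cbac3ec4c0
size 16940439
"""
fields = parse_lfs_pointer(pointer)
print(fields["size"])                  # 16940439 (bytes of the real file)
print(fields["oid"].split(":", 1)[0])  # sha256 (hash algorithm)
```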
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6bf1e44d6bce316dcade6eb9929d8f8d23b6e8d9d29062b3b4011cff87c7c3cd
size 16378200