Qing Jiang committed on
Commit
47325f8
1 Parent(s): d130b8d

First model version

.gitattributes CHANGED
@@ -32,3 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
Dockerfile ADDED
@@ -0,0 +1,22 @@
+ FROM python:3.11-slim
+
+ WORKDIR /code
+
+ COPY requirements.txt ./
+
+ RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
+
+ RUN useradd -m -u 1000 user
+
+ USER user
+
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH
+
+ WORKDIR $HOME/app
+
+ COPY --chown=user . $HOME/app
+
+ COPY --chown=user config/config.toml $HOME/app/.streamlit/config.toml
+
+ CMD ["streamlit", "run", "main.py", "--server.port=7860", "--server.address=0.0.0.0"]
README.md CHANGED
@@ -1,13 +1,84 @@
- ---
- title: Sparrow
- emoji: 🐢
- colorFrom: green
- colorTo: yellow
- sdk: streamlit
- sdk_version: 1.15.2
- app_file: app.py
- pinned: false
- license: mit
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Sparrow UI
+
+ ## Description
+
+ The [Sparrow UI](https://katanaml-org-sparrow-ui.hf.space) module implements the UI logic with Streamlit for document data annotation, model training/tuning, and document data extraction.
+
+ #### Dashboard UI:
+
+ ![Sparrow Dashboard](https://github.com/katanaml/sparrow/blob/main/sparrow-ui/assets/dashboard.png)
+
+ #### Annotation UI:
+
+ ![Sparrow Annotation](https://github.com/katanaml/sparrow/blob/main/sparrow-ui/assets/annotation.png)
+
+ ## Instructions
+
+ 1. Install
+
+ Streamlit docs:
+ https://docs.streamlit.io/library/get-started/installation
+
+ ```
+ pip install -r requirements.txt
+ ```
+
+ 2. Run
+
+ ```
+ streamlit run main.py
+ ```
+
+ ## Run in Docker container
+
+ 1. Build the Docker image
+
+ ```
+ docker build --tag katanaml/sparrow-ui .
+ ```
+
+ 2. Run the Docker container
+
+ ```
+ docker run -it --name sparrow-ui -p 7860:7860 katanaml/sparrow-ui:latest
+ ```
+
+ ## Deploy to Hugging Face Spaces
+
+ 1. Create a new Space - https://huggingface.co/spaces
+
+ 2. Clone the Space repo and initialize Git LFS. Copy the Sparrow UI files into it. We keep config.toml in the config folder because, when a Docker container is deployed on Hugging Face Spaces, it can't read from the standard .streamlit folder.
+
+ ```
+ git lfs install
+ ```
+
+ 3. Add these files to the Git LFS config
+
+ ```
+ git lfs track "assets/ab.png"
+ git lfs track "docs/image/receipt_00001.png"
+ git lfs track "docs/image/receipt_00002.png"
+ git lfs track "docs/image/receipt_00003.png"
+ git lfs track "docs/image/w_invoice1.png"
+ ```
+
+ 4. Commit and push the code to the Hugging Face Space, following the Space instructions. The Docker container will be deployed automatically. Space example:
+
+ ```
+ https://huggingface.co/spaces/katanaml-org/sparrow-ui
+ ```
+
+ 5. Sparrow UI will be accessible by URL, which you can get from the Hugging Face Space info. For example:
+
+ ```
+ https://katanaml-org-sparrow-ui.hf.space
+ ```
+
+ ## Author
+
+ [Katana ML](https://katanaml.io), [Andrej Baranovskij](https://github.com/abaranovskis-redsamurai)
+
+ ## License
+
+ Licensed under the Apache License, Version 2.0. Copyright 2020-2022 Katana ML, Andrej Baranovskij. [Copy of the license](https://github.com/katanaml/sparrow/blob/main/LICENSE).
assets/ab.png ADDED

Git LFS Details

  • SHA256: 7a125de550ed32d1985e2bc1ff76b3cbd60d696b62a07376d0c661ef027d7eea
  • Pointer size: 132 Bytes
  • Size of remote file: 1.03 MB
assets/annotation.png ADDED

Git LFS Details

  • SHA256: 7c7a9bd4c2d7ca30449b301fa6b2ed24704ead242b72b25ef377b2c192d097a5
  • Pointer size: 131 Bytes
  • Size of remote file: 734 kB
assets/dashboard.png ADDED

Git LFS Details

  • SHA256: 45ee912d2e07a90c2fcde6bea7b1dcc2c8a3d531363b3ee2b1db7b7bc287d93f
  • Pointer size: 131 Bytes
  • Size of remote file: 780 kB
config/config.toml ADDED
@@ -0,0 +1,183 @@
1
+ # Below are all the sections and options you can have in ~/.streamlit/config.toml.
2
+
3
+ [global]
4
+
5
+ # By default, Streamlit checks if the Python watchdog module is available and, if not, prints a warning asking for you to install it. The watchdog module is not required, but highly recommended. It improves Streamlit's ability to detect changes to files in your filesystem.
6
+ # If you'd like to turn off this warning, set this to True.
7
+ # Default: false
8
+ disableWatchdogWarning = true
9
+
10
+ # If True, will show a warning when you run a Streamlit-enabled script via "python my_script.py".
11
+ # Default: true
12
+ # showWarningOnDirectExecution = true
13
+
14
+ # DataFrame serialization.
15
+ # Acceptable values: - 'legacy': Serialize DataFrames using Streamlit's custom format. Slow but battle-tested. - 'arrow': Serialize DataFrames using Apache Arrow. Much faster and versatile.
16
+ # Default: "arrow"
17
+ # dataFrameSerialization = "arrow"
18
+
19
+
20
+ [logger]
21
+
22
+ # Level of logging: 'error', 'warning', 'info', or 'debug'.
23
+ # Default: 'info'
24
+ # level = "info"
25
+
26
+ # String format for logging messages. If logger.datetimeFormat is set, logger messages will default to `%(asctime)s.%(msecs)03d %(message)s`. See [Python's documentation](https://docs.python.org/2.6/library/logging.html#formatter-objects) for available attributes.
27
+ # Default: "%(asctime)s %(message)s"
28
+ # messageFormat = "%(asctime)s %(message)s"
29
+
30
+
31
+ [client]
32
+
33
+ # Whether to enable st.cache.
34
+ # Default: true
35
+ # caching = true
36
+
37
+ # If false, makes your Streamlit script not draw to a Streamlit app.
38
+ # Default: true
39
+ # displayEnabled = true
40
+
41
+ # Controls whether uncaught app exceptions are displayed in the browser. By default, this is set to True and Streamlit displays app exceptions and associated tracebacks in the browser.
42
+ # If set to False, an exception will result in a generic message being shown in the browser, and exceptions and tracebacks will be printed to the console only.
43
+ # Default: true
44
+ # showErrorDetails = true
45
+
46
+
47
+ [runner]
48
+
49
+ # Allows you to type a variable or string by itself in a single line of Python code to write it to the app.
50
+ # Default: true
51
+ # magicEnabled = true
52
+
53
+ # Install a Python tracer to allow you to stop or pause your script at any point and introspect it. As a side-effect, this slows down your script's execution.
54
+ # Default: false
55
+ # installTracer = false
56
+
57
+ # Sets the MPLBACKEND environment variable to Agg inside Streamlit to prevent Python crashing.
58
+ # Default: true
59
+ # fixMatplotlib = true
60
+
61
+ # Run the Python Garbage Collector after each script execution. This can help avoid excess memory use in Streamlit apps, but could introduce delay in rerunning the app script for high-memory-use applications.
62
+ # Default: true
63
+ # postScriptGC = true
64
+
65
+ # Handle script rerun requests immediately, rather than waiting for script execution to reach a yield point. Enabling this will make Streamlit much more responsive to user interaction, but it can lead to race conditions in apps that mutate session_state data outside of explicit session_state assignment statements.
66
+ # Default: false
67
+ fastReruns = true
68
+
69
+
70
+ [server]
71
+
72
+ # List of folders that should not be watched for changes. This impacts both "Run on Save" and @st.cache.
73
+ # Relative paths will be taken as relative to the current working directory.
74
+ # Example: ['/home/user1/env', 'relative/path/to/folder']
75
+ # Default: []
76
+ # folderWatchBlacklist = []
77
+
78
+ # Change the type of file watcher used by Streamlit, or turn it off completely.
79
+ # Allowed values: * "auto" : Streamlit will attempt to use the watchdog module, and falls back to polling if watchdog is not available. * "watchdog" : Force Streamlit to use the watchdog module. * "poll" : Force Streamlit to always use polling. * "none" : Streamlit will not watch files.
80
+ # Default: "auto"
81
+ fileWatcherType = "auto"
82
+
83
+ # Symmetric key used to produce signed cookies. If deploying on multiple replicas, this should be set to the same value across all replicas to ensure they all share the same secret.
84
+ # Default: randomly generated secret key.
85
+ # cookieSecret = "80f9eb91f1eb64e26f0e46148556bf493ccde5fe27712bbcbebf57948852b5fe"
86
+
87
+ # If false, will attempt to open a browser window on start.
88
+ # Default: false unless (1) we are on a Linux box where DISPLAY is unset, or (2) we are running in the Streamlit Atom plugin.
89
+ # headless = false
90
+
91
+ # Automatically rerun script when the file is modified on disk.
92
+ # Default: false
93
+ # runOnSave = false
94
+
95
+ # The address where the server will listen for client and browser connections. Use this if you want to bind the server to a specific address. If set, the server will only be accessible from this address, and not from any aliases (like localhost).
96
+ # Default: (unset)
97
+ # address =
98
+
99
+ # The port where the server will listen for browser connections.
100
+ # Default: 8501
101
+ port = 7860
102
+
103
+ # The base path for the URL where Streamlit should be served from.
104
+ # Default: ""
105
+ # baseUrlPath = ""
106
+
107
+ # Enables support for Cross-Origin Request Sharing (CORS) protection, for added security.
108
+ # Due to conflicts between CORS and XSRF, if `server.enableXsrfProtection` is on and `server.enableCORS` is off at the same time, we will prioritize `server.enableXsrfProtection`.
109
+ # Default: true
110
+ # enableCORS = true
111
+
112
+ # Enables support for Cross-Site Request Forgery (XSRF) protection, for added security.
113
+ # Due to conflicts between CORS and XSRF, if `server.enableXsrfProtection` is on and `server.enableCORS` is off at the same time, we will prioritize `server.enableXsrfProtection`.
114
+ # Default: true
115
+ # enableXsrfProtection = true
116
+
117
+ # Max size, in megabytes, for files uploaded with the file_uploader.
118
+ # Default: 200
119
+ maxUploadSize = 10
120
+
121
+ # Max size, in megabytes, of messages that can be sent via the WebSocket connection.
122
+ # Default: 200
123
+ # maxMessageSize = 200
124
+
125
+ # Enables support for websocket compression.
126
+ # Default: false
127
+ # enableWebsocketCompression = false
128
+
129
+
130
+ [browser]
131
+
132
+ # Internet address where users should point their browsers in order to connect to the app. Can be IP address or DNS name and path.
133
+ # This is used to: - Set the correct URL for CORS and XSRF protection purposes. - Show the URL on the terminal - Open the browser
134
+ # Default: "localhost"
135
+ # serverAddress = "localhost"
136
+
137
+ # Whether to send usage statistics to Streamlit.
138
+ # Default: true
139
+ gatherUsageStats = false
140
+
141
+ # Port where users should point their browsers in order to connect to the app.
142
+ # This is used to: - Set the correct URL for CORS and XSRF protection purposes. - Show the URL on the terminal - Open the browser
143
+ # Default: whatever value is set in server.port.
144
+ # serverPort = 8501
145
+
146
+
147
+ [mapbox]
148
+
149
+ # Configure Streamlit to use a custom Mapbox token for elements like st.pydeck_chart and st.map. To get a token for yourself, create an account at https://mapbox.com. It's free (for moderate usage levels)!
150
+ # Default: ""
151
+ # token = ""
152
+
153
+
154
+ [deprecation]
155
+
156
+ # Set to false to disable the deprecation warning for the file uploader encoding.
157
+ # Default: true
158
+ # showfileUploaderEncoding = true
159
+
160
+ # Set to false to disable the deprecation warning for using the global pyplot instance.
161
+ # Default: true
162
+ # showPyplotGlobalUse = true
163
+
164
+
165
+ [theme]
166
+
167
+ # The preset Streamlit theme that your custom theme inherits from. One of "light" or "dark".
168
+ # base =
169
+
170
+ # Primary accent color for interactive elements.
171
+ # primaryColor = "#F633"
172
+
173
+ # Background color for the main content area.
174
+ backgroundColor = "#FFFFFF"
175
+
176
+ # Background color used for the sidebar and most interactive widgets.
177
+ secondaryBackgroundColor = "#F0F2F6"
178
+
179
+ # Color used for almost all text.
180
+ textColor = "#262730"
181
+
182
+ # Font family for all text in the app, except code blocks. One of "sans serif", "serif", or "monospace".
183
+ font = "sans serif"
docs/image/receipt_00001.png ADDED

Git LFS Details

  • SHA256: a43a430908190f38a8e4145298ec09f60a112906ab4f33840894ff5700a758bc
  • Pointer size: 132 Bytes
  • Size of remote file: 1.78 MB
docs/image/receipt_00002.png ADDED

Git LFS Details

  • SHA256: 9357a493f210849c518ab32de56de1f056f2b9f836ad1dd1970588cde9e353cb
  • Pointer size: 131 Bytes
  • Size of remote file: 462 kB
docs/image/receipt_00003.png ADDED

Git LFS Details

  • SHA256: 91a6b599b27f4049cc4e56997ffce29da8efe9b4e30038d2a500fa068b4eb781
  • Pointer size: 132 Bytes
  • Size of remote file: 1.93 MB
docs/image/w_invoice1.png ADDED

Git LFS Details

  • SHA256: bfad3fd9e1bf842d68c356d88778429e3b41e98e8e9612ca43a4708d99edf93e
  • Pointer size: 131 Bytes
  • Size of remote file: 108 kB
docs/json/receipt_00001.json ADDED
@@ -0,0 +1,163 @@
+ {
+     "meta": {
+         "version": "v0.1",
+         "split": "train",
+         "image_id": 1,
+         "image_size": {
+             "width": 864,
+             "height": 1296
+         }
+     },
+     "words": [
+         {
+             "rect": {
+                 "x1": 503,
+                 "y1": 475,
+                 "x2": 637,
+                 "y2": 516
+             },
+             "value": "58,000",
+             "label": "item_price"
+         },
+         {
+             "rect": {
+                 "x1": 498,
+                 "y1": 520,
+                 "x2": 638,
+                 "y2": 561
+             },
+             "value": "165,000",
+             "label": "item_price"
+         },
+         {
+             "rect": {
+                 "x1": 496,
+                 "y1": 607,
+                 "x2": 640,
+                 "y2": 650
+             },
+             "value": "195,000",
+             "label": "item_price"
+         },
+         {
+             "rect": {
+                 "x1": 518,
+                 "y1": 696,
+                 "x2": 650,
+                 "y2": 738
+             },
+             "value": "22,000",
+             "label": "item_price"
+         },
+         {
+             "rect": {
+                 "x1": 520,
+                 "y1": 746,
+                 "x2": 649,
+                 "y2": 787
+             },
+             "value": "28,000",
+             "label": "item_price"
+         },
+         {
+             "rect": {
+                 "x1": 522,
+                 "y1": 792,
+                 "x2": 652,
+                 "y2": 835
+             },
+             "value": "35,000",
+             "label": "item_price"
+         },
+         {
+             "rect": {
+                 "x1": 507,
+                 "y1": 883,
+                 "x2": 656,
+                 "y2": 933
+             },
+             "value": "503,000",
+             "label": "subtotal"
+         },
+         {
+             "rect": {
+                 "x1": 533,
+                 "y1": 936,
+                 "x2": 660,
+                 "y2": 976
+             },
+             "value": "52,815",
+             "label": "tax"
+         },
+         {
+             "rect": {
+                 "x1": 343,
+                 "y1": 1022,
+                 "x2": 641,
+                 "y2": 1124
+             },
+             "value": "580,965",
+             "label": "total"
+         },
+         {
+             "rect": {
+                 "x1": 111,
+                 "y1": 472,
+                 "x2": 464,
+                 "y2": 515
+             },
+             "value": "SPGTHY BOLOGNASE",
+             "label": "item"
+         },
+         {
+             "rect": {
+                 "x1": 108,
+                 "y1": 521,
+                 "x2": 350,
+                 "y2": 567
+             },
+             "value": "PEPPER AUS",
+             "label": "item"
+         },
+         {
+             "rect": {
+                 "x1": 109,
+                 "y1": 610,
+                 "x2": 387,
+                 "y2": 652
+             },
+             "value": "WAGYU RIBEYE",
+             "label": "item"
+         },
+         {
+             "rect": {
+                 "x1": 113,
+                 "y1": 704,
+                 "x2": 438,
+                 "y2": 749
+             },
+             "value": "ICED LEMON TEA",
+             "label": "item"
+         },
+         {
+             "rect": {
+                 "x1": 111,
+                 "y1": 753,
+                 "x2": 472,
+                 "y2": 800
+             },
+             "value": "FUSION TEA LYCHE",
+             "label": "item"
+         },
+         {
+             "rect": {
+                 "x1": 110,
+                 "y1": 806,
+                 "x2": 471,
+                 "y2": 849
+             },
+             "value": "NUTTELA BROWNIES",
+             "label": "item"
+         }
+     ]
+ }
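Review note: the annotation file above pairs each labeled value with a bounding box in image pixel coordinates. The following is a minimal sketch of two sanity checks one might run on such a file: every `rect` lies inside `meta.image_size`, and the `item_price` values add up to the `subtotal`. It assumes the comma in values such as "58,000" is a thousands separator; `parse_amount`, `rects_in_bounds`, and `subtotal_matches` are hypothetical helper names, and `SAMPLE` is an abridged copy of docs/json/receipt_00001.json.

```python
import json
from decimal import Decimal

# Abridged copy of docs/json/receipt_00001.json: item prices and subtotal only.
SAMPLE = json.loads("""
{
  "meta": {"image_size": {"width": 864, "height": 1296}},
  "words": [
    {"rect": {"x1": 503, "y1": 475, "x2": 637, "y2": 516}, "value": "58,000", "label": "item_price"},
    {"rect": {"x1": 498, "y1": 520, "x2": 638, "y2": 561}, "value": "165,000", "label": "item_price"},
    {"rect": {"x1": 496, "y1": 607, "x2": 640, "y2": 650}, "value": "195,000", "label": "item_price"},
    {"rect": {"x1": 518, "y1": 696, "x2": 650, "y2": 738}, "value": "22,000", "label": "item_price"},
    {"rect": {"x1": 520, "y1": 746, "x2": 649, "y2": 787}, "value": "28,000", "label": "item_price"},
    {"rect": {"x1": 522, "y1": 792, "x2": 652, "y2": 835}, "value": "35,000", "label": "item_price"},
    {"rect": {"x1": 507, "y1": 883, "x2": 656, "y2": 933}, "value": "503,000", "label": "subtotal"}
  ]
}
""")


def parse_amount(value):
    """Parse an amount string, treating ',' as a thousands separator (assumption)."""
    return Decimal(value.replace(",", ""))


def rects_in_bounds(doc):
    """Check that every rect is non-empty and lies inside the image."""
    width = doc["meta"]["image_size"]["width"]
    height = doc["meta"]["image_size"]["height"]
    return all(
        0 <= w["rect"]["x1"] < w["rect"]["x2"] <= width
        and 0 <= w["rect"]["y1"] < w["rect"]["y2"] <= height
        for w in doc["words"]
    )


def subtotal_matches(doc):
    """Check that the item_price values sum to the single subtotal value."""
    prices = [parse_amount(w["value"]) for w in doc["words"] if w["label"] == "item_price"]
    subtotals = [parse_amount(w["value"]) for w in doc["words"] if w["label"] == "subtotal"]
    return len(subtotals) == 1 and sum(prices, Decimal(0)) == subtotals[0]
```

For this receipt the six item prices sum to 503,000, which equals the annotated subtotal, so both checks pass on the sample.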
docs/json/receipt_00002.json ADDED
@@ -0,0 +1,14 @@
+ {
+     "meta": {
+         "version": "v0.1",
+         "split": "train",
+         "image_id": 2,
+         "image_size": {
+             "width": 720,
+             "height": 1280
+         }
+     },
+     "words": [
+
+     ]
+ }
docs/json/receipt_00003.json ADDED
@@ -0,0 +1,33 @@
+ {
+     "meta": {
+         "version": "v0.1",
+         "split": "train",
+         "image_id": 3,
+         "image_size": {
+             "width": 1108,
+             "height": 1478
+         }
+     },
+     "words": [
+         {
+             "rect": {
+                 "x1": 712,
+                 "y1": 755,
+                 "x2": 846,
+                 "y2": 797
+             },
+             "value": "59,000",
+             "label": "item_price"
+         },
+         {
+             "rect": {
+                 "x1": 715,
+                 "y1": 849,
+                 "x2": 845,
+                 "y2": 894
+             },
+             "value": "10,000",
+             "label": "item_price"
+         }
+     ]
+ }
docs/json/w_invoice1.json ADDED
@@ -0,0 +1,83 @@
+ {
+     "meta": {
+         "version": "v0.1",
+         "split": "train",
+         "image_id": 5,
+         "image_size": {
+             "width": 830,
+             "height": 1080
+         }
+     },
+     "words": [
+         {
+             "rect": {
+                 "x1": 681,
+                 "y1": 274,
+                 "x2": 761,
+                 "y2": 298
+             },
+             "value": "$1,699.48",
+             "label": "amount_due"
+         },
+         {
+             "rect": {
+                 "x1": 703,
+                 "y1": 517,
+                 "x2": 765,
+                 "y2": 539
+             },
+             "value": "$550.00",
+             "label": "item_price"
+         },
+         {
+             "rect": {
+                 "x1": 544,
+                 "y1": 274,
+                 "x2": 631,
+                 "y2": 297
+             },
+             "value": "INV-10012",
+             "label": "invoice_number"
+         },
+         {
+             "rect": {
+                 "x1": 691,
+                 "y1": 880,
+                 "x2": 762,
+                 "y2": 907
+             },
+             "value": "$169.95",
+             "label": "deposit_due"
+         },
+         {
+             "rect": {
+                 "x1": 694,
+                 "y1": 582,
+                 "x2": 763,
+                 "y2": 607
+             },
+             "value": "$1,125.00",
+             "label": "item_price"
+         },
+         {
+             "rect": {
+                 "x1": 701,
+                 "y1": 650,
+                 "x2": 767,
+                 "y2": 673
+             },
+             "value": "$123.39",
+             "label": "item_price"
+         },
+         {
+             "rect": {
+                 "x1": 681,
+                 "y1": 718,
+                 "x2": 761,
+                 "y2": 748
+             },
+             "value": "$1,798.39",
+             "label": "subtotal"
+         }
+     ]
+ }
docs/visitors.json ADDED
@@ -0,0 +1 @@
+ {"meta": {"visitors": 8}}
favicon.ico ADDED
main.py ADDED
@@ -0,0 +1,136 @@
+ import streamlit as st
+ from streamlit_option_menu import option_menu
+ from tools.utilities import load_css
+ import json
+
+ from views.dashboard import Dashboard
+ from views.data_annotation import DataAnnotation
+ from views.model_training import ModelTraining
+ from views.model_tuning import ModelTuning
+ from views.data_extraction import DataExtraction
+ from views.settings import Settings
+ from views.about import About
+
+ import streamlit_javascript as st_js
+
+ st.set_page_config(
+     page_title="Sparrow",
+     page_icon="favicon.ico",
+     layout="wide"
+ )
+
+ load_css()
+
+
+ class Model:
+     menuTitle = "Sparrow"
+     option1 = "Dashboard"
+     option2 = "Data Annotation"
+     option3 = "Model Training"
+     option4 = "Model Tuning"
+     option5 = "Data Extraction"
+     option6 = "Settings"
+     option7 = "About"
+
+     menuIcon = "menu-up"
+     icon1 = "speedometer"
+     icon2 = "activity"
+     icon3 = "motherboard"
+     icon4 = "graph-up-arrow"
+     icon5 = "clipboard-data"
+     icon6 = "gear"
+     icon7 = "chat"
+
+
+ def view(model):
+     with st.sidebar:
+         menuItem = option_menu(model.menuTitle,
+                                [model.option1, model.option2, model.option7],
+                                icons=[model.icon1, model.icon2, model.icon7],
+                                menu_icon=model.menuIcon,
+                                default_index=0,
+                                styles={
+                                    "container": {"padding": "5!important", "background-color": "#fafafa"},
+                                    "icon": {"color": "black", "font-size": "25px"},
+                                    "nav-link": {"font-size": "16px", "text-align": "left", "margin": "0px",
+                                                 "--hover-color": "#eee"},
+                                    "nav-link-selected": {"background-color": "#037ffc"},
+                                })
+
+     if menuItem == model.option1:
+         Dashboard().view(Dashboard.Model())
+         logout_widget()
+
+     if menuItem == model.option2:
+         if 'ui_width' not in st.session_state or 'device_type' not in st.session_state or 'device_width' not in st.session_state:
+             # Get UI width
+             ui_width = st_js.st_javascript("window.innerWidth", key="ui_width_comp")
+             device_width = st_js.st_javascript("window.screen.width", key="device_width_comp")
+
+             if ui_width > 0 and device_width > 0:
+                 # Add 20% of current screen width to compensate for the sidebar
+                 ui_width = round(ui_width + (20 * ui_width / 100))
+
+                 if device_width > 768:
+                     device_type = 'desktop'
+                 else:
+                     device_type = 'mobile'
+
+                 st.session_state['ui_width'] = ui_width
+                 st.session_state['device_type'] = device_type
+                 st.session_state['device_width'] = device_width
+
+                 st.experimental_rerun()
+         else:
+             DataAnnotation().view(DataAnnotation.Model(), st.session_state['ui_width'], st.session_state['device_type'],
+                                   st.session_state['device_width'])
+             logout_widget()
+
+     if menuItem == model.option3:
+         ModelTraining().view(ModelTraining.Model())
+         logout_widget()
+
+     if menuItem == model.option4:
+         ModelTuning().view(ModelTuning.Model())
+         logout_widget()
+
+     if menuItem == model.option5:
+         DataExtraction().view(DataExtraction.Model())
+         logout_widget()
+
+     if menuItem == model.option6:
+         Settings().view(Settings.Model())
+         logout_widget()
+
+     if menuItem == model.option7:
+         About().view(About.Model())
+         logout_widget()
+
+
+ def logout_widget():
+     with st.sidebar:
+         st.markdown("---")
+         # st.write("User:", "John Doe")
+         st.write("Version:", "0.0.1")
+         # st.button("Logout")
+         # st.markdown("---")
+
+         if 'visitors' not in st.session_state:
+             with open("docs/visitors.json", "r") as f:
+                 visitors_json = json.load(f)
+                 visitors = visitors_json["meta"]["visitors"]
+
+             visitors += 1
+             visitors_json["meta"]["visitors"] = visitors
+
+             with open("docs/visitors.json", "w") as f:
+                 json.dump(visitors_json, f)
+
+             st.session_state['visitors'] = visitors
+         else:
+             visitors = st.session_state['visitors']
+
+         st.write("Counter:", visitors)
+
+
+ view(Model())
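Review note: the Data Annotation branch in main.py mixes browser measurement, layout arithmetic, and session state in one block. The following is a minimal sketch of the arithmetic alone, pulled into a pure function so it can be checked without a running Streamlit session. `annotation_layout` is a hypothetical name, not part of the commit; the 20% sidebar compensation and the 768-pixel desktop threshold are taken from the code above.

```python
def annotation_layout(ui_width, device_width):
    """Return (adjusted_ui_width, device_type), or None while widths are not yet known.

    main.py only proceeds once both JavaScript measurements are greater
    than zero, so non-positive widths are treated here as "not ready".
    """
    if ui_width <= 0 or device_width <= 0:
        return None

    # Add 20% of the measured width to compensate for the sidebar.
    adjusted_width = round(ui_width + (20 * ui_width / 100))

    # Screens wider than 768 pixels are treated as desktop.
    device_type = 'desktop' if device_width > 768 else 'mobile'

    return adjusted_width, device_type
```

With a helper like this, the view code would only need to store the returned values in `st.session_state` and trigger the rerun.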
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ streamlit==1.16.0
+ streamlit_option_menu
+ streamlit-sparrow-labeling==0.1.1
+ streamlit_nested_layout
+ streamlit-javascript
tools/st_functions.py ADDED
@@ -0,0 +1,72 @@
1
+ import streamlit as st
2
+
3
+
4
+ def st_button(icon, url, label, iconsize):
5
+ if icon == 'youtube':
6
+ button_code = f'''
7
+ <p>
8
+ <a href={url} class="btn btn-outline-primary btn-lg btn-block" type="button" aria-pressed="true">
9
+ <svg xmlns="http://www.w3.org/2000/svg" width={iconsize} height={iconsize} fill="currentColor" class="bi bi-youtube" viewBox="0 0 16 16">
10
+ <path d="M8.051 1.999h.089c.822.003 4.987.033 6.11.335a2.01 2.01 0 0 1 1.415 1.42c.101.38.172.883.22 1.402l.01.104.022.26.008.104c.065.914.073 1.77.074 1.957v.075c-.001.194-.01 1.108-.082 2.06l-.008.105-.009.104c-.05.572-.124 1.14-.235 1.558a2.007 2.007 0 0 1-1.415 1.42c-1.16.312-5.569.334-6.18.335h-.142c-.309 0-1.587-.006-2.927-.052l-.17-.006-.087-.004-.171-.007-.171-.007c-1.11-.049-2.167-.128-2.654-.26a2.007 2.007 0 0 1-1.415-1.419c-.111-.417-.185-.986-.235-1.558L.09 9.82l-.008-.104A31.4 31.4 0 0 1 0 7.68v-.123c.002-.215.01-.958.064-1.778l.007-.103.003-.052.008-.104.022-.26.01-.104c.048-.519.119-1.023.22-1.402a2.007 2.007 0 0 1 1.415-1.42c.487-.13 1.544-.21 2.654-.26l.17-.007.172-.006.086-.003.171-.007A99.788 99.788 0 0 1 7.858 2h.193zM6.4 5.209v4.818l4.157-2.408L6.4 5.209z"/>
11
+ </svg>
12
+ {label}
13
+ </a>
14
+ </p>'''
15
+ elif icon == 'twitter':
16
+ button_code = f'''
17
+ <p>
18
+ <a href={url} class="btn btn-outline-primary btn-lg btn-block" type="button" aria-pressed="true">
19
+ <svg xmlns="http://www.w3.org/2000/svg" width={iconsize} height={iconsize} fill="currentColor" class="bi bi-twitter" viewBox="0 0 16 16">
20
+ <path d="M5.026 15c6.038 0 9.341-5.003 9.341-9.334 0-.14 0-.282-.006-.422A6.685 6.685 0 0 0 16 3.542a6.658 6.658 0 0 1-1.889.518 3.301 3.301 0 0 0 1.447-1.817 6.533 6.533 0 0 1-2.087.793A3.286 3.286 0 0 0 7.875 6.03a9.325 9.325 0 0 1-6.767-3.429 3.289 3.289 0 0 0 1.018 4.382A3.323 3.323 0 0 1 .64 6.575v.045a3.288 3.288 0 0 0 2.632 3.218 3.203 3.203 0 0 1-.865.115 3.23 3.23 0 0 1-.614-.057 3.283 3.283 0 0 0 3.067 2.277A6.588 6.588 0 0 1 .78 13.58a6.32 6.32 0 0 1-.78-.045A9.344 9.344 0 0 0 5.026 15z"/>
21
+ </svg>
22
+ {label}
23
+ </a>
24
+ </p>'''
25
+ elif icon == 'linkedin':
26
+ button_code = f'''
27
+ <p>
28
+ <a href={url} class="btn btn-outline-primary btn-lg btn-block" type="button" aria-pressed="true">
29
+ <svg xmlns="http://www.w3.org/2000/svg" width={iconsize} height={iconsize} fill="currentColor" class="bi bi-linkedin" viewBox="0 0 16 16">
30
+ <path d="M0 1.146C0 .513.526 0 1.175 0h13.65C15.474 0 16 .513 16 1.146v13.708c0 .633-.526 1.146-1.175 1.146H1.175C.526 16 0 15.487 0 14.854V1.146zm4.943 12.248V6.169H2.542v7.225h2.401zm-1.2-8.212c.837 0 1.358-.554 1.358-1.248-.015-.709-.52-1.248-1.342-1.248-.822 0-1.359.54-1.359 1.248 0 .694.521 1.248 1.327 1.248h.016zm4.908 8.212V9.359c0-.216.016-.432.08-.586.173-.431.568-.878 1.232-.878.869 0 1.216.662 1.216 1.634v3.865h2.401V9.25c0-2.22-1.184-3.252-2.764-3.252-1.274 0-1.845.7-2.165 1.193v.025h-.016a5.54 5.54 0 0 1 .016-.025V6.169h-2.4c.03.678 0 7.225 0 7.225h2.4z"/>
31
+ </svg>
32
+ {label}
33
+ </a>
34
+ </p>'''
35
+ elif icon == 'medium':
36
+ button_code = f'''
37
+ <p>
38
+ <a href={url} class="btn btn-outline-primary btn-lg btn-block" type="button" aria-pressed="true">
39
+ <svg xmlns="http://www.w3.org/2000/svg" width={iconsize} height={iconsize} fill="currentColor" class="bi bi-medium" viewBox="0 0 16 16">
40
+ <path d="M9.025 8c0 2.485-2.02 4.5-4.513 4.5A4.506 4.506 0 0 1 0 8c0-2.486 2.02-4.5 4.512-4.5A4.506 4.506 0 0 1 9.025 8zm4.95 0c0 2.34-1.01 4.236-2.256 4.236-1.246 0-2.256-1.897-2.256-4.236 0-2.34 1.01-4.236 2.256-4.236 1.246 0 2.256 1.897 2.256 4.236zM16 8c0 2.096-.355 3.795-.794 3.795-.438 0-.793-1.7-.793-3.795 0-2.096.355-3.795.794-3.795.438 0 .793 1.699.793 3.795z"/>
41
+ </svg>
42
+ {label}
43
+ </a>
44
+ </p>'''
45
+ elif icon == 'newsletter':
46
+ button_code = f'''
47
+ <p>
48
+ <a href={url} class="btn btn-outline-primary btn-lg btn-block" type="button" aria-pressed="true">
49
+ <svg xmlns="http://www.w3.org/2000/svg" width={iconsize} height={iconsize} fill="currentColor" class="bi bi-envelope" viewBox="0 0 16 16">
50
+ <path d="M0 4a2 2 0 0 1 2-2h12a2 2 0 0 1 2 2v8a2 2 0 0 1-2 2H2a2 2 0 0 1-2-2V4Zm2-1a1 1 0 0 0-1 1v.217l7 4.2 7-4.2V4a1 1 0 0 0-1-1H2Zm13 2.383-4.708 2.825L15 11.105V5.383Zm-.034 6.876-5.64-3.471L8 9.583l-1.326-.795-5.64 3.47A1 1 0 0 0 2 13h12a1 1 0 0 0 .966-.741ZM1 11.105l4.708-2.897L1 5.383v5.722Z"/>
51
+ </svg>
52
+ {label}
53
+ </a>
54
+ </p>'''
55
+ elif icon == 'github':
56
+ button_code = f'''
57
+ <p>
58
+ <a href={url} class="btn btn-outline-primary btn-lg btn-block" type="button" aria-pressed="true">
59
+ <svg xmlns="http://www.w3.org/2000/svg" width={iconsize} height={iconsize} fill="currentColor" class="bi bi-github" viewBox="0 0 16 16">
60
+ <path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.012 8.012 0 0 0 16 8c0-4.42-3.58-8-8-8z"/>
61
+ </svg>
62
+ {label}
63
+ </a>
64
+ </p>'''
65
+ elif icon == '':
66
+ button_code = f'''
67
+ <p>
68
+ <a href={url} class="btn btn-outline-primary btn-lg btn-block" type="button" aria-pressed="true">
69
+ {label}
70
+ </a>
71
+ </p>'''
72
+ return st.markdown(button_code, unsafe_allow_html=True)
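Review note: in `st_button` above, an icon name that matches none of the branches leaves `button_code` unassigned, so the final `return` would raise `UnboundLocalError`. The `href={url}` attribute is also emitted without quotes. The following is a minimal sketch of a table-driven alternative that falls back to a text-only button and escapes the URL and label. `build_button_html`, `ICONS`, and `PLACEHOLDER_PATH` are hypothetical names, and the placeholder path stands in for the real icon path data from the file; the result would still be passed to `st.markdown(..., unsafe_allow_html=True)`.

```python
import html

# Placeholder path data (a plain 16x16 square); the real file embeds the
# full icon paths for each entry.
PLACEHOLDER_PATH = "M0 0h16v16H0z"

# Icon name -> CSS class used on the <svg> element in st_functions.py.
ICONS = {
    "youtube": "bi-youtube",
    "twitter": "bi-twitter",
    "linkedin": "bi-linkedin",
    "medium": "bi-medium",
    "newsletter": "bi-envelope",
    "github": "bi-github",
}


def build_button_html(icon, url, label, iconsize):
    """Build the button markup; unknown or empty icon names yield a text-only button."""
    svg = ""
    if icon in ICONS:
        size = int(iconsize)
        svg = (
            f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}" '
            f'fill="currentColor" class="bi {ICONS[icon]}" viewBox="0 0 16 16">'
            f'<path d="{PLACEHOLDER_PATH}"/></svg>'
        )
    return (
        f'<p><a href="{html.escape(url, quote=True)}" '
        'class="btn btn-outline-primary btn-lg btn-block" type="button" aria-pressed="true">'
        f'{svg}{html.escape(label)}</a></p>'
    )
```

Keeping the markup builder separate from the Streamlit call also makes the HTML testable on its own.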
tools/style.css ADDED
@@ -0,0 +1,57 @@
+ /* Move block container higher */
+ div.block-container.css-18e3th9.egzxvld2 {
+     margin-top: -5.8em;
+ }
+
+ /* Move menu container higher */
+ div.css-1vq4p4l.e1fqkh3o4 {
+     padding-top: 3rem;
+ }
+
+ /* Hide anchor link */
+ .css-1dgmtll.e16nr0p32 {
+     display: none
+ }
+
+ div[data-testid="metric-container"] {
+     background-color: #f7f8fa;
+     border: 1px solid #0c0d0d;
+     padding: 5% 5% 5% 10%;
+     border-radius: 5px;
+     color: rgb(30, 103, 119);
+     overflow-wrap: break-word;
+ }
+
+ /* breakline for metric text */
+ div[data-testid="metric-container"] > label[data-testid="stMetricLabel"] > div {
+     overflow-wrap: break-word;
+     white-space: break-spaces;
+     color: black;
+ }
+
+ /* Hide Streamlit bars */
+ #MainMenu {
+     visibility: hidden;
+ }
+
+ footer {
+     visibility: hidden;
+ }
+
+ /*header {*/
+ /*    visibility: hidden;*/
+ /*}*/
+
+ /*About page styling*/
+
+ .css-12oz5g7.egzxvld2 {
+     padding-top: 0px;
+ }
+
+ .css-1v0mbdj.etr89bj1 {
+     display: block;
+     margin-left: auto;
+     margin-right: auto;
+     min-width: 180px;
+     max-width: 40%;
+ }
tools/utilities.py ADDED
@@ -0,0 +1,9 @@
+ import streamlit as st
+
+
+ def load_css():
+     with open("tools/style.css") as f:
+         st.markdown('<style>{}</style>'.format(f.read()), unsafe_allow_html=True)
+     st.markdown(
+         '<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">',
+         unsafe_allow_html=True)
views/about.py ADDED
@@ -0,0 +1,33 @@
+ import streamlit as st
+ from PIL import Image
+ from tools.st_functions import st_button
+
+
+ class About:
+     class Model:
+         pageTitle = "About"
+
+     def view(self, model):
+         st.title(model.pageTitle)
+
+         st.write(
+             "[![Star](https://img.shields.io/github/stars/katanaml/sparrow.svg?logo=github&style=social)](https://github.com/katanaml/sparrow)")
+
+         col1, col2, col3 = st.columns(3)
+         col2.image(Image.open('assets/ab.png'))
+
+         st.markdown("<h1 style='text-align: center; color: black; font-weight: bold;'>Andrej Baranovskij, Founder Katana ML</h1>",
+                     unsafe_allow_html=True)
+
+         st.info(
+             'Sparrow is a tool for data extraction from PDFs, images, and other documents. It is a part of Katana ML, '
+             'a platform for data science and machine learning.')
+
+         icon_size = 20
+
+         st_button('youtube', 'https://www.youtube.com/@AndrejBaranovskij', 'Andrej Baranovskij YouTube channel', icon_size)
+         st_button('github', 'https://github.com/katanaml/sparrow', 'Sparrow GitHub', icon_size)
+         st_button('twitter', 'https://twitter.com/andrejusb', 'Follow me on Twitter', icon_size)
+         st_button('medium', 'https://andrejusb.medium.com', 'Read my Blogs on Medium', icon_size)
+         st_button('linkedin', 'https://www.linkedin.com/in/andrej-baranovskij/', 'Follow me on LinkedIn', icon_size)
+         st_button('', 'https://katanaml.io', 'Katana ML', icon_size)
views/dashboard.py ADDED
@@ -0,0 +1,86 @@
+ import streamlit as st
+ import numpy as np
+ import pandas as pd
+
+
+ class Dashboard:
+     class Model:
+         pageTitle = "Dashboard"
+
+         documentsTitle = "Documents"
+         documentsCount = "10.5K"
+         documentsDelta = "125"
+
+         annotationsTitle = "Annotations"
+         annotationsCount = "510"
+         annotationsDelta = "-2"
+
+         accuracyTitle = "Accuracy"
+         accuracyCount = "87.9%"
+         accuracyDelta = "0.1%"
+
+         trainingTitle = "Training Time"
+         trainingCount = "1.5 hrs"
+         trainingDelta = "10 mins"
+
+         processingTitle = "Processing Time"
+         processingCount = "3 secs"
+         processingDelta = "-0.1 secs"
+
+         titleDataExtraction = "## Data Extraction"
+         titleModelTraining = "## Model Training"
+         titleDataAnnotation = "## Data Annotation"
+
+     def view(self, model):
+         st.title(model.pageTitle)
+
+         with st.container():
+             col1, col2, col3, col4, col5 = st.columns(5)
+
+             with col1:
+                 st.metric(label=model.documentsTitle, value=model.documentsCount, delta=model.documentsDelta)
+
+             with col2:
+                 st.metric(label=model.annotationsTitle, value=model.annotationsCount, delta=model.annotationsDelta)
+
+             with col3:
+                 st.metric(label=model.accuracyTitle, value=model.accuracyCount, delta=model.accuracyDelta)
+
+             with col4:
+                 st.metric(label=model.trainingTitle, value=model.trainingCount, delta=model.trainingDelta, delta_color="inverse")
+
+             with col5:
+                 st.metric(label=model.processingTitle, value=model.processingCount, delta=model.processingDelta, delta_color="inverse")
+
+             st.markdown("---")
+
+
+         with st.container():
+             st.write(model.titleDataExtraction)
+             chart_data = pd.DataFrame(
+                 np.random.randn(20, 3),
+                 columns=['a', 'b', 'c'])
+
+             st.line_chart(chart_data)
+
+             st.markdown("---")
+
+         with st.container():
+             col1, col2 = st.columns(2)
+
+             with col1:
+                 with st.container():
+                     st.write(model.titleModelTraining)
+
+                     # You can call any Streamlit command, including custom components:
+                     st.bar_chart(np.random.randn(50, 3))
+
+             with col2:
+                 with st.container():
+                     st.write(model.titleDataAnnotation)
+
+                     chart_data = pd.DataFrame(
+                         np.random.randn(20, 3),
+                         columns=['a', 'b', 'c'])
+
+                     st.area_chart(chart_data)
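The two time metrics pass `delta_color="inverse"` because a growing training or processing time is bad news. As far as the Streamlit documentation describes it, a delta is treated as negative when its string form starts with a minus sign, and `"inverse"` swaps the usual green/red mapping. The function below is only an illustrative model of that rule; `metric_delta_color` is a hypothetical name and does not call Streamlit.

```python
def metric_delta_color(delta, mode="normal"):
    """Approximate which color st.metric would show for a delta value.

    Assumption: the sign is read from a leading "-" in the delta's string form.
    """
    if mode == "off":
        return "gray"
    negative = str(delta).strip().startswith("-")
    if mode == "inverse":
        # Inverse mode: an increase is shown as bad (red), a decrease as good (green).
        negative = not negative
    return "red" if negative else "green"
```

With the placeholder values from `Dashboard.Model`, the "10 mins" training delta would come out red and the "-0.1 secs" processing delta green.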
views/data_annotation.py ADDED
@@ -0,0 +1,341 @@
+ import streamlit as st
+ from PIL import Image
+ import streamlit_nested_layout  # imported for its side effect: allows nested columns
+ from streamlit_sparrow_labeling import st_sparrow_labeling
+ from streamlit_sparrow_labeling import DataProcessor
+ import json
+ import math
+ import os
+
+
+ class DataAnnotation:
+     class Model:
+         pageTitle = "Data Annotation"
+
+         img_file = None
+         rects_file = None
+
+         assign_labels_text = "Assign Labels"
+         text_caption_1 = "Check 'Assign Labels' to enable editing of labels and values, move and resize the boxes to annotate the document."
+         text_caption_2 = "Add annotations by clicking and dragging on the document, when 'Assign Labels' is unchecked."
+
+         labels = ["", "item", "item_price", "subtotal", "tax", "total", "date_issued", "due_date", "invoice_number",
+                   "amount_due", "deposit_due"]
+
+         selected_field = "Selected Field: "
+         save_text = "Save"
+         saved_text = "Saved!"
+
+         subheader_1 = "Select"
+         subheader_2 = "Upload"
+         annotation_text = "Annotation"
+         no_annotation_file = "No annotation file selected"
+         no_annotation_mapping = "Please annotate the document. Uncheck 'Assign Labels' and draw new annotations"
+
+         download_text = "Download"
+         download_hint = "Download the annotated structure in JSON format"
+
+         annotation_selection_help = "Select an annotation file to load"
+         upload_help = "Upload a file to annotate"
+         upload_button_text = "Upload"
+         upload_button_text_desc = "Choose a file"
+
+         # assign_labels_text is already defined above; the duplicate assignment is dropped
+         assign_labels_help = "Check to enable editing of labels and values"
+         save_help = "Save the annotations"
+
+         error_text = "Value is too long. Please shorten it."
+
+     def view(self, model, ui_width, device_type, device_width):
+         with st.sidebar:
+             st.markdown("---")
+             st.subheader(model.subheader_1)
+
+             placeholder_upload = st.empty()
+
+             file_names = self.get_existing_file_names('docs/image/')
+
+             if 'annotation_index' not in st.session_state:
+                 st.session_state['annotation_index'] = 0
+                 annotation_index = 0
+             else:
+                 annotation_index = st.session_state['annotation_index']
+
+             annotation_selection = placeholder_upload.selectbox(model.annotation_text, file_names,
+                                                                 index=annotation_index,
+                                                                 help=model.annotation_selection_help)
+             annotation_index = self.get_annotation_index(annotation_selection, file_names)
+             st.session_state['annotation_index'] = annotation_index
+
+             file_extension = self.get_file_extension(annotation_selection, 'docs/image/')
+             model.img_file = f"docs/image/{annotation_selection}" + file_extension
+             model.rects_file = f"docs/json/{annotation_selection}.json"
+
+             st.subheader(model.subheader_2)
+
+             with st.form("upload-form", clear_on_submit=True):
+                 uploaded_file = st.file_uploader(model.upload_button_text_desc, accept_multiple_files=False,
+                                                  type=['png', 'jpg', 'jpeg'],
+                                                  help=model.upload_help)
+                 submitted = st.form_submit_button(model.upload_button_text)
+
+                 if submitted and uploaded_file is not None:
+                     ret = self.upload_file(uploaded_file)
+
+                     if ret is not False:
+                         file_names = self.get_existing_file_names('docs/image/')
+
+                         annotation_index = self.get_annotation_index(annotation_selection, file_names)
+                         annotation_selection = placeholder_upload.selectbox(model.annotation_text, file_names,
+                                                                             index=annotation_index,
+                                                                             help=model.annotation_selection_help)
+                         st.session_state['annotation_index'] = annotation_index
+
+         st.title(model.pageTitle + " - " + annotation_selection)
+
+         if model.img_file is None:
+             st.caption(model.no_annotation_file)
+             return
+
+         saved_state = self.fetch_annotations(model.rects_file)
+
+         assign_labels = st.checkbox(model.assign_labels_text, True, help=model.assign_labels_help)
+         mode = "transform" if assign_labels else "rect"
+
+         docImg = Image.open(model.img_file)
+
+         data_processor = DataProcessor()
+
+         with st.container():
+             doc_height = saved_state['meta']['image_size']['height']
+             doc_width = saved_state['meta']['image_size']['width']
+             canvas_width, number_of_columns = self.canvas_available_width(ui_width, doc_width, device_type,
+                                                                           device_width)
+
+             if number_of_columns > 1:
+                 col1, col2 = st.columns([number_of_columns, 10 - number_of_columns])
+                 with col1:
+                     result_rects = self.render_doc(model, docImg, saved_state, mode, canvas_width, doc_height, doc_width)
+                 with col2:
+                     self.render_form(model, result_rects, data_processor, number_of_columns, annotation_selection)
+             else:
+                 result_rects = self.render_doc(model, docImg, saved_state, mode, canvas_width, doc_height, doc_width)
+                 self.render_form(model, result_rects, data_processor, number_of_columns, annotation_selection)
+
+     def render_doc(self, model, docImg, saved_state, mode, canvas_width, doc_height, doc_width):
+         with st.container():
+             height = 1296
+             width = 864
+
+             result_rects = st_sparrow_labeling(
+                 fill_color="rgba(0, 151, 255, 0.3)",
+                 stroke_width=2,
+                 stroke_color="rgba(0, 50, 255, 0.7)",
+                 background_image=docImg,
+                 initial_rects=saved_state,
+                 height=height,
+                 width=width,
+                 drawing_mode=mode,
+                 display_toolbar=True,
+                 update_streamlit=True,
+                 canvas_width=canvas_width,
+                 doc_height=doc_height,
+                 doc_width=doc_width,
+                 image_rescale=True,
+                 key="doc_annotation" + model.img_file
+             )
+
+             st.caption(model.text_caption_1)
+             st.caption(model.text_caption_2)
+
+             return result_rects
+
+     def render_form(self, model, result_rects, data_processor, number_of_columns, annotation_selection):
+         with st.container():
+             if result_rects is not None:
+                 if len(result_rects.rects_data['words']) == 0:
+                     st.caption(model.no_annotation_mapping)
+                     return
+                 else:
+                     with open(model.rects_file, 'rb') as file:
+                         st.download_button(label=model.download_text,
+                                            data=file,
+                                            file_name=annotation_selection + ".json",
+                                            mime='application/json',
+                                            help=model.download_hint)
+
+                 with st.form(key="fields_form"):
+                     if result_rects.current_rect_index is not None and result_rects.current_rect_index != -1:
+                         st.write(model.selected_field,
+                                  result_rects.rects_data['words'][result_rects.current_rect_index]['value'])
+                         st.markdown("---")
+
+                     if number_of_columns == 4:
+                         self.render_form_wide(result_rects.rects_data['words'], model.labels, result_rects,
+                                               data_processor)
+                     elif number_of_columns == 5:
+                         self.render_form_avg(result_rects.rects_data['words'], model.labels, result_rects,
+                                              data_processor)
+                     elif number_of_columns == 6:
+                         self.render_form_narrow(result_rects.rects_data['words'], model.labels, result_rects,
+                                                 data_processor)
+                     else:
+                         self.render_form_mobile(result_rects.rects_data['words'], model.labels, result_rects,
+                                                 data_processor)
+
+                     submit = st.form_submit_button(model.save_text, type="primary", help=model.save_help)
+                     if submit:
+
+                         for word in result_rects.rects_data['words']:
+                             if len(word['value']) > 100:
+                                 st.error(model.error_text)
+                                 return
+
+                         with open(model.rects_file, "w") as f:
+                             json.dump(result_rects.rects_data, f, indent=2)
+                         st.session_state[model.rects_file] = result_rects.rects_data
+                         # st.write(model.saved_text)
+                         st.experimental_rerun()
+
+     def render_form_wide(self, words, labels, result_rects, data_processor):
+         col1_form, col2_form, col3_form, col4_form = st.columns([1, 1, 1, 1])
+         num_rows = math.ceil(len(words) / 4)
+
+         for i, rect in enumerate(words):
+             if i < num_rows:
+                 with col1_form:
+                     self.render_form_element(rect, labels, i, result_rects, data_processor)
+             elif i < num_rows * 2:
+                 with col2_form:
+                     self.render_form_element(rect, labels, i, result_rects, data_processor)
+             elif i < num_rows * 3:
+                 with col3_form:
+                     self.render_form_element(rect, labels, i, result_rects, data_processor)
+             else:
+                 with col4_form:
+                     self.render_form_element(rect, labels, i, result_rects, data_processor)
+
+     def render_form_avg(self, words, labels, result_rects, data_processor):
+         col1_form, col2_form, col3_form = st.columns([1, 1, 1])
+         num_rows = math.ceil(len(words) / 3)
+
+         for i, rect in enumerate(words):
+             if i < num_rows:
+                 with col1_form:
+                     self.render_form_element(rect, labels, i, result_rects, data_processor)
+             elif i < num_rows * 2:
+                 with col2_form:
+                     self.render_form_element(rect, labels, i, result_rects, data_processor)
+             else:
+                 with col3_form:
+                     self.render_form_element(rect, labels, i, result_rects, data_processor)
+
+     def render_form_narrow(self, words, labels, result_rects, data_processor):
+         col1_form, col2_form = st.columns([1, 1])
+         num_rows = math.ceil(len(words) / 2)
+
+         for i, rect in enumerate(words):
+             if i < num_rows:
+                 with col1_form:
+                     self.render_form_element(rect, labels, i, result_rects, data_processor)
+             else:
+                 with col2_form:
+                     self.render_form_element(rect, labels, i, result_rects, data_processor)
+
+     def render_form_mobile(self, words, labels, result_rects, data_processor):
+         for i, rect in enumerate(words):
+             self.render_form_element(rect, labels, i, result_rects, data_processor)
+
+     def render_form_element(self, rect, labels, i, result_rects, data_processor):
+         # fall back to the empty label when the stored label is missing or unknown
+         default_index = 0
+         if rect['label'] in labels:
+             default_index = labels.index(rect['label'])
+
+         # only the currently selected rectangle is editable
+         is_disabled = i != result_rects.current_rect_index
+
+         value = st.text_input("Value", rect['value'], key=f"field_value_{i}",
+                               disabled=is_disabled)
+         label = st.selectbox("Label", labels, key=f"label_{i}", index=default_index,
+                              disabled=is_disabled)
+         st.markdown("---")
+
+         data_processor.update_rect_data(result_rects.rects_data, i, value, label)
+
+     def canvas_available_width(self, ui_width, doc_width, device_type, device_width):
+         doc_width_pct = (doc_width * 100) / ui_width
+         if doc_width_pct < 45:
+             canvas_width_pct = 37
+         elif doc_width_pct < 55:
+             canvas_width_pct = 49
+         else:
+             canvas_width_pct = 65
+
+         if ui_width > 700 and canvas_width_pct == 37 and device_type == "desktop":
+             return math.floor(canvas_width_pct * ui_width / 100), 4
+         elif ui_width > 700 and canvas_width_pct == 49 and device_type == "desktop":
+             return math.floor(canvas_width_pct * ui_width / 100), 5
+         elif ui_width > 700 and canvas_width_pct == 65 and device_type == "desktop":
+             return math.floor(canvas_width_pct * ui_width / 100), 6
+         else:
+             if device_type == "desktop":
+                 ui_width = device_width - math.floor((device_width * 22) / 100)
+             elif device_type == "mobile":
+                 ui_width = device_width - math.floor((device_width * 13) / 100)
+             return ui_width, 1
+
+     def fetch_annotations(self, rects_file):
+         if rects_file not in st.session_state:
+             with open(rects_file, "r") as f:
+                 saved_state = json.load(f)
+                 st.session_state[rects_file] = saved_state
+         else:
+             saved_state = st.session_state[rects_file]
+
+         return saved_state
+
+     def upload_file(self, uploaded_file):
+         if uploaded_file is not None:
+             if os.path.exists(os.path.join("docs/image/", uploaded_file.name)):
+                 st.write("File already exists")
+                 return False
+
+             if len(uploaded_file.name) > 100:
+                 st.write("File name too long")
+                 return False
+
+             with open(os.path.join("docs/image/", uploaded_file.name), "wb") as f:
+                 f.write(uploaded_file.getbuffer())
+
+             img_file = Image.open(os.path.join("docs/image/", uploaded_file.name))
+
+             annotations_json = {
+                 "meta": {
+                     "version": "v0.1",
+                     "split": "train",
+                     "image_id": len(self.get_existing_file_names("docs/image/")),
+                     "image_size": {
+                         "width": img_file.width,
+                         "height": img_file.height
+                     }
+                 },
+                 "words": []
+             }
+
+             # use splitext so the JSON name matches get_existing_file_names()
+             # even when the uploaded file name contains more than one dot
+             file_name = os.path.splitext(uploaded_file.name)[0]
+             with open(os.path.join("docs/json/", file_name + ".json"), "w") as f:
+                 json.dump(annotations_json, f, indent=2)
+
+             st.write("File uploaded successfully")
+
+     def get_existing_file_names(self, dir_name):
+         # get ordered list of files without file extension, excluding hidden files
+         return sorted([os.path.splitext(f)[0] for f in os.listdir(dir_name) if not f.startswith('.')])
+
+     def get_file_extension(self, file_name, dir_name):
+         # get list of files, excluding hidden files
+         files = [f for f in os.listdir(dir_name) if not f.startswith('.')]
+         for f in files:
+             if file_name is not None and os.path.splitext(f)[0] == file_name:
+                 return os.path.splitext(f)[1]
+
+     def get_annotation_index(self, file, files_list):
+         return files_list.index(file)
+
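`render_form_wide`, `render_form_avg` and `render_form_narrow` repeat the same rule for 4, 3 and 2 columns: compute `ceil(len(words) / n)` rows, then fill the columns one after another. A minimal sketch of that rule as a single pure function follows; `split_into_columns` is a hypothetical helper that is not part of this commit, and it keeps the original index of each item because the form widgets use it in their keys.

```python
import math


def split_into_columns(items, num_columns):
    """Split items into num_columns consecutive chunks, filled column by column.

    Each chunk holds (original_index, item) pairs so callers can still build
    per-item widget keys such as f"label_{i}".
    """
    if num_columns < 1:
        raise ValueError("num_columns must be at least 1")
    rows = math.ceil(len(items) / num_columns)
    indexed = list(enumerate(items))
    # Column c receives the items at positions [c * rows, (c + 1) * rows).
    return [indexed[c * rows:(c + 1) * rows] for c in range(num_columns)]
```

Under this assumption, the three layout methods could become one loop that zips `st.columns(n)` with the returned chunks and calls `render_form_element` for each pair; the mobile layout corresponds to `num_columns=1`.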
views/data_extraction.py ADDED
@@ -0,0 +1,9 @@
+ import streamlit as st
+
+
+ class DataExtraction:
+     class Model:
+         pageTitle = "Data Extraction"
+
+     def view(self, model):
+         st.title(model.pageTitle)
views/model_training.py ADDED
@@ -0,0 +1,9 @@
+ import streamlit as st
+
+
+ class ModelTraining:
+     class Model:
+         pageTitle = "Model Training"
+
+     def view(self, model):
+         st.title(model.pageTitle)
views/model_tuning.py ADDED
@@ -0,0 +1,9 @@
+ import streamlit as st
+
+
+ class ModelTuning:
+     class Model:
+         pageTitle = "Model Tuning"
+
+     def view(self, model):
+         st.title(model.pageTitle)
views/settings.py ADDED
@@ -0,0 +1,9 @@
+ import streamlit as st
+
+
+ class Settings:
+     class Model:
+         pageTitle = "Settings"
+
+     def view(self, model):
+         st.title(model.pageTitle)