Commit 66a005f
1 Parent(s): 4ceaee9
Fix Convert2Markdown
- BDP LEC REPORT 3.md +33 -0
- BDP LEC REPORT.md +21 -0
- src/__pycache__/app.cpython-38.pyc +0 -0
- src/app.py +3 -4
BDP LEC REPORT 3.md
ADDED
@@ -0,0 +1,33 @@
+
+BDP LEC REPORT 3 Presentation
+=============================
+
+---
+
+# Group Members:
+
+Stella Shania Mintara, David Samuel and Egivenia.
+
+---
+
+# I. Visualization Layer
+
+## Web Framework
+
+We use Django to develop a web application to display analysis results.
+
+Seaborn is an open-source Python library based on matplotlib. It is used for exploratory data analysis and data visualization.
+
+Django is a popular Python web framework to develop web applications. The model serves as a definition for stored data and manages database interactions.
+
+Django is a web framework for big data analytics applications.
+
+Django is a great match with MongoDB for building powerful, secure, easy-to- maintain applications. Support for a non-relational database like MongoDB can be implemented by installing additional Django-MongoDB engines for MongoDB.
+
+## Serving Database
+
+A MongoDB database is also used to store inferences after the deep learning models are applied to the data streams as well. For saving large images such as image and video files up to 16MB per file, we can use MongoDB specification which is GridFS.
+
+## Interactive Querying
+
+The MongoDB Connector for Spark provides integration between MongoDB and Apache Spark. Spark also includes a cost-based that is an optimization technique in Spark that uses table statistics to determine the most efficient query execution plan.
BDP LEC REPORT.md
ADDED
@@ -0,0 +1,21 @@
+
+BDP LEC REPORT Presentation
+===========================
+
+---
+
+# Group Members:
+
+Stella Shania Mintara, David Samuel and Egivenia.
+
+---
+
+# Case Problem
+
+FreshMart is already well-established, they have enough resources to buy and own servers. They prefer to outsource the server management to another party so they don’t need to search and hire talents to run and manage the servers.
+
+---
+
+# I. Data Source
+
+The data source is obtained from Fresh Mart’s surveillance cameras. This data will be ingested in the ingestion layer using Apache Kafka.
src/__pycache__/app.cpython-38.pyc
CHANGED
Binary files a/src/__pycache__/app.cpython-38.pyc and b/src/__pycache__/app.cpython-38.pyc differ
src/app.py
CHANGED
@@ -1,4 +1,4 @@
-from text_extractor import TextExtractor
+from src.text_extractor import TextExtractor
 from tqdm import tqdm
 from transformers import PegasusForConditionalGeneration, PegasusTokenizer
 from transformers import pipeline
@@ -35,7 +35,7 @@ def summarize(slides):
     return generated_slides
 
 def convert2markdown(generated_slides):
-    mdFile = MdUtils(file_name=
+    mdFile = MdUtils(file_name=FILENAME, title=f'{FILENAME} Presentation')
     for k, v in generated_slides.items():
         mdFile.new_line('---\n')
         for section in v:
@@ -61,8 +61,7 @@ def inference(document):
     slides = preprocess.get_slides(texts)
     generated_slides = summarize(slides)
     markdown_path = convert2markdown(generated_slides)
-
-    # markdown_str = f.read()
+    print(f"Markdown Path: {markdown_path}")
     return markdown_path
 
 
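Note on the `convert2markdown` change: the fix passes both a file name and a title of the form `{FILENAME} Presentation`, which matches the title and `=` underline at the top of the two generated reports in this commit. As a rough illustration of that output layout only, here is a minimal standard-library sketch. It is not the Space's implementation and does not use `MdUtils`. The function name `render_markdown` and the assumed shape of the slides (a dict mapping a slide title to a list of `(heading, body)` pairs) are hypothetical, since the diff does not show the structure of `generated_slides`.

```python
from typing import Dict, List, Tuple

# Hypothetical shape (an assumption, not shown in the diff):
# slide title -> list of (section heading, body text) pairs.
Slides = Dict[str, List[Tuple[str, str]]]


def render_markdown(filename: str, slides: Slides) -> str:
    """Render slides as Markdown in the layout of the reports in this commit."""
    title = f"{filename} Presentation"
    # Leading blank line, title, and an '=' underline as long as the title.
    lines = ["", title, "=" * len(title), ""]
    for slide_title, sections in slides.items():
        # Each slide starts with a horizontal rule followed by a level-1 heading.
        lines += ["---", "", f"# {slide_title}", ""]
        for heading, body in sections:
            if heading:  # an empty heading means the body sits directly under the slide
                lines += [f"## {heading}", ""]
            lines += [body, ""]
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"
```

Calling `render_markdown("BDP LEC REPORT", {...})` returns a string rather than writing a file, so a caller would still need to save it and return the path, as `convert2markdown` does in the diff.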