Commit 66a005f by Davidsamuel101 (1 parent: 4ceaee9)

Fix Convert2Markdown

BDP LEC REPORT 3.md ADDED
@@ -0,0 +1,33 @@
+
+BDP LEC REPORT 3 Presentation
+=============================
+
+---
+
+# Group Members:
+
+Stella Shania Mintara, David Samuel and Egivenia.
+
+---
+
+# I. Visualization Layer
+
+## Web Framework
+
+We use Django to develop a web application to display analysis results.
+
+Seaborn is an open-source Python library based on matplotlib. It is used for exploratory data analysis and data visualization.
+
+Django is a popular Python web framework to develop web applications. The model serves as a definition for stored data and manages database interactions.
+
+Django is a web framework for big data analytics applications.
+
+Django is a great match with MongoDB for building powerful, secure, easy-to- maintain applications. Support for a non-relational database like MongoDB can be implemented by installing additional Django-MongoDB engines for MongoDB.
+
+## Serving Database
+
+A MongoDB database is also used to store inferences after the deep learning models are applied to the data streams as well. For saving large images such as image and video files up to 16MB per file, we can use MongoDB specification which is GridFS.
+
+## Interactive Querying
+
+The MongoDB Connector for Spark provides integration between MongoDB and Apache Spark. Spark also includes a cost-based that is an optimization technique in Spark that uses table statistics to determine the most efficient query execution plan.
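
Review note on the "Serving Database" paragraph above: GridFS is meant for files that exceed MongoDB's 16 MB BSON document limit, and it stores them as a sequence of smaller chunks (255 KiB each by default). The sketch below only illustrates that chunking idea with the standard library. It does not connect to MongoDB, and the names `needs_gridfs` and `split_into_chunks` are hypothetical helpers, not part of this repository or of PyMongo.

```python
# Conceptual sketch only: mimics how GridFS splits a payload into ordered chunks.
# Real code would use PyMongo's gridfs module; nothing here touches a database.
from typing import Iterator, Tuple

BSON_MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's per-document size limit
DEFAULT_CHUNK_BYTES = 255 * 1024            # GridFS default chunk size


def needs_gridfs(size_bytes: int) -> bool:
    """Return True when a payload is larger than one BSON document may be."""
    return size_bytes > BSON_MAX_DOCUMENT_BYTES


def split_into_chunks(
    data: bytes, chunk_bytes: int = DEFAULT_CHUNK_BYTES
) -> Iterator[Tuple[int, bytes]]:
    """Yield (chunk_index, chunk_data) pairs in order, like GridFS chunk documents."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    for n, start in enumerate(range(0, len(data), chunk_bytes)):
        yield n, data[start:start + chunk_bytes]


if __name__ == "__main__":
    payload = b"x" * (600 * 1024)  # stand-in for an image or video frame
    for index, chunk in split_into_chunks(payload):
        print(index, len(chunk))
```

Reassembling the file is just concatenating the chunks in index order, which is why each chunk carries its sequence number.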
BDP LEC REPORT.md ADDED
@@ -0,0 +1,21 @@
+
+BDP LEC REPORT Presentation
+===========================
+
+---
+
+# Group Members:
+
+Stella Shania Mintara, David Samuel and Egivenia.
+
+---
+
+# Case Problem
+
+FreshMart is already well-established, they have enough resources to buy and own servers. They prefer to outsource the server management to another party so they don’t need to search and hire talents to run and manage the servers.
+
+---
+
+# I. Data Source
+
+The data source is obtained from Fresh Mart’s surveillance cameras. This data will be ingested in the ingestion layer using Apache Kafka.
src/__pycache__/app.cpython-38.pyc CHANGED
Binary files a/src/__pycache__/app.cpython-38.pyc and b/src/__pycache__/app.cpython-38.pyc differ
 
src/app.py CHANGED
@@ -1,4 +1,4 @@
-from text_extractor import TextExtractor
+from src.text_extractor import TextExtractor
 from tqdm import tqdm
 from transformers import PegasusForConditionalGeneration, PegasusTokenizer
 from transformers import pipeline
@@ -35,7 +35,7 @@ def summarize(slides):
     return generated_slides
 
 def convert2markdown(generated_slides):
-    mdFile = MdUtils(file_name=f"summary/{FILENAME}", title=f'{FILENAME} Presentation')
+    mdFile = MdUtils(file_name=FILENAME, title=f'{FILENAME} Presentation')
     for k, v in generated_slides.items():
         mdFile.new_line('---\n')
         for section in v:
@@ -61,8 +61,7 @@ def inference(document):
     slides = preprocess.get_slides(texts)
     generated_slides = summarize(slides)
     markdown_path = convert2markdown(generated_slides)
-    # with open(markdown_path, 'rt') as f:
-    #     markdown_str = f.read()
+    print(f"Markdown Path: {markdown_path}")
     return markdown_path
 
 
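
Review note: the `convert2markdown` change drops the `summary/` prefix from `file_name`, which appears to be why the generated reports now land at the repository root. The diff shows only part of the function, so the following is a minimal standard-library sketch of the same idea, not the author's implementation. The shape of `slides` (a dict mapping a slide key to `(heading, body)` pairs) and the names `render_markdown` and `write_markdown` are assumptions made for illustration. The output directory is an explicit argument here so the destination does not depend on the working directory.

```python
# Minimal sketch, assuming a hypothetical slide structure; it does not use mdutils.
from pathlib import Path
from typing import Dict, List, Tuple

# Assumed shape: slide key -> list of (heading, body) sections.
Slides = Dict[int, List[Tuple[str, str]]]


def render_markdown(title: str, slides: Slides) -> str:
    """Build a Markdown document with a setext title and '---' between slides."""
    lines = ["", title, "=" * len(title), ""]
    for sections in slides.values():
        lines += ["---", ""]
        for heading, body in sections:
            lines += [f"# {heading}", ""]
            if body:
                lines += [body, ""]
    return "\n".join(lines)


def write_markdown(
    file_name: str, title: str, slides: Slides, out_dir: Path = Path(".")
) -> Path:
    """Write the rendered Markdown to out_dir/<file_name>.md and return the path."""
    path = out_dir / f"{file_name}.md"
    path.write_text(render_markdown(title, slides), encoding="utf-8")
    return path


if __name__ == "__main__":
    demo = {0: [("Group Members:", "A, B and C.")], 1: [("Case Problem", "")]}
    print(write_markdown("demo", "demo Presentation", demo))
```

Returning the path from the writer mirrors how `inference` passes `markdown_path` back to its caller in the diff above.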