G.Hemanth Sai committed on
Commit
e7dd348
1 Parent(s): a93aea7

update readme

Files changed (1)
  1. README.md +87 -1
README.md CHANGED
@@ -7,4 +7,90 @@ sdk: streamlit
7
  sdk_version: "1.10.0"
8
  app_file: app.py
9
  pinned: false
10
- ---
10
+ ---
11
+
12
+ # Internship-IVIS-labs
13
+
14
+ - The *Intelligent Question Generator* app is an easy-to-use interface built with Streamlit that uses [KeyBERT](https://github.com/MaartenGr/KeyBERT), [sense2vec](https://github.com/explosion/sense2vec), and [T5](https://huggingface.co/ramsrigouthamg/t5_paraphraser).
15
+ - It uses a minimal keyword extraction technique that leverages multiple NLP embeddings and relies on [Transformers](https://huggingface.co/transformers/) 🤗 to extract the keywords/keyphrases that are most similar to a document.
16
+ - [sense2vec](https://github.com/explosion/sense2vec) (Trask et al., 2015) is a nice twist on word2vec that lets you learn more interesting and detailed word vectors.
17
+
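The keyword extraction idea described above (score each candidate keyword by how similar it is to the whole document, then keep the top few) can be sketched in plain Python. This is only an illustrative sketch, not the app's code: toy bag-of-words count vectors stand in for the transformer embeddings that KeyBERT would use, and `STOPWORDS`, `tokenize`, `embed`, and `rank_keywords` are hypothetical names introduced here.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list used only for this sketch.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def embed(tokens):
    """Toy stand-in for a transformer embedding: a bag-of-words count vector."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_keywords(doc, top_n=3):
    """Rank single-word candidates by cosine similarity to the document vector."""
    tokens = tokenize(doc)
    doc_vec = embed(tokens)
    scored = [(word, cosine(embed([word]), doc_vec)) for word in set(tokens)]
    # Highest similarity first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    text = "Transformers generate questions. Questions test understanding of transformers and questions."
    print(rank_keywords(text, top_n=2))
```

With count vectors this reduces to frequency ranking, so the example prints `[('questions', 0.75), ('transformers', 0.5)]`. Replacing `embed` with real sentence embeddings is what would make the ranking semantic rather than frequency-based.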
18
+ ## Repository Breakdown
19
+ ### src Directory
20
+ ---
21
+ - `src/Pipeline/QAhaystack.py`: This file contains the code for question answering using [Haystack](https://haystack.deepset.ai/overview/intro).
22
+ - `src/Pipeline/QuestGen.py`: This file contains the code for question generation.
23
+ - `src/Pipeline/Reader.py`: This file contains the code for reading the document.
24
+ - `src/Pipeline/TextSummariztion.py`: This file contains the code for text summarization.
25
+ - `src/PreviousVersionCode/context.py`: This file contains the code for finding the context of a paragraph.
26
+ - `src/PreviousVersionCode/QuestionGenerator.py`: This file contains the code for the first attempt at question generation.
27
+
28
+ ## Installation
29
+ ```shell
30
+ $ git clone https://github.com/HemanthSai7/Internship-IVIS-labs.git
31
+ ```
32
+ ```shell
33
+ $ cd Internship-IVIS-labs
34
+ ```
35
+ ```shell
36
+ $ pip install -r requirements.txt
37
+ ```
38
+ - When running the app locally for the first time, uncomment the lines in `src/Pipeline/QuestGen.py` that download the models to the models directory.
39
+
40
+ ```shell
41
+ $ streamlit run app.py
42
+ ```
43
+ - Once the app is running, you can access it at http://localhost:8501
44
+ ```shell
45
+ You can now view your Streamlit app in your browser.
46
+
47
+ Local URL: http://localhost:8501
48
+ Network URL: http://192.168.0.103:8501
49
+ ```
50
+
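To confirm from a script that the app is reachable, a minimal check is to try opening a TCP connection to the port shown in the Streamlit output. The sketch below assumes the default host and port from the output above (`localhost:8501`); `is_port_open` is a hypothetical helper, not part of the repository or of Streamlit.

```python
import socket


def is_port_open(host="localhost", port=8501, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Streamlit app is reachable at http://localhost:8501")
    else:
        print("Nothing is listening on port 8501; is `streamlit run app.py` running?")
```

This only verifies that something is listening on the port, not that the page rendered correctly.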
51
+ ## Tech Stack Used
52
+ ![image](https://img.shields.io/badge/Sense2vec-EF546D?style=for-the-badge&logo=Explosion.ai&logoColor=white)
53
+ ![image](https://img.shields.io/badge/Spacy-09A3D5?style=for-the-badge&logo=spaCy&logoColor=white)
54
+ ![image](https://img.shields.io/badge/Haystack-03AF9D?style=for-the-badge&logo=Haystackh&logoColor=white)
55
+ ![image](https://img.shields.io/badge/Python-3776AB?style=for-the-badge&logo=python&logoColor=white)
56
+ ![image](https://img.shields.io/badge/PyTorch-D04139?style=for-the-badge&logo=pytorch&logoColor=white)
57
+ ![image](https://img.shields.io/badge/Numpy-013243?style=for-the-badge&logo=numpy&logoColor=white)
58
+ ![image](https://img.shields.io/badge/Pandas-130654?style=for-the-badge&logo=pandas&logoColor=white)
59
+ ![image](https://img.shields.io/badge/matplotlib-b2feb0?style=for-the-badge&logo=matplotlib&logoColor=white)
60
+ ![image](https://img.shields.io/badge/scikit_learn-F7931E?style=for-the-badge&logo=scikit-learn&logoColor=white)
61
+ ![image](https://img.shields.io/badge/Streamlit-EA6566?style=for-the-badge&logo=streamlit&logoColor=white)
62
+
63
+ ## Timeline
64
+ ### Week 1-2:
65
+ #### Tasks
66
+ - [x] Understanding and brushing up the concepts of NLP.
67
+ - [x] Extracting images and text from a PDF file and storing them in a text file.
68
+ - [x] Exploring various open source tools for generating questions from a given text.
69
+ - [x] Read papers related to the project (BERT, T5, RoBERTa, etc.).
70
+ - [x] Summarizing the text extracted from the PDF file using the pre-trained T5-base model.
71
+
72
+ ### Week 3-4:
73
+ #### Tasks
74
+ - [x] Understanding the concept of QA systems.
75
+ - [x] Created a basic script for generating questions from the text.
76
+ - [x] Created a basic script for finding the context of the paragraph.
77
+
78
+ ### Week 5-6:
79
+ #### Tasks
80
+
81
+ - [x] Understanding how Transformer models work for NLP tasks such as question answering and question generation.
82
+ - [x] Understanding how to use the Haystack library for QA systems.
83
+ - [x] Understanding how to use the Haystack library for question generation.
84
+ - [x] Preprocessed the document for Haystack QA to get better results.
85
+
86
+ ### Week 7-8:
87
+ #### Tasks
88
+ - [x] Understanding how to generate questions intelligently.
89
+ - [x] Explored WordNet to find synonyms.
90
+ - [x] Used BertWSD for disambiguating word senses in the sentence provided.
91
+ - [x] Used KeyBERT for finding the keywords in the document.
92
+ - [x] Used sense2vec to find words closely related to the generated keywords.
93
+
94
+ ### Week 9-10:
95
+ #### Tasks
96
+ - [x] Created a Streamlit app to demonstrate the project.