G.Hemanth Sai
committed on
Commit
•
e7dd348
1
Parent(s):
a93aea7
update readme
README.md
CHANGED
@@ -7,4 +7,90 @@ sdk: streamlit
|
|
7 |
sdk_version: "1.10.0"
|
8 |
app_file: app.py
|
9 |
pinned: false
|
10 |
-
---
7 |
sdk_version: "1.10.0"
|
8 |
app_file: app.py
|
9 |
pinned: false
|
10 |
+
---
|
11 |
+
|
12 |
+
# Internship-IVIS-labs
|
13 |
+
|
14 |
+
- The *Intelligent Question Generator* app is an easy-to-use interface built with Streamlit that uses [KeyBERT](https://github.com/MaartenGr/KeyBERT), [Sense2vec](https://github.com/explosion/sense2vec), and [T5](https://huggingface.co/ramsrigouthamg/t5_paraphraser).
|
15 |
+
- It uses a minimal keyword extraction technique that leverages multiple NLP embeddings and relies on [Transformers](https://huggingface.co/transformers/) 🤗 to create keywords/keyphrases that are most similar to a document.
|
16 |
+
- [sense2vec](https://github.com/explosion/sense2vec) (Trask et al., 2015) is a twist on word2vec that lets you learn more interesting and detailed word vectors.
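
The keyword extraction idea above (rank candidate phrases by how similar they are to the whole document) can be sketched roughly as follows. This is only an illustration, not the code used in this repository: the `embed`, `cosine`, and `rank_keyphrases` helpers are hypothetical names, and a toy bag-of-words count vector stands in for the Transformer embeddings that KeyBERT actually relies on.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding: a bag-of-words count vector (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_keyphrases(document, candidates, top_n=3):
    """Return the top_n candidate phrases most similar to the whole document."""
    doc_vec = embed(document)
    scored = [(c, cosine(embed(c), doc_vec)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


document = (
    "Question generation turns a passage into questions. "
    "A question generation model reads the passage and writes questions."
)
candidates = ["question generation", "passage", "weather forecast"]
ranking = rank_keyphrases(document, candidates, top_n=2)
print(ranking)
```

With real sentence embeddings in place of `embed`, the same ranking step would also capture phrases that are semantically related to the document without sharing exact words.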
|
17 |
+
|
18 |
+
## Repository Breakdown
|
19 |
+
### src Directory
|
20 |
+
---
|
21 |
+
- `src/Pipeline/QAhaystack.py`: Contains the question answering code, built on [Haystack](https://haystack.deepset.ai/overview/intro).
|
22 |
+
- `src/Pipeline/QuestGen.py`: Contains the question generation code.
|
23 |
+
- `src/Pipeline/Reader.py`: Contains the code for reading the document.
|
24 |
+
- `src/Pipeline/TextSummariztion.py`: Contains the text summarization code.
|
25 |
+
- `src/PreviousVersionCode/context.py`: Contains the code for finding the context of a paragraph.
|
26 |
+
- `src/PreviousVersionCode/QuestionGenerator.py`: Contains the code from the first attempt at question generation.
|
27 |
+
|
28 |
+
## Installation
|
29 |
+
```shell
|
30 |
+
$ git clone https://github.com/HemanthSai7/Internship-IVIS-labs.git
|
31 |
+
```
|
32 |
+
```shell
|
33 |
+
$ cd Internship-IVIS-labs
|
34 |
+
```
|
35 |
+
```shell
|
36 |
+
pip install -r requirements.txt
|
37 |
+
```
|
38 |
+
- To run the app locally for the first time, uncomment the lines in `src/Pipeline/QuestGen.py` that download the models to the models directory.
|
39 |
+
|
40 |
+
```shell
|
41 |
+
streamlit run app.py
|
42 |
+
```
|
43 |
+
- Once the app is running, you can access it at http://localhost:8501. Streamlit prints the URLs on startup:
|
44 |
+
```shell
|
45 |
+
You can now view your Streamlit app in your browser.
|
46 |
+
|
47 |
+
Local URL: http://localhost:8501
|
48 |
+
Network URL: http://192.168.0.103:8501
|
49 |
+
```
|
50 |
+
|
51 |
+
## Tech Stack Used
|
52 |
+
![image](https://img.shields.io/badge/Sense2vec-EF546D?style=for-the-badge&logo=Explosion.ai&logoColor=white)
|
53 |
+
![image](https://img.shields.io/badge/Spacy-09A3D5?style=for-the-badge&logo=spaCy&logoColor=white)
|
54 |
+
![image](https://img.shields.io/badge/Haystack-03AF9D?style=for-the-badge&logo=Haystackh&logoColor=white)
|
55 |
+
![image](https://img.shields.io/badge/Python-3776AB?style=for-the-badge&logo=python&logoColor=white)
|
56 |
+
![image](https://img.shields.io/badge/PyTorch-D04139?style=for-the-badge&logo=pytorch&logoColor=white)
|
57 |
+
![image](https://img.shields.io/badge/Numpy-013243?style=for-the-badge&logo=numpy&logoColor=white)
|
58 |
+
![image](https://img.shields.io/badge/Pandas-130654?style=for-the-badge&logo=pandas&logoColor=white)
|
59 |
+
![image](https://img.shields.io/badge/matplotlib-b2feb0?style=for-the-badge&logo=matplotlib&logoColor=white)
|
60 |
+
![image](https://img.shields.io/badge/scikit_learn-F7931E?style=for-the-badge&logo=scikit-learn&logoColor=white)
|
61 |
+
![image](https://img.shields.io/badge/Streamlit-EA6566?style=for-the-badge&logo=streamlit&logoColor=white)
|
62 |
+
|
63 |
+
## Timeline
|
64 |
+
### Week 1-2:
|
65 |
+
#### Tasks
|
66 |
+
- [x] Understanding and brushing up the concepts of NLP.
|
67 |
+
- [x] Extracting images and text from a PDF file and storing them in a text file.
|
68 |
+
- [x] Exploring various open source tools for generating questions from a given text.
|
69 |
+
- [x] Read papers related to the project (BERT, T5, RoBERTa, etc.).
|
70 |
+
- [x] Summarizing the text extracted from the PDF file using the pre-trained T5 base model.
|
71 |
+
|
72 |
+
### Week 3-4:
|
73 |
+
#### Tasks
|
74 |
+
- [x] Understanding the concept of QA systems.
|
75 |
+
- [x] Created a basic script for generating questions from the text.
|
76 |
+
- [x] Created a basic script for finding the context of the paragraph.
|
77 |
+
|
78 |
+
### Week 5-6:
|
79 |
+
#### Tasks
|
80 |
+
|
81 |
+
- [x] Understanding how Transformer models work for NLP tasks such as question answering and question generation.
|
82 |
+
- [x] Understanding how to use the Haystack library for QA systems.
|
83 |
+
- [x] Understanding how to use the Haystack library for question generation.
|
84 |
+
- [x] Preprocessed the document for Haystack QA to get better results.
|
85 |
+
|
86 |
+
### Week 7-8:
|
87 |
+
#### Tasks
|
88 |
+
- [x] Understanding how to generate questions intelligently.
|
89 |
+
- [x] Explored WordNet to find synonyms.
|
90 |
+
- [x] Used BertWSD to disambiguate the provided sentence.
|
91 |
+
- [x] Used KeyBERT for finding the keywords in the document.
|
92 |
+
- [x] Used sense2vec to find words closely related to the generated keywords.
|
93 |
+
|
94 |
+
### Week 9-10:
|
95 |
+
#### Tasks
|
96 |
+
- [x] Created a Streamlit app to demonstrate the project.
|