davila7 committed on
Commit a31da5f
1 Parent(s): 1c744c7

update requirements.txt

Files changed (2):
  1. README 2.md (+13 -76)
  2. requirements.txt (+2 -1)
README 2.md CHANGED
@@ -20,86 +20,29 @@ All code was written with the help of <a href="https://codegpt.co">Code GPT</a>
  - Embedding texts segments with Langchain and OpenAI (**text-embedding-ada-002**)
  - Chat with the file using **streamlit-chat** and LangChain QA with source and (**text-davinci-003**)

- # Example
- For this example we are going to use this video from The PyCoach
- https://youtu.be/lKO3qDLCAnk
-
- Add the video URL and then click Start Analysis
- ![Youtube](https://user-images.githubusercontent.com/6216945/217701635-7c386ca7-c802-4f56-8148-dcce57555b5a.gif)
-
- ## Pytube and OpenAI Whisper
- The video will be downloaded with pytube and then OpenAI Whisper will take care of transcribing and segmenting the video.
- ![Pytube Whisper](https://user-images.githubusercontent.com/6216945/217704219-886d0afc-4181-4797-8827-82f4fd456f4f.gif)
-
- ```python
- # Get the video
- youtube_video = YouTube(youtube_link)
- streams = youtube_video.streams.filter(only_audio=True)
- mp4_video = stream.download(filename='youtube_video.mp4')
- audio_file = open(mp4_video, 'rb')
-
- # whisper load base model
- model = whisper.load_model('base')
-
- # Whisper transcription
- output = model.transcribe("youtube_video.mp4")
- ```
-
- ## Embedding with "text-embedding-ada-002"
- We obtain the vectors with **text-embedding-ada-002** of each segment delivered by whisper
- ![Embedding](https://user-images.githubusercontent.com/6216945/217705008-180285d7-6bce-40c3-8601-576cc2f38171.gif)
-
- ```python
- # Embeddings
- segments = output['segments']
- for segment in segments:
-     openai.api_key = user_secret
-     response = openai.Embedding.create(
-         input=segment["text"].strip(),
-         model="text-embedding-ada-002"
-     )
-     embeddings = response['data'][0]['embedding']
-     meta = {
-         "text": segment["text"].strip(),
-         "start": segment['start'],
-         "end": segment['end'],
-         "embedding": embeddings
-     }
-     data.append(meta)
- pd.DataFrame(data).to_csv('word_embeddings.csv')
- ```
- ## OpenAI GPT-3
- We make a question to the vectorized text, we do the search of the context and then we send the prompt with the context to the model "text-davinci-003"
-
- ![Question1](https://user-images.githubusercontent.com/6216945/217708086-b89dce2e-e3e2-47a7-b7dd-77e402d818cb.gif)
-
- We can even ask direct questions about what happened in the video. For example, here we ask about how long the exercise with Numpy that Pycoach did in the video took.
-
- ![Question2](https://user-images.githubusercontent.com/6216945/217708485-df1edef3-d5f1-4b4a-a5c9-d08f31c80be4.gif)
-
  # Running Locally

  1. Clone the repository

  ```bash
- git clone https://github.com/davila7/youtube-gpt
- cd youtube-gpt
+ git clone https://github.com/davila7/file-gpt
+ cd file-gpt
  ```
  2. Install dependencies

  These dependencies are required to install with the requirements.txt file:

- * streamlit
- * streamlit_chat
- * matplotlib
- * plotly
- * scipy
- * sklearn
- * pandas
- * numpy
- * git+https://github.com/openai/whisper.git
- * pytube
- * openai-whisper
+ * openai
+ * pypdf
+ * scikit-learn
+ * numpy
+ * tiktoken
+ * docx2txt
+ * langchain
+ * pydantic
+ * typing
+ * faiss-cpu
+ * streamlit_chat

  ```bash
  pip install -r requirements.txt
@@ -109,9 +52,3 @@ pip install -r requirements.txt
  ```bash
  streamlit run app.py
  ```
-
- ## Upcoming Features 🚀
-
- - Semantic search with embedding
- - Chart with emotional analysis
- - Connect with Pinecone
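The updated dependency list brings in faiss-cpu, presumably for similarity search over the document embeddings that the app stores. As a rough sketch of the nearest-neighbour lookup a FAISS index performs — using plain NumPy as a stand-in, since the retrieval code itself is not part of this diff (`top_k_similar` and the toy vectors below are illustrative names, not from the repository):

```python
import numpy as np

def top_k_similar(query_vec, doc_vecs, k=2):
    """Return indices of the k document vectors closest to the query
    by cosine similarity (what a FAISS inner-product index does at scale)."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                       # cosine similarity per document
    return np.argsort(scores)[::-1][:k]  # best matches first

# Toy 3-dimensional "embeddings" standing in for ada-002 vectors
docs = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.9, 0.1, 0.0]])
query = np.array([1.0, 0.05, 0.0])
print(top_k_similar(query, docs))  # indices of the closest documents
```

With faiss-cpu installed, the same lookup would typically go through `faiss.IndexFlatIP` (or LangChain's FAISS vector-store wrapper), which scales this search to large numbers of segments.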
requirements.txt CHANGED
@@ -7,4 +7,5 @@ docx2txt
  langchain
  pydantic
  typing
- faiss-cpu
+ faiss-cpu
+ streamlit_chat
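The other new requirement, streamlit_chat, renders alternating user/bot chat bubbles from a message history kept in `st.session_state`. A minimal sketch of that history pattern — a plain dict stands in for `session_state` so the snippet runs outside Streamlit, and `ask`/`echo_bot` are hypothetical names standing in for the app's LangChain QA chain:

```python
# Stand-in for st.session_state; in a real app this persists across reruns.
session_state = {"past": [], "generated": []}

def ask(user_text, answer_fn):
    """Record a user question and the model's answer, streamlit-chat style."""
    session_state["past"].append(user_text)
    session_state["generated"].append(answer_fn(user_text))

# Hypothetical answer function standing in for the QA chain.
echo_bot = lambda q: f"You asked: {q}"
ask("What is this file about?", echo_bot)

# Inside app.py this history would be rendered with streamlit-chat:
#   from streamlit_chat import message
#   for i in range(len(session_state["generated"])):
#       message(session_state["past"][i], is_user=True, key=f"user_{i}")
#       message(session_state["generated"][i], key=f"bot_{i}")
print(session_state["generated"][0])
```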