---
title: Lex Fridman Podcast Semantic Search
emoji: 💡
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 4.39.0
app_file: app.py
pinned: false
---
# lex-semantic-search

Gradio application for performing semantic search on Lex Fridman podcast transcripts.

## Dataset

The Gradio application is pre-loaded with chunks and their embeddings for the dataset [nmac/lex_fridman_podcast](https://huggingface.co/datasets/nmac/lex_fridman_podcast). Each chunk consists of 25 contiguous transcript entries.
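
As a rough illustration of what "25 contiguous entries" per chunk could look like, here is a minimal sketch. The `chunk_entries` helper, the non-overlapping windows, and joining entries with a space are all assumptions made for illustration; the application's actual chunking code may differ.

```python
from typing import List

CHUNK_SIZE = 25  # chunk size stated in this README


def chunk_entries(entries: List[str], chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Join every `chunk_size` contiguous entries into one chunk of text.

    Hypothetical helper: windows do not overlap, and the last chunk may
    contain fewer than `chunk_size` entries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        " ".join(entries[start:start + chunk_size])
        for start in range(0, len(entries), chunk_size)
    ]


# Toy stand-in for transcript entries (not real dataset rows).
entries = [f"entry {i}" for i in range(60)]
chunks = chunk_entries(entries)
print(len(chunks))  # 3 chunks: 25 + 25 + 10 entries
```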

## Usage

1. Set up a virtual environment with the required dependencies:
```bash
python -m venv lex-semantic-search
source lex-semantic-search/bin/activate 
# Install ONE of the following, depending on your hardware:
pip install -r requirements.txt     # GPU
pip install -r requirements_cpu.txt # CPU
```

2. Run the application locally using the following command:
```bash
python app.py
```

3. Access the application by opening your web browser and navigating to http://localhost:7860.

4. In the application interface, adjust the input settings according to your needs:
   - **Query:** Enter a query to search for relevant podcast transcript chunks related to it.
   - **Chunk Size:** Adjust the chunk size. *(Fixed to 25)*
   - **Embeddings Generator:** Select the embeddings generator to use. *(Fixed to `sentence-transformers/multi-qa-mpnet-base-dot-v1`)*
   - **Retriever Method:** Select the retriever method. *(Fixed to `FAISS`)*
   - **Number of Chunks to Retrieve:** Set the number of chunks to retrieve.

5. Click the "Submit" button to retrieve the chunks that match your settings and query. The results will be displayed in a table.
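
The retrieval step described above can be pictured as a top-k search by dot product between the query embedding and the chunk embeddings (the model name `multi-qa-mpnet-base-dot-v1` suggests dot-product scoring). The sketch below is a plain-Python stand-in, not the application's FAISS code: `top_k_chunks` is a hypothetical helper, and the two-dimensional vectors are toy values rather than real embeddings.

```python
import heapq
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_chunks(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) for the k highest dot-product scores.

    Exhaustive search, for illustration only; a FAISS index would
    replace this loop in a real setup.
    """
    scored = ((dot(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs))
    best = heapq.nlargest(k, scored)  # highest scores first
    return [(idx, score) for score, idx in best]


# Toy embeddings (assumed values, not produced by any model).
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_vec = [1.0, 0.2]

results = top_k_chunks(query_vec, chunk_vecs, k=2)
print([idx for idx, _ in results])  # [0, 2]
```

Here `k` plays the role of the "Number of Chunks to Retrieve" setting: raising it returns more, lower-scoring chunks.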