```bash
% python ask.py -i local -c -q "How does Ask.py work?"

2024-11-20 10:00:09,335 - INFO - Initializing converter ...
2024-11-20 10:00:09,335 - INFO - ✅ Successfully initialized Docling.
2024-11-20 10:00:09,335 - INFO - Initializing chunker ...
2024-11-20 10:00:09,550 - INFO - ✅ Successfully initialized Chonkie.
2024-11-20 10:00:09,850 - INFO - Initializing database ...
2024-11-20 10:00:09,933 - INFO - ✅ Successfully initialized DuckDB.
2024-11-20 10:00:09,933 - INFO - Processing the local data directory ...
2024-11-20 10:00:09,933 - INFO - Processing README.pdf ...
Fetching 9 files: 100%|████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 11781.75it/s]
2024-11-20 10:00:29,629 - INFO - ✅ Finished processing README.pdf.
2024-11-20 10:00:29,629 - INFO - Chunking the text ...
2024-11-20 10:00:29,639 - INFO - ✅ Generated 2 chunks ...
2024-11-20 10:00:29,639 - INFO - Saving 2 chunks to DB ...
2024-11-20 10:00:29,681 - INFO - Embedding 1 batches of chunks ...
2024-11-20 10:00:30,337 - INFO - ✅ Finished embedding.
2024-11-20 10:00:30,423 - INFO - ✅ Created the vector index ...
2024-11-20 10:00:30,483 - INFO - ✅ Created the full text search index ...
2024-11-20 10:00:30,483 - INFO - ✅ Successfully embedded and saved chunks to DB.
2024-11-20 10:00:30,483 - INFO - Querying the vector DB to get context ...
2024-11-20 10:00:30,773 - INFO - Running full-text search ...
2024-11-20 10:00:30,796 - INFO - ✅ Got 2 matched chunks.
2024-11-20 10:00:30,797 - INFO - Running inference with context ...
2024-11-20 10:00:34,939 - INFO - ✅ Finished inference API call.
2024-11-20 10:00:34,939 - INFO - Generating output ...

# Answer

Ask.py is a Python program designed to implement a search-extract-summarize flow, similar to AI search engines like Perplexity. It can be run through a command-line interface or a Gradio user interface and allows flexible control over output and search behaviors[1].

When a query is executed, Ask.py performs the following steps:

1. Searches Google for the top 10 web pages related to the query.
2. Crawls and scrapes the content of these pages.
3. Breaks down the scraped text into chunks and saves them in a vector database.
4. Conducts a vector search with the initial query to identify the top 10 matched text chunks.
5. Optionally integrates full-text search results and uses a reranker to refine the results.
6. Uses the selected chunks as context to query a language model (LLM) and generate a comprehensive answer.
7. Outputs the answer along with references to the sources[1].

Moreover, the program allows various configurations such as date restrictions, site targeting, output language, and output length. It can also scrape a specified list of URLs instead of performing a web search, making it highly versatile for search and data extraction tasks[2].

# References

[1] file:///Users/feng/work/github/ask.py/data/README.pdf
[2] file:///Users/feng/work/github/ask.py/data/README.pdf
```
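The log above traces the local-ingest path: Docling converts the PDF to text, Chonkie splits it into chunks, and DuckDB stores the chunks together with their embeddings plus a full-text index, so vector and keyword search can run over the same table. The sketch below is a minimal, hypothetical reconstruction of that flow, not ask.py's actual code: it assumes a sentence-transformers embedding model, a fixed character chunk size in place of Chonkie, and a recent DuckDB with its `fts` extension and list-similarity functions, using brute-force cosine scoring instead of a real vector index.

```python
# Minimal sketch of a "convert -> chunk -> embed -> hybrid search" pipeline
# over DuckDB, mirroring the log above. This is NOT ask.py's actual code:
# the embedding model, chunk size, and brute-force cosine scoring are
# illustrative assumptions.
import duckdb
from docling.document_converter import DocumentConverter
from sentence_transformers import SentenceTransformer  # assumed embedding backend


def ingest(pdf_path: str, con: duckdb.DuckDBPyConnection, model: SentenceTransformer) -> None:
    """Convert a PDF to text, chunk it, embed the chunks, and store them in DuckDB."""
    doc = DocumentConverter().convert(pdf_path).document
    text = doc.export_to_markdown()

    # Naive fixed-size character chunking stands in for Chonkie here.
    size = 1000
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    vectors = model.encode(chunks)  # one embedding per chunk

    con.execute(
        "CREATE TABLE IF NOT EXISTS chunks (id INTEGER, source VARCHAR, content VARCHAR, vec FLOAT[])"
    )
    for i, (chunk, vec) in enumerate(zip(chunks, vectors)):
        con.execute(
            "INSERT INTO chunks VALUES (?, ?, ?, ?)",
            [i, pdf_path, chunk, [float(x) for x in vec]],
        )

    # Full-text (BM25) index over the same table, for the hybrid-search step.
    con.execute("INSTALL fts")
    con.execute("LOAD fts")
    con.execute("PRAGMA create_fts_index('chunks', 'id', 'content')")


def hybrid_search(query: str, con, model, k: int = 10):
    """Return the union of the top-k vector matches and the top-k BM25 matches."""
    qvec = [float(x) for x in model.encode([query])[0]]
    vector_hits = con.execute(
        f"""SELECT id, source, content, list_cosine_similarity(vec, ?) AS score
            FROM chunks ORDER BY score DESC LIMIT {k}""",
        [qvec],
    ).fetchall()
    keyword_hits = con.execute(
        f"""SELECT id, source, content, score FROM (
                SELECT *, fts_main_chunks.match_bm25(id, ?) AS score FROM chunks
            ) AS sq WHERE score IS NOT NULL ORDER BY score DESC LIMIT {k}""",
        [query],
    ).fetchall()
    # Deduplicate by chunk id; a real system would rerank the merged list.
    merged = {row[0]: row for row in vector_hits + keyword_hits}
    return list(merged.values())


if __name__ == "__main__":
    con = duckdb.connect()  # in-memory DB; ask.py persists its own store
    model = SentenceTransformer("all-MiniLM-L6-v2")
    ingest("data/README.pdf", con, model)
    for _, source, content, score in hybrid_search("How does Ask.py work?", con, model):
        print(f"{score:.3f}  {content[:80]}...")
```

Keeping the embeddings and the BM25 index in the same DuckDB table is what lets the run above create the vector index and the full-text index back to back and then answer both kinds of search from a single store.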
|
|
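The last log lines ("Running inference with context", "Generating output") correspond to step 6 of the answer: the matched chunks are packed into a prompt, sent to an LLM, and the reply is rendered with numbered source references. The snippet below sketches that step using the OpenAI chat API; the model name, prompt wording, and reference formatting are assumptions for illustration, not ask.py's actual prompt.

```python
# Sketch of the "inference with context" step: build a prompt from the
# matched chunks, call an LLM, and append numbered references. The prompt
# wording and model choice are illustrative assumptions.
from openai import OpenAI


def answer_with_context(query: str, chunks: list[tuple[str, str]]) -> str:
    """chunks: list of (source_url, chunk_text) pairs from the hybrid search."""
    context = "\n\n".join(f"[{i + 1}] {text}" for i, (_, text) in enumerate(chunks))
    prompt = (
        "Answer the question using only the numbered context below, and cite "
        "the context numbers you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; ask.py's default may differ
        messages=[{"role": "user", "content": prompt}],
    )
    answer = reply.choices[0].message.content

    # Render the "# References" section seen in the output above.
    refs = "\n".join(f"[{i + 1}] {src}" for i, (src, _) in enumerate(chunks))
    return f"# Answer\n\n{answer}\n\n# References\n\n{refs}"
```

In the run above, both matched chunks come from the same local file, which is why references [1] and [2] point at the same README.pdf path.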