```bash
% python ask.py -i local -c -q "How does Ask.py work?"
2024-11-20 10:00:09,335 - INFO - Initializing converter ...
2024-11-20 10:00:09,335 - INFO - ✅ Successfully initialized Docling.
2024-11-20 10:00:09,335 - INFO - Initializing chunker ...
2024-11-20 10:00:09,550 - INFO - ✅ Successfully initialized Chonkie.
2024-11-20 10:00:09,850 - INFO - Initializing database ...
2024-11-20 10:00:09,933 - INFO - ✅ Successfully initialized DuckDB.
2024-11-20 10:00:09,933 - INFO - Processing the local data directory ...
2024-11-20 10:00:09,933 - INFO - Processing README.pdf ...
Fetching 9 files: 100%|████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 11781.75it/s]
2024-11-20 10:00:29,629 - INFO - ✅ Finished processing README.pdf.
2024-11-20 10:00:29,629 - INFO - Chunking the text ...
2024-11-20 10:00:29,639 - INFO - ✅ Generated 2 chunks ...
2024-11-20 10:00:29,639 - INFO - Saving 2 chunks to DB ...
2024-11-20 10:00:29,681 - INFO - Embedding 1 batches of chunks ...
2024-11-20 10:00:30,337 - INFO - ✅ Finished embedding.
2024-11-20 10:00:30,423 - INFO - ✅ Created the vector index ...
2024-11-20 10:00:30,483 - INFO - ✅ Created the full text search index ...
2024-11-20 10:00:30,483 - INFO - ✅ Successfully embedded and saved chunks to DB.
2024-11-20 10:00:30,483 - INFO - Querying the vector DB to get context ...
2024-11-20 10:00:30,773 - INFO - Running full-text search ...
2024-11-20 10:00:30,796 - INFO - ✅ Got 2 matched chunks.
2024-11-20 10:00:30,797 - INFO - Running inference with context ...
2024-11-20 10:00:34,939 - INFO - ✅ Finished inference API call.
2024-11-20 10:00:34,939 - INFO - Generating output ...
# Answer | |
Ask.py is a Python program designed to implement a search-extract-summarize flow, similar to AI search engines like Perplexity. It can be run through a command-line interface or a Gradio user interface and allows for flexibility in controlling output and search behaviors[1].
When a query is executed, Ask.py performs the following steps: | |
1. Searches Google for the top 10 web pages related to the query. | |
2. Crawls and scrapes the content of these pages. | |
3. Breaks down the scraped text into chunks and saves them in a vector database. | |
4. Conducts a vector search with the initial query to identify the top 10 matched text chunks. | |
5. Optionally integrates full-text search results and uses a reranker to refine the results. | |
6. Utilizes the selected chunks as context to query a language model (LLM) to generate a comprehensive answer. | |
7. Outputs the answer along with references to the sources[1]. | |
Moreover, the program allows various configurations such as date restrictions, site targeting, output language, and output length. It can also scrape specified URL lists instead of performing a web search, making it highly versatile for search and data extraction tasks[2]. | |
# References | |
[1] file:///Users/feng/work/github/ask.py/data/README.pdf | |
[2] file:///Users/feng/work/github/ask.py/data/README.pdf | |
```
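To make the pipeline in the log above more concrete, here is a minimal sketch of the core chunk → embed → retrieve → answer loop. This is not ask.py's actual code: the table schema, the `embed_texts` helper, the fixed-size chunking (ask.py uses Chonkie), the model names, and the prompt are all assumptions, and it uses DuckDB only as a plain vector store with a brute-force similarity scan instead of the vector and full-text indexes that ask.py builds.

```python
# Simplified sketch of a search-extract-summarize loop over a local document.
# NOT ask.py's real implementation; schema, helpers, and model names are illustrative.
import duckdb
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def embed_texts(texts: list[str]) -> list[list[float]]:
    # Hypothetical helper; any embedding endpoint works, the model name is an assumption.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]


# 1. Read the source text and break it into chunks (naive fixed-size split here).
doc = open("data/README.md").read()
chunks = [doc[i : i + 1000] for i in range(0, len(doc), 1000)]

# 2. Embed the chunks and save them to an in-memory DuckDB table.
con = duckdb.connect()
con.execute("CREATE TABLE chunks (id INTEGER, text VARCHAR, vec FLOAT[])")
for i, (text, vec) in enumerate(zip(chunks, embed_texts(chunks))):
    con.execute("INSERT INTO chunks VALUES (?, ?, ?)", [i, text, vec])

# 3. Retrieve the chunks most similar to the query embedding.
query = "How does Ask.py work?"
qvec = embed_texts([query])[0]
rows = con.execute(
    "SELECT text FROM chunks ORDER BY list_cosine_similarity(vec, ?) DESC LIMIT 10",
    [qvec],
).fetchall()
context = "\n\n".join(r[0] for r in rows)

# 4. Ask the LLM to answer using only the retrieved context.
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ],
)
print(answer.choices[0].message.content)
```

The real run shown in the log does more work at steps 2 and 3: it also builds a vector index and a full-text search index in DuckDB, merges the full-text matches with the vector matches, and can optionally rerank the combined results before passing them to the LLM.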