:gem: [Feature] Sync to Hugging Face Space, and run in Docker d0a4f07 Hansimov committed on Jan 11, 2024
:boom: [Fix] SearchAPIApp: incorrect order of extracted contents to urls eb0ce75 Hansimov committed on Jan 11, 2024
:recycle: [Refactor] Replace output_path with html_path to avoid confusion 8c0b736 Hansimov committed on Jan 11, 2024
:gem: [Feature] SearchAPIApp: Concurrently fetch urls and extract contents f234ce3 Hansimov committed on Jan 11, 2024
:boom: [Fix] WebpageFetcher: raise timeout when request.get hangs bce51d4 Hansimov committed on Jan 11, 2024
:recycle: [Refactor] QueryResultsExtractor: prettify logging 0acc824 Hansimov committed on Jan 11, 2024
:zap: [Enhance] BatchWebpageFetcher: return url_and_output_path_list 4591d96 Hansimov committed on Jan 11, 2024
:gem: [Feature] New BatchWebpageContentExtractor: Extract webpage content from multiple html_paths concurrently 1db460d Hansimov committed on Jan 11, 2024
:gem: [Feature] New BatchWebpageFetcher: Fetch multiple urls concurrently e92817a Hansimov committed on Jan 11, 2024
:boom: [Fix] Duplicated query_results in response JSON when passing multiple queries 876e441 Hansimov committed on Jan 11, 2024
:zap: [Enhance] WebpageContentExtractor: Escape dash, and ignore c7c538d Hansimov committed on Jan 10, 2024
:zap: [Enhance] ignore classes pattern, especially for 163.com 3dda344 Hansimov committed on Jan 10, 2024
:zap: [Enhance] Rename HTMLFetcher to WebpageFetcher, and add output_parent param 62ee9e4 Hansimov committed on Jan 10, 2024
:zap: [Enhance] SearchAPIApp: overwrite param for query and webpage HTML 9fb4731 Hansimov committed on Jan 10, 2024
:recycle: [Refactor] WebpageContentExtractor: Separate html and markdown processing a636bcb Hansimov committed on Jan 10, 2024
:recycle: [Refactor] Move hardcoded consts to network_configs af2c647 Hansimov committed on Jan 10, 2024
:zap: [Enhance] HTMLFetcher and GoogleSearcher: support cache with overwrite, and ignore host cf4c3f8 Hansimov committed on Jan 10, 2024
:gem: [Feature] New WebpageContentExtractor: extract webpage content as clean markdown e773696 Hansimov committed on Jan 10, 2024
:recycle: [Refactor] HTMLFetcher: replace save_path with output_path 7d44e75 Hansimov committed on Jan 10, 2024
:gem: [Feature] Enable SearchAPIApp: /queries_to_search_results 138c09e Hansimov committed on Jan 10, 2024
:zap: [Enhance] GoogleSearcher: Add params of result_sum and safe 8bf48d8 Hansimov committed on Jan 10, 2024
:recycle: [Refactor] Rename SearchResultsExtractor to QueryResultsExtractor, and store results 0f6452f Hansimov committed on Jan 10, 2024
:zap: [Enhance] FilepathConverter: New parent param when init f9c42cf Hansimov committed on Jan 10, 2024
:gem: [Feature] New HTMLFetcher: download url to local html file b259fec Hansimov committed on Jan 7, 2024
:gem: [Feature] New FilepathConverter: convert urls and queries to valid file path 64a0dbf Hansimov committed on Jan 7, 2024
:recycle: [Refactor] Move header constructor, and prettify logging e448a74 Hansimov committed on Jan 6, 2024
:gem: [Feature] New SearchResultsExtractor: title, site, link, abstract ef3de03 Hansimov committed on Jan 6, 2024
:gem: [Feature] New GoogleSearcher: Enable google search with query 6cf0820 Hansimov committed on Jan 6, 2024