Commit History

:zap: [Enhance] Loop multiple conditions for extracting abstract
a315628

Hansimov committed on

:boom: [Fix] PermissionError: [Errno 13] Permission denied: /app/files
ea49401

Hansimov committed on

:gem: [Feature] Sync to Hugging Face space, and run in Docker
d0a4f07

Hansimov committed on

:boom: [Fix] SearchAPIApp: incorrect order of extracted contents to urls
eb0ce75

Hansimov committed on

:pencil: [Config] Generate requirements.txt
d3368e0

Hansimov committed on

:recycle: [Refactor] Replace output_path with html_path to avoid confusion
8c0b736

Hansimov committed on

:gem: [Feature] SearchAPIApp: Concurrently fetch urls and extract contents
f234ce3

Hansimov committed on

:boom: [Fix] WebpageFetcher: raise timeout when requests.get hangs
bce51d4

Hansimov committed on

:boom: [Fix] WebpageContentExtractor: UnicodeDecodeError
cff1afc

Hansimov committed on

:recycle: [Refactor] QueryResultsExtractor: prettify logging
0acc824

Hansimov committed on

:zap: [Enhance] BatchWebpageFetcher: return url_and_output_path_list
4591d96

Hansimov committed on

:gem: [Feature] New BatchWebpageContentExtractor: Extract webpage content from multiple html_paths concurrently
1db460d

Hansimov committed on

:gem: [Feature] New BatchWebpageFetcher: Fetch multiple urls concurrently
e92817a

Hansimov committed on

:boom: [Fix] Duplicated query_results in response JSON when passing multiple queries
876e441

Hansimov committed on

:zap: [Enhance] WebpageContentExtractor: Escape dash, and ignore
c7c538d

Hansimov committed on

:zap: [Enhance] ignore classes pattern, especially for 163.com
3dda344

Hansimov committed on

:zap: [Enhance] Rename HTMLFetcher to WebpageFetcher, and add output_parent param
62ee9e4

Hansimov committed on

:zap: [Enhance] SearchAPIApp: overwrite param for query and webpage HTML
9fb4731

Hansimov committed on

:recycle: [Refactor] WebpageContentExtractor: Separate html and markdown processing
a636bcb

Hansimov committed on

:recycle: [Refactor] Move hardcoded consts to network_configs
af2c647

Hansimov committed on

:zap: [Enhance] HTMLFetcher and GoogleSearcher: support cache with overwrite, and ignore host
cf4c3f8

Hansimov committed on

:gem: [Feature] SearchAPIApp: New extract_content param
4d3e890

Hansimov committed on

:gem: [Feature] New WebpageContentExtractor: extract webpage content as clean markdown
e773696

Hansimov committed on

:recycle: [Refactor] HTMLFetcher: replace save_path with output_path
7d44e75

Hansimov committed on

:gem: [Feature] Enable SearchAPIApp: /queries_to_search_results
138c09e

Hansimov committed on

:zap: [Enhance] GoogleSearcher: Add params of result_sum and safe
8bf48d8

Hansimov committed on

:recycle: [Refactor] Rename SearchResultsExtractor to QueryResultsExtractor, and store results
0f6452f

Hansimov committed on

:zap: [Enhance] FilepathConverter: New parent param when init
f9c42cf

Hansimov committed on

:gem: [Feature] New HTMLFetcher: download url to local html file
b259fec

Hansimov committed on

:gem: [Feature] New FilepathConverter: convert urls and queries to valid file path
64a0dbf

Hansimov committed on

:recycle: [Refactor] Move header constructor, and prettier logging
e448a74

Hansimov committed on

:gem: [Feature] SearchResultsExtractor: related questions
f150f6b

Hansimov committed on

:gem: [Feature] New SearchResultsExtractor: title, site, link, abstract
ef3de03

Hansimov committed on

:pencil: [Doc] Readme and git ignore
d6015f4

Hansimov committed on

:gem: [Feature] New Enver and Logger
f2ec1a1

Hansimov committed on

:gem: [Feature] New GoogleSearcher: Enable google search with query
6cf0820

Hansimov committed on