:gem: [Feature] New BatchWebpageContentExtractor: Extract webpage content from multiple html_paths concurrently 1db460d Hansimov committed on Jan 11, 2024
:zap: [Enhance] ignore classes pattern, especially for 163.com 3dda344 Hansimov committed on Jan 10, 2024
:recycle: [Refactor] WebpageContentExtractor: Separate html and markdown processing a636bcb Hansimov committed on Jan 10, 2024
:recycle: [Refactor] Move hardcoded consts to network_configs af2c647 Hansimov committed on Jan 10, 2024
:gem: [Feature] New WebpageContentExtractor: extract webpage content as clean markdown e773696 Hansimov committed on Jan 10, 2024