Project Setup & Usage Guide
Overview
This project consists of two main components:
- Web Scraper: built using Python, Selenium, and BeautifulSoup to scrape IBM SkillsBuild course data.
- Search Engine: uses Elasticsearch to index and search the scraped course data.
Requirements
Make sure the following are installed before running the project:
- Python 3.8+
- Selenium
- BeautifulSoup4
- Pandas
- Elasticsearch
- tqdm
- json (included with Python)
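As a quick sanity check before running anything, the Python-side requirements above can be verified with a short script. This is a minimal sketch, not part of the project: the helper names `missing_modules` and `python_is_supported` are hypothetical, and the check only covers the Python packages (including the `elasticsearch` client), not whether an Elasticsearch server is installed or running.

```python
import importlib.util
import sys

# Import names for the listed requirements.
# Note: BeautifulSoup4 is imported as "bs4".
REQUIRED_MODULES = ["selenium", "bs4", "pandas", "elasticsearch", "tqdm", "json"]


def missing_modules(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter is Python 3.8 or newer."""
    return tuple(version_info[:2]) >= (3, 8)


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.8+ is required, found:", sys.version.split()[0])
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required Python modules were found.")
```

Any module reported as missing can usually be installed with `pip` under its package name (for example, `beautifulsoup4` for `bs4`).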
Project Structure
Only the following folders are required for the project to function correctly:
- Single Threaded Scraper
- Search Engine
All other folders are supporting material and are not required to run the system.
Running the Web Scraper
- Navigate to the Single Threaded Scraper folder.
- Run the scraper:
python integrated_scraper.py
This will scrape course data and prepare it for indexing.
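The guide does not describe what "prepare it for indexing" involves, so the following is only an illustrative sketch of one common approach: turning a list of course records into the newline-delimited JSON body that the Elasticsearch bulk API accepts. The function name `to_bulk_ndjson`, the index name `courses`, and the record fields are assumptions, not taken from `integrated_scraper.py`.

```python
import json


def to_bulk_ndjson(courses, index_name="courses"):
    """Build a bulk-API body: one action line followed by one document line per course.

    `courses` is a list of dicts; `index_name` is a hypothetical index name.
    """
    lines = []
    for doc_id, course in enumerate(courses, start=1):
        # Action line: tells Elasticsearch to index the next line as a document.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        # Source line: the course record itself.
        lines.append(json.dumps(course, ensure_ascii=False))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    sample = [
        {"title": "Example course A", "description": "Placeholder text"},
        {"title": "Example course B", "description": "Placeholder text"},
    ]
    print(to_bulk_ndjson(sample), end="")
```

Keeping the scraped output as plain dictionaries makes it easy to inspect with Pandas or to dump to JSON before indexing.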
Running the Search Engine
- Start Elasticsearch.
- Navigate to the Search Engine folder.
- Run the indexing script:
python index_courses.py
- Run the search script:
python search_courses.py
To change what is searched for, edit the query inside search_courses.py.
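For orientation, a query in the Elasticsearch query DSL is a nested dictionary. The sketch below builds a `multi_match` query over several text fields; the helper `build_query` and the field names `title` and `description` are hypothetical, and the query actually used in the script may look different.

```python
def build_query(text, fields=("title", "description"), size=10):
    """Return a search body that matches `text` against several fields.

    The field names are placeholders; use the fields present in the indexed data.
    """
    return {
        "size": size,  # maximum number of hits to return
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
            }
        },
    }


if __name__ == "__main__":
    print(build_query("data science"))
```

Changing the `text` argument or the list of fields is typically all that is needed to try a different search.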
Search results will be exported as CSV and JSON files.
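The export step can be pictured roughly as follows. This is a sketch using only the standard library, assuming the usual Elasticsearch response shape (`hits.hits`, each hit carrying `_score` and `_source`); the function name `export_hits` and the output file names are hypothetical, and the real script may use Pandas instead.

```python
import csv
import json


def export_hits(response, csv_path, json_path):
    """Flatten search hits into rows and write them to a CSV and a JSON file."""
    hits = response["hits"]["hits"]
    # One row per hit: the relevance score plus the stored document fields.
    rows = [{"score": hit.get("_score"), **hit.get("_source", {})} for hit in hits]

    # Collect column names in first-seen order so the CSV header is stable.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # missing fields are written as empty cells

    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)

    return rows


if __name__ == "__main__":
    # Hypothetical response, for illustration only.
    fake_response = {
        "hits": {"hits": [{"_score": 1.5, "_source": {"title": "Example course"}}]}
    }
    export_hits(fake_response, "results.csv", "results.json")
```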