---
title: Link Lazarus Method
emoji: π
colorFrom: blue
colorTo: red
sdk: streamlit
sdk_version: 1.32.2
app_file: wikipedia_dead_links_streamlit.py
pinned: false
---
# The Link Lazarus Method: Wikipedia Dead Link Finder by metehan.ai (Streamlit Version)

A Streamlit web application for finding and logging dead (broken) external links in Wikipedia articles, identifying potentially available domains for registration, and saving them to a dedicated database.
## Features

- Multiple Search Methods:
  - Search by text to find Wikipedia articles
  - Search by category to find related articles
- Dead Link Detection: Checks external links for HTTP errors or connection issues (see the sketch after this list)
- Domain Availability: Identifies which domains from dead links might be available for registration
- Restricted TLD Filtering: Automatically identifies and excludes restricted domains (.edu, .gov, etc.)
- Available Domains Database: Maintains a separate database of potentially available domains
- Real-time Logging: Saves dead links and available domains to JSON files as they're found
- Result Visualization: Displays results in an interactive table with filtering options
- Export to CSV: Download results as a CSV file
- Web Archive Filter: Automatically ignores links from web.archive.org
- Configurable: Adjust settings via the sidebar
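
The dead-link check can be illustrated with a short Python sketch using `requests` and a thread pool. This is an assumption about how such a check can work, not the app's actual code: `check_link`, `check_links`, the retry-with-GET behavior, and the concurrency limit are all illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

import requests

MAX_CONCURRENT_REQUESTS = 10  # mirrors the "Max concurrent requests" setting

def check_link(url: str, timeout: float = 10.0) -> dict:
    """Return a record describing whether the link appears dead."""
    # Web Archive Filter: links on web.archive.org are ignored outright.
    if urlparse(url).netloc.endswith("web.archive.org"):
        return {"url": url, "dead": False, "reason": "archive link ignored"}
    try:
        resp = requests.head(url, timeout=timeout, allow_redirects=True)
        if resp.status_code >= 400:
            # Some servers reject HEAD; retry with GET before flagging.
            resp = requests.get(url, timeout=timeout, stream=True)
        return {"url": url, "dead": resp.status_code >= 400,
                "reason": f"HTTP {resp.status_code}"}
    except requests.RequestException as exc:
        # Connection errors, timeouts, SSL failures, etc.
        return {"url": url, "dead": True, "reason": type(exc).__name__}

def check_links(urls):
    """Check many links concurrently."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_REQUESTS) as pool:
        return list(pool.map(check_link, urls))
```

A `HEAD` request is tried first to avoid downloading page bodies; servers that reject `HEAD` get a follow-up `GET` before the link is flagged as dead.
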
## Requirements

- Python 3.6+
- Required packages listed in `requirements_streamlit.txt`
## Installation

```bash
pip install -r requirements_streamlit.txt
```
## Usage

Run the Streamlit app:

```bash
streamlit run wikipedia_dead_links_streamlit.py
```
The application will open in your default web browser with three main tabs:
1. Search by Text
   - Enter search terms to find Wikipedia articles containing that text
   - View search results with snippets
   - Process all found pages to check for dead links and available domains
2. Search by Category
   - Enter a category name to find Wikipedia categories
   - Select a category to crawl its pages
   - Find dead links and available domains within those pages (both search modes are sketched below this list)
3. Available Domains
   - View all potentially available domains found during searches
   - Filter domains by status (potentially available, expired, etc.)
   - See details about each domain, including where it was found
   - Download the list as a CSV file
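
Both search tabs map naturally onto the public MediaWiki API. The sketch below shows how such lookups can be done; `search_by_text` and `pages_in_category` are hypothetical names, not the app's internals.

```python
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def search_by_text(query: str, limit: int = 10):
    """Full-text search returning page titles and snippets."""
    params = {"action": "query", "list": "search", "srsearch": query,
              "srlimit": limit, "format": "json"}
    resp = requests.get(API_URL, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()["query"]["search"]

def pages_in_category(category: str, limit: int = 50):
    """List pages belonging to a category, e.g. pages_in_category('Defunct websites')."""
    params = {"action": "query", "list": "categorymembers",
              "cmtitle": f"Category:{category}", "cmlimit": limit,
              "format": "json"}
    resp = requests.get(API_URL, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()["query"]["categorymembers"]
```
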
## How Domain Availability Works

The app uses these methods to determine if a domain might be available:
- WHOIS Lookup: Checks if the domain has registration information
- Expiration Check: Identifies domains with expired registration dates
- DNS Lookup: Verifies if the domain has active DNS records
- TLD Restriction Check: Identifies restricted TLDs that cannot be freely registered
A domain is flagged as potentially available if any of the first three conditions below holds, provided the last one also does (a sketch of the first three checks follows the list):
- No WHOIS registration data is found
- The domain's expiration date has passed
- No DNS records exist for the domain
- The domain does NOT have a restricted TLD (.edu, .gov, .mil, etc.)
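
Here is a minimal sketch of the first three checks, assuming the `python-whois` and `dnspython` packages; `looks_available` is a hypothetical name, and the app's actual dependencies are the ones listed in `requirements_streamlit.txt`.

```python
from datetime import datetime

import dns.resolver
import whois

def looks_available(domain: str) -> bool:
    """Apply the heuristics above; True means 'worth checking with a registrar'."""
    # 1. WHOIS lookup: no registration data suggests availability.
    try:
        record = whois.whois(domain)
    except Exception:
        return True  # lookup failed outright; treat as a candidate
    if not record.domain_name:
        return True
    # 2. Expiration check: a past expiration date suggests availability.
    expires = record.expiration_date
    if isinstance(expires, list):  # python-whois may return several dates
        expires = expires[0]
    if expires and expires < datetime.now():
        return True
    # 3. DNS lookup: no A record suggests the domain is unused.
    try:
        dns.resolver.resolve(domain, "A")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return True
    return False
```
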
## Restricted TLDs (Optional)

The following TLDs are recognized as restricted and, when this filter is enabled, are never reported as available (a filter sketch follows the list):
- .edu - Educational institutions
- .gov - Government entities
- .mil - Military organizations
- .int - International organizations
- Country-specific restrictions like .ac.uk, .gov.uk, etc.
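
The filter itself can be as simple as a suffix check. This sketch mirrors the examples above; the suffix tuple is deliberately abbreviated, not the app's complete list.

```python
# Restricted suffixes cannot be freely registered, so matching domains
# are excluded from the "potentially available" results.
RESTRICTED_SUFFIXES = (".edu", ".gov", ".mil", ".int", ".ac.uk", ".gov.uk")

def is_restricted(domain: str) -> bool:
    """True if the domain ends in a restricted TLD."""
    return domain.lower().endswith(RESTRICTED_SUFFIXES)
```
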
Note: For definitive availability, you should verify with a domain registrar. The tool provides a starting point for identifying potential opportunities.
## Configuration Options

- Log file path: Where to save the dead links JSON results
- Available domains file: Where to save the available domains database
- Max concurrent requests: Number of links to check simultaneously
- Max pages to process: Limit the number of articles to process
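
In Streamlit, options like these are typically exposed as sidebar widgets. The snippet below is one plausible wiring; the widget labels match the list above, but the defaults and variable names are assumptions.

```python
import streamlit as st

# Hypothetical defaults; only the labels are taken from the options above.
log_file = st.sidebar.text_input("Log file path", "wikipedia_dead_links.json")
domains_file = st.sidebar.text_input("Available domains file", "available_domains.json")
max_concurrent = st.sidebar.slider("Max concurrent requests",
                                   min_value=1, max_value=50, value=10)
max_pages = st.sidebar.number_input("Max pages to process", min_value=1, value=20)
```
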
## Output Files

The app generates two main JSON files:

- `wikipedia_dead_links.json`: Contains details about all dead links found
- `available_domains.json`: Contains only the potentially available domains and where they were found
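
Real-time logging to these files can be done with a simple read-append-write cycle. This sketch assumes each file holds a JSON array; the record fields are illustrative, not the app's exact schema.

```python
import json
from pathlib import Path

def append_record(path: str, record: dict) -> None:
    """Append one record to a JSON-array log file, creating it if needed."""
    file = Path(path)
    records = json.loads(file.read_text()) if file.exists() else []
    records.append(record)
    file.write_text(json.dumps(records, indent=2))

# Example with a hypothetical record:
append_record("wikipedia_dead_links.json",
              {"url": "http://example.com/page",
               "article": "Example article",
               "reason": "HTTP 404"})
```
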
You can also download results as CSV files directly from the app. Make sure to follow @metehan777 on X and www.linkedin.com/in/metehanyesilyurt on LinkedIn for upcoming updates and more tips & tools.