Metadata
title: Link Lazarus Method
emoji: 🔗
colorFrom: blue
colorTo: red
sdk: streamlit
sdk_version: 1.32.2
app_file: wikipedia_dead_links_streamlit.py
pinned: false

The Link Lazarus Method: Wikipedia Dead Link Finder by metehan.ai - Streamlit Version

A Streamlit web application that finds and logs dead (broken) external links in Wikipedia articles, identifies which of their domains might be available for registration, and saves those domains to a dedicated JSON database.

Features

  • Multiple Search Methods:
    • Search by text to find Wikipedia articles
    • Search by category to find related articles
  • Dead Link Detection: Checks external links for HTTP errors or connection issues
  • Domain Availability: Identifies which domains from dead links might be available for registration
  • Restricted TLD Filtering: Optionally identifies and excludes domains with restricted TLDs (.edu, .gov, etc.)
  • Available Domains Database: Maintains a separate database of potentially available domains
  • Real-time Logging: Saves dead links and available domains to JSON files as they're found
  • Result Visualization: Displays results in an interactive table with filtering options
  • Export to CSV: Download results as a CSV file
  • Web Archive Filter: Automatically ignores links from web.archive.org
  • Configurable: Adjust settings via the sidebar
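A minimal sketch of the dead-link check and the web.archive.org filter, assuming a HEAD request through Python's standard `urllib` and treating any status of 400 or above, or any failed request, as dead. The names `should_skip`, `fetch_status`, and `check_link` are hypothetical; the app's actual HTTP client and rules may differ.

```python
from http.client import HTTPException
from typing import Callable, Optional
from urllib.error import HTTPError
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def should_skip(url: str) -> bool:
    """True for links hosted on web.archive.org, which are ignored."""
    host = (urlsplit(url).hostname or "").lower()
    return host == "web.archive.org" or host.endswith(".web.archive.org")


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the final HTTP status code."""
    req = Request(url, method="HEAD", headers={"User-Agent": "dead-link-sketch/0.1"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:  # urllib raises 4xx/5xx responses as HTTPError
        return err.code


def check_link(url: str, fetch: Callable[[str], int] = fetch_status) -> Optional[str]:
    """Return a reason if the link looks dead, or None if it looks alive or is skipped."""
    if url.startswith("//"):  # protocol-relative link, as MediaWiki often returns
        url = "https:" + url
    if should_skip(url):
        return None
    try:
        status = fetch(url)
    except (OSError, HTTPException, ValueError) as err:
        # URLError, timeouts and refused connections are OSError subclasses;
        # ValueError covers URLs urllib cannot handle.
        return f"request failed: {err}"
    return f"HTTP {status}" if status >= 400 else None
```

Passing `fetch` as a parameter keeps the logic testable without network access. Some servers answer HEAD with 405; a more forgiving checker might retry those with GET before calling the link dead.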

Requirements

  • Python 3.8+ (Streamlit 1.32.2, the SDK version listed above, requires Python 3.8 or newer)
  • Required packages listed in requirements_streamlit.txt

Installation

pip install -r requirements_streamlit.txt

Usage

Run the Streamlit app:

streamlit run wikipedia_dead_links_streamlit.py

The application will open in your default web browser with three main tabs:

1. Search by Text

  • Enter search terms to find Wikipedia articles containing that text
  • View search results with snippets
  • Process all found pages to check for dead links and available domains
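The text search can be sketched with the `list=search` module of the public MediaWiki Action API. This assumes English Wikipedia (`en.wikipedia.org`) and plain `urllib`; `search_url`, `fetch_json`, and `search_titles` are hypothetical names, not the app's own functions.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia


def search_url(text: str, limit: int = 20) -> str:
    """Build a full-text search (list=search) request URL."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": text,
        "srlimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def fetch_json(url: str) -> Dict:
    # Wikimedia asks API clients to send a descriptive User-Agent.
    req = Request(url, headers={"User-Agent": "dead-link-sketch/0.1"})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def search_titles(text: str, fetch: Callable[[str], Dict] = fetch_json) -> List[Dict[str, str]]:
    """Return one {'title', 'snippet'} dict per search hit."""
    data = fetch(search_url(text))
    return [
        {"title": hit["title"], "snippet": hit.get("snippet", "")}
        for hit in data.get("query", {}).get("search", [])
    ]
```

Search snippets come back as HTML (matches are wrapped in `<span class="searchmatch">`), so they need stripping or HTML rendering before display.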

2. Search by Category

  • Enter a category name to find Wikipedia categories
  • Select a category to crawl its pages
  • Find dead links and available domains within those pages
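Crawling a category could use the `list=categorymembers` module of the same API, following the `continue` tokens until every page has been listed. This is again a sketch assuming English Wikipedia; `category_pages` is a hypothetical helper, and `cmtype=page` deliberately skips subcategories.

```python
import json
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia


def fetch_json(params: Dict[str, str]) -> Dict:
    url = f"{API_URL}?{urlencode(params)}"
    req = Request(url, headers={"User-Agent": "dead-link-sketch/0.1"})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def category_pages(
    category: str, fetch: Callable[[Dict[str, str]], Dict] = fetch_json
) -> Iterator[str]:
    """Yield article titles in a category, following the API's 'continue' tokens."""
    title = category if category.startswith("Category:") else f"Category:{category}"
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": title,
        "cmtype": "page",  # articles only; subcategories are not crawled here
        "cmlimit": "max",
        "format": "json",
    }
    while True:
        data = fetch(params)
        for member in data.get("query", {}).get("categorymembers", []):
            yield member["title"]
        if "continue" not in data:
            break
        # Merge the continuation values (e.g. cmcontinue) into the next request.
        params = {**params, **data["continue"]}
```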

3. Available Domains

  • View all potentially available domains found during searches
  • Filter domains by status (potentially available, expired, etc.)
  • See details about each domain including where it was found
  • Download the list as a CSV file
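The status filter and CSV export can be sketched with the standard `csv` module. The record fields (`domain`, `status`, `found_on`) are an assumption about the layout of available_domains.json, not its documented schema, and `filter_by_status` and `to_csv` are hypothetical helpers.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, Sequence


def filter_by_status(records: Iterable[Dict[str, str]], status: Optional[str]) -> List[Dict[str, str]]:
    """Keep records whose 'status' matches; None keeps everything."""
    return [r for r in records if status is None or r.get("status") == status]


def to_csv(
    records: List[Dict[str, str]],
    fields: Sequence[str] = ("domain", "status", "found_on"),
) -> str:
    """Serialize records to CSV text, ignoring keys outside `fields`."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

In Streamlit, the returned string can be passed as the `data` argument of `st.download_button`.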

How Domain Availability Works

The app uses these methods to determine if a domain might be available:

  1. WHOIS Lookup: Checks if the domain has registration information
  2. Expiration Check: Identifies domains with expired registration dates
  3. DNS Lookup: Verifies if the domain has active DNS records
  4. TLD Restriction Check: Identifies restricted TLDs that cannot be freely registered
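Two of these checks, the DNS lookup and the expiration check, can be sketched with the standard library alone. The WHOIS lookup is left out because the README does not say which WHOIS client the app uses. `has_dns_records` only asks the system resolver for addresses (A/AAAA), so a domain with nothing but MX or NS records counts as having none here; that simplification, like the helper names, is an assumption rather than the app's documented behavior.

```python
import socket
from datetime import datetime, timezone
from typing import Callable, Optional


def has_dns_records(domain: str, resolver: Callable = socket.getaddrinfo) -> bool:
    """True if the name resolves to at least one address (A/AAAA only)."""
    try:
        return bool(resolver(domain, None))
    except (socket.gaierror, UnicodeError):  # unknown name or invalid label
        return False


def is_expired(expiration: Optional[datetime], now: Optional[datetime] = None) -> bool:
    """True if a WHOIS expiration date is known and already in the past.

    `now`, if given, must be timezone-aware.
    """
    if expiration is None:
        return False
    now = now or datetime.now(timezone.utc)
    if expiration.tzinfo is None:
        # Assumption: naive WHOIS timestamps are UTC.
        expiration = expiration.replace(tzinfo=timezone.utc)
    return expiration < now
```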

Domains are flagged as potentially available if:

  • No WHOIS registration data is found
  • The domain's expiration date has passed
  • No DNS records exist for the domain
  • The domain does NOT have a restricted TLD (.edu, .gov, .mil, etc.), if restricted-TLD filtering is enabled
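One way to combine these signals, assuming that any single availability signal (no WHOIS data, a past expiration date, or no DNS records) is enough and that a restricted TLD vetoes the result only when filtering is on. The README does not say whether the app requires one signal or all of them, so `DomainSignals` and `is_potentially_available` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DomainSignals:
    """Hypothetical container for the results of the individual checks."""
    whois_found: bool
    expired: bool
    has_dns: bool
    restricted_tld: bool


def is_potentially_available(s: DomainSignals, filter_restricted: bool = True) -> bool:
    # Assumption: any one availability signal is enough to flag the domain.
    signal = (not s.whois_found) or s.expired or (not s.has_dns)
    # Restricted TLDs are excluded only when filtering is enabled.
    if filter_restricted and s.restricted_tld:
        return False
    return signal
```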

Restricted TLDs (Optional)

If you enable restricted-TLD filtering, domains under the following TLDs are treated as restricted and are never reported as available:

  • .edu - Educational institutions
  • .gov - Government entities
  • .mil - Military organizations
  • .int - International organizations
  • Country-specific restricted second-level domains such as .ac.uk and .gov.uk
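A suffix-matching sketch of the restricted-TLD check, using the suffixes listed above. `RESTRICTED_SUFFIXES`, `domain_from_url`, and `has_restricted_tld` are hypothetical names. Matching on a leading dot keeps look-alikes such as `education.com` from being caught by `.edu`.

```python
from urllib.parse import urlsplit

# Suffixes taken from the list above; extend as needed.
RESTRICTED_SUFFIXES = (".edu", ".gov", ".mil", ".int", ".ac.uk", ".gov.uk")


def domain_from_url(url: str) -> str:
    """Lower-cased host name of a link, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def has_restricted_tld(domain: str) -> bool:
    """True if the domain ends with one of the restricted suffixes."""
    d = "." + domain.lower().strip(".")
    return d.endswith(RESTRICTED_SUFFIXES)
```

This strips only a leading `www.`; reducing a host such as `blog.example.co.uk` to its registrable domain (`example.co.uk`) properly requires the Public Suffix List.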

Note: To confirm that a domain is actually available, check with a domain registrar. The tool only provides a starting point for identifying potential opportunities.

Configuration Options

  • Log file path: Where to save the dead links JSON results
  • Available domains file: Where to save the available domains database
  • Max concurrent requests: Number of links to check simultaneously
  • Max pages to process: Limit the number of articles to process
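The two limits can be illustrated with a thread pool: at most max-pages articles are considered, and at most max-concurrent links are checked at once. `check_pages`, its parameters, and the defaults of 10 and 50 are hypothetical; `check` can be any function that returns a reason string for a dead link and `None` otherwise.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def check_pages(
    pages: Dict[str, List[str]],
    check: Callable[[str], Optional[str]],
    max_concurrent: int = 10,
    max_pages: int = 50,
) -> List[Dict[str, str]]:
    """Check the links of at most `max_pages` pages using `max_concurrent` threads.

    `pages` maps an article title to its external links.
    """
    jobs = [(title, url) for title, links in list(pages.items())[:max_pages] for url in links]
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map yields results in input order, so they line up with `jobs`.
        reasons = pool.map(check, [url for _, url in jobs])
        return [
            {"page": title, "url": url, "reason": reason}
            for (title, url), reason in zip(jobs, reasons)
            if reason is not None
        ]
```

Threads suit this workload because each check spends most of its time waiting on the network.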

Output Files

The app generates two main JSON files:

  1. wikipedia_dead_links.json: Contains details about all dead links found
  2. available_domains.json: Contains only the potentially available domains and where they were found
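Real-time logging could look like the following sketch: each new record is appended to a JSON array, and the file is rewritten through a temporary file so that a crash mid-write does not leave truncated JSON behind. The array layout, the record fields, and the `append_record` helper are assumptions; the README does not document the files' schema.

```python
import json
import os
import tempfile
from typing import Dict


def append_record(path: str, record: Dict) -> None:
    """Append one record to a JSON array file, creating the file if needed."""
    records = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            records = json.load(f)
    records.append(record)
    # Write to a temporary file in the same directory, then swap it in with
    # os.replace so readers never see a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    os.replace(tmp, path)
```

This read-modify-write approach is not safe for several writers at once; with concurrent checks, writes should go through a single thread or a lock.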

You can also download results as CSV files directly from the app.

Make sure to follow @metehan777 on X and www.linkedin.com/in/metehanyesilyurt on LinkedIn for upcoming updates and more tips & tools.