| title: Browser | |
| emoji: 🦀 | |
| colorFrom: purple | |
| colorTo: indigo | |
| sdk: gradio | |
| sdk_version: 5.34.2 | |
| app_file: app.py | |
| pinned: false | |
| # Browser API | |
| This document describes how to use the Browser API to search the web and scrape website content. The API is built with Gradio and Playwright, providing a simple interface for web automation tasks. | |
| ## API Endpoint | |
| The primary endpoint for this API is `/api/web_browse`. This is a `POST` endpoint that accepts a JSON payload. | |
| ## Authentication | |
| This API is public and does not require authentication. | |
| ## Actions | |
| The API can perform two main actions: `Search` and `Scrape URL`. | |
| ### Search | |
| The `Search` action runs a web search on the specified search engine and returns the content of the search results page in Markdown format. | |
| ### Scrape URL | |
| The `Scrape URL` action retrieves the content of a specific URL. The API fetches the page, processes the HTML, and returns the main content in clean, readable Markdown. | |
| ## Request Body | |
| The request body must be a JSON object with the following structure: | |
| ```json | |
| { | |
| "action": "Search" | "Scrape URL", | |
| "query": "string", | |
| "browser_name": "firefox" | "chromium" | "webkit", | |
| "search_engine_name": "string" | |
| } | |
| ``` | |
| **Parameters:** | |
| * `action` (string, required): The action to perform. Must be either `"Search"` or `"Scrape URL"`. | |
| * `query` (string, required): The search query when `action` is `"Search"`, or the URL to scrape when `action` is `"Scrape URL"`. | |
| * `browser_name` (string, optional): The browser to use for the operation. One of `"firefox"`, `"chromium"`, or `"webkit"`. Defaults to `"firefox"`. | |
| * `search_engine_name` (string, optional): The search engine to use when the action is `"Search"`. Defaults to `"DuckDuckGo"`. The full list of supported search engines is in the "Supported Search Engines" section. | |
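The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API itself: `build_payload`, `ACTIONS`, and `BROWSERS` are hypothetical names introduced here, and omitting `search_engine_name` for the `"Scrape URL"` action is an assumption based on the parameter being documented as relevant only to `"Search"`.

```python
import json

# Allowed values, taken from the parameter list in this document.
ACTIONS = {"Search", "Scrape URL"}
BROWSERS = {"firefox", "chromium", "webkit"}


def build_payload(action, query, browser_name="firefox",
                  search_engine_name="DuckDuckGo"):
    """Return a request body dict, raising ValueError on invalid input."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {sorted(ACTIONS)}, got {action!r}")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if browser_name not in BROWSERS:
        raise ValueError(
            f"browser_name must be one of {sorted(BROWSERS)}, got {browser_name!r}"
        )

    payload = {"action": action, "query": query, "browser_name": browser_name}
    # Assumption: the search engine is only meaningful for the Search action,
    # so it is left out of Scrape URL requests.
    if action == "Search":
        payload["search_engine_name"] = search_engine_name
    return payload


if __name__ == "__main__":
    print(json.dumps(build_payload("Search", "latest AI research"), indent=2))
```

The defaults mirror the documented ones (`"firefox"` and `"DuckDuckGo"`), so a caller only needs to supply `action` and `query`.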
| ## Response Body | |
| The API returns a JSON object with the results of the operation. Its `status` field is `"success"` or `"error"`, and the remaining fields depend on that value. | |
| **On Success:** | |
| ```json | |
| { | |
| "status": "success", | |
| "query": "your_query", | |
| "action": "Search" | "Scrape URL", | |
| "final_url": "https://example.com", | |
| "page_title": "Example Domain", | |
| "http_status": 200, | |
| "proxy_used": "Direct Connection", | |
| "markdown_content": "# Example Domain..." | |
| } | |
| ``` | |
| **On Error:** | |
| ```json | |
| { | |
| "status": "error", | |
| "query": "your_query", | |
| "proxy_used": "Direct Connection", | |
| "error_message": "Navigation Timeout: The page for 'your_query' took too long to load." | |
| } | |
| ``` | |
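Because the two response shapes share only a few fields, client code usually branches on `status` first. The sketch below shows one way to do that with an already-decoded response dict; `BrowseError` and `extract_markdown` are hypothetical names, and treating any other `status` value as a client-side error is an assumption, since the document only shows `"success"` and `"error"`.

```python
class BrowseError(Exception):
    """Raised when the API response has status == "error"."""


def extract_markdown(response):
    """Return markdown_content from a success response, or raise."""
    status = response.get("status")
    if status == "success":
        return response["markdown_content"]
    if status == "error":
        # error_message is present in the documented error shape; fall back
        # to a generic message if it is missing.
        raise BrowseError(response.get("error_message", "unknown error"))
    # Assumption: only the two documented status values are expected.
    raise ValueError(f"unexpected status: {status!r}")
```

Fields such as `final_url`, `page_title`, and `http_status` appear only in the success shape shown above, so they should be read after the status check.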
| ## Examples | |
| Here are some examples of how to use the API with `curl`. | |
| ### Example 1: Performing a Search | |
| This example performs a search for "latest AI research" using Google. | |
| ```bash | |
| curl -X POST -H "Content-Type: application/json" \ | |
| -d '{ | |
| "action": "Search", | |
| "query": "latest AI research", | |
| "browser_name": "chromium", | |
| "search_engine_name": "Google" | |
| }' \ | |
| https://broadfield-dev-browser.hf.space/api/web_browse | |
| ``` | |
| ### Example 2: Scraping a URL | |
| This example scrapes the content from the Wikipedia page for "Web scraping". | |
| ```bash | |
| curl -X POST -H "Content-Type: application/json" \ | |
| -d '{ | |
| "action": "Scrape URL", | |
| "query": "https://en.wikipedia.org/wiki/Web_scraping", | |
| "browser_name": "firefox" | |
| }' \ | |
| https://broadfield-dev-browser.hf.space/api/web_browse | |
| ``` | |
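The same requests can be made from Python. This is a rough equivalent of the `curl` calls above using only the standard library; `make_request` and `web_browse` are hypothetical helper names, and the 60-second timeout is an assumed value, not one documented for this API.

```python
import json
import urllib.request

API_URL = "https://broadfield-dev-browser.hf.space/api/web_browse"


def make_request(payload, url=API_URL):
    """Build a POST request carrying the payload as a JSON body."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def web_browse(payload, url=API_URL, timeout=60):
    """Send the request and return the decoded JSON response."""
    # Assumption: 60 seconds is an arbitrary client-side timeout.
    with urllib.request.urlopen(make_request(payload, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    result = web_browse({
        "action": "Scrape URL",
        "query": "https://en.wikipedia.org/wiki/Web_scraping",
        "browser_name": "firefox",
    })
    print(result.get("status"))
```

Splitting request construction from sending keeps `make_request` testable without network access. `urllib.request.urlopen` raises `urllib.error.HTTPError` for HTTP error statuses, which callers may want to catch separately from an API-level `"error"` response.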
| ## Supported Search Engines | |
| The following search engines are supported when using the `"Search"` action: | |
| * DuckDuckGo | |
| * Bing | |
| * Brave | |
| * Ecosia | |
| * Yahoo | |
| * Startpage | |
| * Qwant | |
| * Swisscows | |
| * You.com | |
| * SearXNG | |
| * MetaGer | |
| * Yandex | |
| * Baidu | |
| * Perplexity |
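If engine names come from user input, it can help to map them onto the spellings listed above before building a request. The following sketch does a case-insensitive lookup; `SUPPORTED_ENGINES` and `canonical_engine` are hypothetical names, and whether the API itself is case-sensitive is not stated in this document. Note that Example 1 passes `"Google"`, which is absent from this list, so a miss here does not prove the API would reject the name.

```python
# Engine names copied from the list in this document.
SUPPORTED_ENGINES = (
    "DuckDuckGo", "Bing", "Brave", "Ecosia", "Yahoo", "Startpage", "Qwant",
    "Swisscows", "You.com", "SearXNG", "MetaGer", "Yandex", "Baidu",
    "Perplexity",
)

# Lowercased name -> spelling used in the list above.
_CANONICAL = {name.lower(): name for name in SUPPORTED_ENGINES}


def canonical_engine(name):
    """Return the listed spelling for name, or None if it is not listed."""
    return _CANONICAL.get(name.strip().lower())
```

Returning `None` instead of raising leaves the decision to the caller, who may fall back to the default `"DuckDuckGo"` or pass the original name through unchanged.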