---
title: Browser
emoji: 🦀
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
---
# Browser API
This document describes how to use the Browser API to search the web and scrape website content. The API is built with Gradio and Playwright, providing a simple interface for web automation tasks.
## API Endpoint
The primary endpoint is `POST /api/web_browse`, which accepts a JSON payload.
## Authentication
This API is public and does not require authentication.
## Actions
The API can perform two main actions: `Search` and `Scrape URL`.
### Search
The `Search` action runs a web search on the specified search engine and returns the content of the results page in Markdown format.
### Scrape URL
The `Scrape URL` action fetches a specific URL, processes the HTML, and returns the main content as clean, readable Markdown.
## Request Body
The request body must be a JSON object with the following structure (`|` separates the allowed values and is not part of the JSON itself):
```json
{
  "action": "Search" | "Scrape URL",
  "query": "string",
  "browser_name": "firefox" | "chromium" | "webkit",
  "search_engine_name": "string"
}
```
**Parameters:**
* `action` (string, required): The action to perform. Must be either `"Search"` or `"Scrape URL"`.
* `query` (string, required): The search query or the URL to scrape.
* `browser_name` (string, optional): The browser to use for the operation. Defaults to `"firefox"`.
  * Available options: `"firefox"`, `"chromium"`, `"webkit"`.
* `search_engine_name` (string, optional): The search engine to use when the action is `"Search"`. Defaults to `"DuckDuckGo"`.
  * A full list of supported search engines can be found in the "Supported Search Engines" section.
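The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_payload` is a hypothetical helper, and omitting `search_engine_name` for `"Scrape URL"` is an assumption based on that parameter being documented only for `"Search"`.

```python
from typing import Optional

# Allowed values taken from the parameter list above.
ACTIONS = {"Search", "Scrape URL"}
BROWSERS = {"firefox", "chromium", "webkit"}


def build_payload(action: str, query: str,
                  browser_name: str = "firefox",
                  search_engine_name: Optional[str] = None) -> dict:
    """Build a request body dict (hypothetical helper, not part of the API)."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {sorted(ACTIONS)}")
    if not query:
        raise ValueError("query is required")
    if browser_name not in BROWSERS:
        raise ValueError(f"browser_name must be one of {sorted(BROWSERS)}")

    payload = {"action": action, "query": query, "browser_name": browser_name}
    # Assumption: search_engine_name only matters for the "Search" action,
    # so it is left out of "Scrape URL" requests.
    if action == "Search":
        payload["search_engine_name"] = search_engine_name or "DuckDuckGo"
    return payload
```

Validating locally avoids a round trip for requests that the documented rules already exclude.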
## Response Body
The API will return a JSON object with the results of the operation.
**On Success:**
```json
{
  "status": "success",
  "query": "your_query",
  "action": "Search" | "Scrape URL",
  "final_url": "https://example.com",
  "page_title": "Example Domain",
  "http_status": 200,
  "proxy_used": "Direct Connection",
  "markdown_content": "# Example Domain..."
}
```
**On Error:**
```json
{
  "status": "error",
  "query": "your_query",
  "proxy_used": "Direct Connection",
  "error_message": "Navigation Timeout: The page for 'your_query' took too long to load."
}
```
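Because both response shapes share the `status` field, a client can branch on it. The sketch below assumes the response has already been parsed into a Python dict; `BrowseError` and `extract_markdown` are hypothetical names introduced for illustration.

```python
class BrowseError(Exception):
    """Raised when the API reports "status": "error" (hypothetical name)."""


def extract_markdown(response: dict) -> str:
    """Return markdown_content on success, raise BrowseError otherwise."""
    if response.get("status") == "success":
        return response["markdown_content"]
    # Error responses carry a human-readable error_message field.
    message = response.get("error_message", "unknown error")
    raise BrowseError(message)
```

Raising an exception keeps the success path free of status checks; callers that prefer to inspect the raw dict can read `status` directly instead.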
## Examples
Here are some examples of how to use the API with `curl`.
### Example 1: Performing a Search
This example performs a search for "latest AI research" using Google.
```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{
    "action": "Search",
    "query": "latest AI research",
    "browser_name": "chromium",
    "search_engine_name": "Google"
  }' \
  https://broadfield-dev-browser.hf.space/api/web_browse
```
### Example 2: Scraping a URL
This example scrapes the content from the Wikipedia page for "Web scraping".
```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{
    "action": "Scrape URL",
    "query": "https://en.wikipedia.org/wiki/Web_scraping",
    "browser_name": "firefox"
  }' \
  https://broadfield-dev-browser.hf.space/api/web_browse
```
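The same requests can be sent from Python using only the standard library. This is a sketch mirroring the `curl` calls above; `make_request` and `web_browse` are hypothetical helper names, and the 120-second timeout is an assumed value, not one the API documents.

```python
import json
import urllib.request

# Endpoint URL as used in the curl examples above.
API_URL = "https://broadfield-dev-browser.hf.space/api/web_browse"


def make_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a POST request carrying the payload as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def web_browse(payload: dict, timeout: float = 120.0) -> dict:
    """Send the request and parse the JSON response.

    The timeout is an assumption; rendering a page in a browser can be slow.
    """
    with urllib.request.urlopen(make_request(payload), timeout=timeout) as resp:
        return json.load(resp)


# Usage (performs a live network call):
# result = web_browse({
#     "action": "Scrape URL",
#     "query": "https://en.wikipedia.org/wiki/Web_scraping",
#     "browser_name": "firefox",
# })
# print(result["status"])
```

Splitting request construction from sending makes the request easy to inspect without touching the network.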
## Supported Search Engines
The following search engines are supported when using the `"Search"` action:
* Google
* DuckDuckGo
* Bing
* Brave
* Ecosia
* Yahoo
* Startpage
* Qwant
* Swisscows
* You.com
* SearXNG
* MetaGer
* Yandex
* Baidu
* Perplexity
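
The README does not say whether `search_engine_name` is matched case-sensitively, so sending the exact spelling from the list above is the safe choice. A minimal sketch of a client-side lookup that maps user input to that spelling; `canonical_engine` is a hypothetical helper, not part of the API.

```python
# Names copied verbatim from the list above.
SUPPORTED_ENGINES = [
    "Google", "DuckDuckGo", "Bing", "Brave", "Ecosia", "Yahoo",
    "Startpage", "Qwant", "Swisscows", "You.com", "SearXNG",
    "MetaGer", "Yandex", "Baidu", "Perplexity",
]

# Lowercase lookup table so user input can be matched loosely.
_BY_LOWER = {name.lower(): name for name in SUPPORTED_ENGINES}


def canonical_engine(name: str) -> str:
    """Return the listed spelling of an engine name, or raise ValueError."""
    key = name.strip().lower()
    if key not in _BY_LOWER:
        raise ValueError(f"unsupported search engine: {name!r}")
    return _BY_LOWER[key]
```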