	---
	title: Browser
	emoji: 🦀
	colorFrom: purple
	colorTo: indigo
	sdk: gradio
	sdk_version: 5.34.2
	app_file: app.py
	pinned: false
	---

	# Browser API

	This document describes how to use the Browser API to search the web and scrape website content. The API is built with Gradio and Playwright, providing a simple interface for web automation tasks.

	## API Endpoint

	The primary endpoint for this API is `/api/web_browse`. This is a `POST` endpoint that accepts a JSON payload.

	## Authentication

	This API is public and does not require authentication.

	## Actions

	The API can perform two main actions: `Search` and `Scrape URL`.

	### Search

	The `Search` action allows you to perform a web search using a specified search engine. The API will return the content of the search results page in Markdown format.

	### Scrape URL

	The `Scrape URL` action allows you to retrieve the content of a specific URL. The API will fetch the page, process the HTML, and return the main content in a clean, readable Markdown format.

	## Request Body

	The request body must be a JSON object with the following structure:

	```json
	{
	  "action": "Search" | "Scrape URL",
	  "query": "string",
	  "browser_name": "firefox" | "chromium" | "webkit",
	  "search_engine_name": "string"
	}
	```

	Parameters:

	* `action` (string, required): The action to perform. Must be either `"Search"` or `"Scrape URL"`.
	* `query` (string, required): The search query or the URL to scrape.
	* `browser_name` (string, optional): The browser to use for the operation. Defaults to `"firefox"`.
	  * Available options: `"firefox"`, `"chromium"`, `"webkit"`.
	* `search_engine_name` (string, optional): The search engine to use when the action is `"Search"`. Defaults to `"DuckDuckGo"`.
	  * A full list of supported search engines can be found in the "Supported Search Engines" section.
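
The parameter rules above can be mirrored in a small client-side helper. This is a minimal sketch, not part of the API: `build_payload`, `VALID_ACTIONS`, and `VALID_BROWSERS` are hypothetical names, and the checks only restate the documented options and defaults.

```python
import json

# Values taken from the parameter list above.
VALID_ACTIONS = ("Search", "Scrape URL")
VALID_BROWSERS = ("firefox", "chromium", "webkit")


def build_payload(action, query, browser_name="firefox",
                  search_engine_name="DuckDuckGo"):
    """Return a request-body dict (hypothetical helper, not part of the API)."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {VALID_ACTIONS}, got {action!r}")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if browser_name not in VALID_BROWSERS:
        raise ValueError(f"browser_name must be one of {VALID_BROWSERS}")

    payload = {"action": action, "query": query, "browser_name": browser_name}
    # The search engine is only documented for the "Search" action,
    # so this sketch omits it when scraping a URL.
    if action == "Search":
        payload["search_engine_name"] = search_engine_name
    return payload


print(json.dumps(build_payload("Search", "latest AI research"), indent=2))
```

Validating before sending is a design choice of this sketch; the server may apply its own checks and defaults.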

	## Response Body

	The API will return a JSON object with the results of the operation.

	On Success:

	```json
	{
	  "status": "success",
	  "query": "your_query",
	  "action": "Search" | "Scrape URL",
	  "final_url": "https://example.com",
	  "page_title": "Example Domain",
	  "http_status": 200,
	  "proxy_used": "Direct Connection",
	  "markdown_content": "# Example Domain..."
	}
	```

	On Error:

	```json
	{
	  "status": "error",
	  "query": "your_query",
	  "proxy_used": "Direct Connection",
	  "error_message": "Navigation Timeout: The page for 'your_query' took too long to load."
	}
	```
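
A client typically branches on the `status` field to tell the two response shapes apart. The sketch below shows one way to do that, assuming the response has already been decoded into a Python dict; `summarize_response` is a hypothetical helper, and the sample dicts are copied from the examples above.

```python
def summarize_response(response):
    """Return a one-line summary of a response dict (hypothetical helper)."""
    status = response.get("status")
    if status == "success":
        content = response.get("markdown_content", "")
        return (f"{response.get('http_status')} {response.get('final_url')} "
                f"({response.get('page_title')}): {len(content)} chars of Markdown")
    if status == "error":
        return f"error for {response.get('query')!r}: {response.get('error_message')}"
    # Anything else falls outside the two documented shapes.
    raise ValueError(f"unexpected status: {status!r}")


ok = {
    "status": "success",
    "query": "your_query",
    "action": "Scrape URL",
    "final_url": "https://example.com",
    "page_title": "Example Domain",
    "http_status": 200,
    "proxy_used": "Direct Connection",
    "markdown_content": "# Example Domain...",
}
err = {
    "status": "error",
    "query": "your_query",
    "proxy_used": "Direct Connection",
    "error_message": "Navigation Timeout: The page for 'your_query' took too long to load.",
}

print(summarize_response(ok))
print(summarize_response(err))
```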

	## Examples

	Here are some examples of how to use the API with `curl`.

	### Example 1: Performing a Search

	This example searches for "latest AI research" on Google, using Chromium as the browser.

	```bash
	curl -X POST -H "Content-Type: application/json" \
	  -d '{
	    "action": "Search",
	    "query": "latest AI research",
	    "browser_name": "chromium",
	    "search_engine_name": "Google"
	  }' \
	  https://broadfield-dev-browser.hf.space/api/web_browse
	```

	### Example 2: Scraping a URL

	This example scrapes the Wikipedia page for "Web scraping", using Firefox as the browser.

	```bash
	curl -X POST -H "Content-Type: application/json" \
	  -d '{
	    "action": "Scrape URL",
	    "query": "https://en.wikipedia.org/wiki/Web_scraping",
	    "browser_name": "firefox"
	  }' \
	  https://broadfield-dev-browser.hf.space/api/web_browse
	```
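
The same calls can be made from Python with only the standard library. This is a rough equivalent of the `curl` commands above, assuming the endpoint accepts the JSON body exactly as documented. `make_request`, `web_browse`, and the 120-second timeout are illustrative choices, not part of the API. The transport is passed in as `send` so the logic can be exercised without network access.

```python
import json
import urllib.request

# Endpoint URL as used in the curl examples above.
API_URL = "https://broadfield-dev-browser.hf.space/api/web_browse"


def make_request(payload, url=API_URL):
    """Build (but do not send) the POST request for a payload dict."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def _send(request, timeout=120):
    """Default transport: perform the HTTP call and return the raw body.

    The timeout value is an assumption; page loads in a browser can be slow.
    """
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return reply.read()


def web_browse(payload, send=_send):
    """Send the payload and decode the JSON reply into a dict."""
    return json.loads(send(make_request(payload)))


# Building the request does not touch the network.
scrape = {
    "action": "Scrape URL",
    "query": "https://en.wikipedia.org/wiki/Web_scraping",
    "browser_name": "firefox",
}
req = make_request(scrape)
print(req.get_method(), req.full_url)
```

Calling `web_browse(scrape)` would then perform the actual request. Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` on 4xx and 5xx responses, which this sketch does not catch.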

	## Supported Search Engines

	The following search engines are supported when using the `"Search"` action:

	* Google
	* DuckDuckGo
	* Bing
	* Brave
	* Ecosia
	* Yahoo
	* Startpage
	* Qwant
	* Swisscows
	* You.com
	* SearXNG
	* MetaGer
	* Yandex
	* Baidu
	* Perplexity
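
A client can check a search engine name against this list before sending a request. The sketch below is a hypothetical helper (`resolve_engine` is not part of the API). The document does not say whether the server matches names case-sensitively, so the helper maps any casing back to the spelling listed above and falls back to the documented default, `"DuckDuckGo"`, when no name is given.

```python
# Names copied from the list above.
SUPPORTED_ENGINES = (
    "Google", "DuckDuckGo", "Bing", "Brave", "Ecosia", "Yahoo", "Startpage",
    "Qwant", "Swisscows", "You.com", "SearXNG", "MetaGer", "Yandex", "Baidu",
    "Perplexity",
)

# Lookup table keyed by lowercase name, mapping to the listed spelling.
_BY_LOWER = {name.lower(): name for name in SUPPORTED_ENGINES}


def resolve_engine(name=None, default="DuckDuckGo"):
    """Return the listed spelling of an engine name, or raise ValueError."""
    if name is None:
        return default
    key = name.strip().lower()
    if key not in _BY_LOWER:
        raise ValueError(f"unsupported search engine: {name!r}")
    return _BY_LOWER[key]


print(resolve_engine("duckduckgo"))
print(resolve_engine(" you.com "))
```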