Skill: Retrieve Site
The Retrieve Site skill fetches the HTML content of a public website and returns a sanitized version that agents can safely parse and reason about. This is commonly used when a user wants to reference an existing website as inspiration, extract content from a page, or analyze the structure of a competitor’s site.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | The full URL of the website to retrieve, including the protocol (e.g., `https://example.com`). Must be a publicly accessible HTTP or HTTPS URL. |
Example usage
A user might trigger this skill by saying:
“Take a look at https://example-bakery.com and use it as inspiration for my site.”
The agent invokes the skill with:
```
url: https://example-bakery.com
```
The skill fetches the page, sanitizes the HTML, and returns it to the agent. The agent can then analyze the structure, layout, and content to inform a subsequent Create Site call.
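As an illustration, a client could assemble the HTTP call like this. This is a sketch: the `build_retrieve_request` helper, the bearer-token auth scheme, and the placeholder token are assumptions rather than part of the documented API; only the path and the `url` body field come from this page.

```python
import json

SKILL_PATH = "/api/web/v1.0/retrieve-site"  # from the "API endpoint" section

def build_retrieve_request(url, token="YOUR_API_TOKEN"):
    """Assemble the method, path, headers, and JSON body for a
    Retrieve Site call. Sending it (with urllib, requests, etc.)
    is left to the caller."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",  # auth scheme is an assumption
    }
    body = json.dumps({"url": url})
    return "POST", SKILL_PATH, headers, body
```

Sending the assembled request with any HTTP client returns the JSON object described under Response structure below.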
Another common scenario:
“Can you grab the text content from our company’s current homepage at https://acme-corp.com?”
The agent retrieves the page and extracts the relevant text content from the sanitized HTML to present to the user.
Response structure
The skill returns a JSON object containing the sanitized HTML:

```json
{
  "url": "https://example-bakery.com",
  "html": "<!DOCTYPE html><html><head><title>Example Bakery</title>...</html>",
  "status_code": 200,
  "content_type": "text/html"
}
```

- `url` — the URL that was fetched (after any redirects).
- `html` — the sanitized HTML content of the page.
- `status_code` — the HTTP status code returned by the target server.
- `content_type` — the Content-Type header from the response.
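Once the response arrives, an agent-side consumer might pull plain text out of the sanitized HTML with the standard-library parser. This is a sketch: `TextExtractor` and `extract_text` are hypothetical helpers, not part of the skill.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text fragments from the sanitized HTML."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())

def extract_text(response):
    """Pull plain text out of a Retrieve Site response dict."""
    if response.get("status_code") != 200:
        raise RuntimeError(f"fetch failed: HTTP {response.get('status_code')}")
    parser = TextExtractor()
    parser.feed(response["html"])
    return " ".join(parser.parts)
```

This covers the second example scenario above, where the agent extracts text content from a retrieved homepage.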
SSRF protections
This skill enforces strict protections against server-side request forgery (SSRF) attacks. The following restrictions are applied before any request is made:
- Private IP ranges are blocked — requests to `10.x.x.x`, `172.16.x.x`–`172.31.x.x`, `192.168.x.x`, `127.x.x.x`, and `169.254.x.x` are rejected.
- Internal hostnames are blocked — requests to `localhost`, `0.0.0.0`, and any hostname that resolves to a private IP are rejected.
- Protocol restriction — only `http://` and `https://` protocols are allowed. `file://`, `ftp://`, `gopher://`, and other schemes are rejected.
- DNS rebinding protection — the resolved IP address is validated after DNS resolution to prevent DNS rebinding attacks.
- Redirect limits — at most 5 redirects are followed. Each redirect target is re-validated against the same SSRF rules.
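The checks above can be approximated with the standard `ipaddress` and `socket` modules. This is an illustrative sketch of the rules, not the skill's actual implementation, and `is_url_allowed` is a hypothetical name.

```python
import ipaddress
import socket
from urllib.parse import urlparse

BLOCKED_HOSTS = {"localhost", "0.0.0.0"}

def is_url_allowed(url):
    """Apply the SSRF rules described above: scheme check, hostname
    blocklist, then validation of the resolved IP address."""
    parsed = urlparse(url)
    # Protocol restriction: only http:// and https:// are allowed.
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname
    if host is None or host.lower() in BLOCKED_HOSTS:
        return False
    try:
        # Validate the IP obtained *after* DNS resolution, so the
        # checked address is the one the connection would actually use
        # (DNS rebinding protection).
        ip = ipaddress.ip_address(socket.getaddrinfo(host, None)[0][4][0])
    except (socket.gaierror, ValueError):
        return False
    # is_private covers 10.x, 172.16-31.x, and 192.168.x; is_loopback
    # covers 127.x; is_link_local covers 169.254.x.
    return not (ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved)
```

A real fetcher would re-run this check on every redirect target, matching the redirect rule above.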
HTML sanitization
The returned HTML is sanitized to remove potentially dangerous elements before being passed to the agent:
- Script removal — all `<script>` tags and their contents are stripped.
- Event handler removal — inline event handlers (`onclick`, `onerror`, etc.) are removed from all elements.
- Iframe removal — `<iframe>` and `<frame>` elements are stripped.
- External resource preservation — `<link>`, `<img>`, and other resource references are preserved but not fetched. The agent sees the references but does not load them.
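A regex-based illustration of these sanitization rules follows. It is deliberately simplified: production sanitizers should use a real HTML parser rather than regexes, and `sanitize_html` is a hypothetical helper, not the skill's code.

```python
import re

def sanitize_html(html):
    """Apply the sanitization rules above: strip scripts and
    (i)frames, remove inline event handlers, leave everything
    else (including <link>/<img> references) untouched."""
    # Script removal: <script> tags and their contents.
    html = re.sub(r"(?is)<script\b.*?</script\s*>", "", html)
    # Iframe removal: <iframe> and <frame> elements.
    html = re.sub(r"(?is)<(iframe|frame)\b.*?</\1\s*>", "", html)
    # Event handler removal: on* attributes (quoted or bare values).
    html = re.sub(r"(?i)\s+on\w+\s*=\s*(\"[^\"]*\"|'[^']*'|\S+)", "", html)
    return html
```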
Behavior notes
- Timeout — the skill enforces a 15-second timeout for the HTTP request. Sites that do not respond within this window return a timeout error.
- Size limit — responses larger than 5 MB are truncated. The agent will receive the first 5 MB of HTML content.
- Non-HTML content — if the URL returns a non-HTML Content-Type (e.g., JSON, PDF, image), the skill returns an error indicating that only HTML pages are supported.
- Authentication — the skill does not support authenticated requests. Only publicly accessible pages can be retrieved.
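The size and content-type rules above can be expressed as a small guard. The constants mirror the documented limits, but `apply_response_limits` itself is a hypothetical sketch, not the skill's code.

```python
MAX_BYTES = 5 * 1024 * 1024   # documented 5 MB response cap
TIMEOUT_SECONDS = 15          # documented per-request timeout (passed to the HTTP client)

def apply_response_limits(content_type, body):
    """Reject non-HTML responses and truncate bodies past the size
    cap, mirroring the behavior notes above."""
    if not content_type.lower().startswith("text/html"):
        raise ValueError(f"only HTML pages are supported, got {content_type}")
    return body[:MAX_BYTES]
```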
API endpoint
```
POST /api/web/v1.0/retrieve-site
```
See the Skill Execution API for details on authentication and request format.
Related skills: Create Site | Update Site | List Sites