
Skill: Retrieve Site

The Retrieve Site skill fetches the HTML content of a public website and returns a sanitized version that agents can safely parse and reason about. This is commonly used when a user wants to reference an existing website as inspiration, extract content from a page, or analyze the structure of a competitor’s site.

Parameters:

  • url (string, required) — The full URL of the website to retrieve, including the protocol (e.g., https://example.com). Must be a publicly accessible HTTP or HTTPS URL.

A user might trigger this skill by saying:

“Take a look at https://example-bakery.com and use it as inspiration for my site.”

The agent invokes the skill with:

  • url: https://example-bakery.com

The skill fetches the page, sanitizes the HTML, and returns it to the agent. The agent can then analyze the structure, layout, and content to inform a subsequent Create Site call.

Another common scenario:

“Can you grab the text content from our company’s current homepage at https://acme-corp.com?”

The agent retrieves the page and extracts the relevant text content from the sanitized HTML to present to the user.
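The text-extraction step can be sketched with Python's standard-library HTML parser. This is an illustrative helper, not part of the skill itself; it assumes the input is the already-sanitized HTML returned by the skill (so no `<script>` contents remain to filter out):

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <style> contents.

    Scripts are assumed to be already stripped by the skill's sanitizer.
    """

    def __init__(self):
        super().__init__()
        self.parts = []
        self.in_style = False

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False

    def handle_data(self, data):
        # Keep only non-whitespace runs of text outside <style> blocks.
        if not self.in_style and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

For example, feeding it a sanitized homepage yields a flat, space-joined string of the page's visible text, which the agent can then summarize or quote back to the user.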

The skill returns a JSON object containing the sanitized HTML:

{
  "url": "https://example-bakery.com",
  "html": "<!DOCTYPE html><html><head><title>Example Bakery</title>...</html>",
  "status_code": 200,
  "content_type": "text/html"
}
  • url — the URL that was fetched (after any redirects).
  • html — the sanitized HTML content of the page.
  • status_code — the HTTP status code returned by the target server.
  • content_type — the Content-Type header from the response.
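On the agent side, this response shape can be consumed with a small typed container. The class and function names below are illustrative, not part of the skill's published client:

```python
from dataclasses import dataclass


@dataclass
class RetrieveSiteResult:
    """Illustrative container mirroring the skill's JSON response fields."""

    url: str           # final URL after any redirects
    html: str          # sanitized HTML content
    status_code: int   # HTTP status from the target server
    content_type: str  # Content-Type header from the response


def parse_result(payload: dict) -> RetrieveSiteResult:
    # Index with [] so a missing required field fails loudly
    # instead of silently producing a partial result.
    return RetrieveSiteResult(
        url=payload["url"],
        html=payload["html"],
        status_code=payload["status_code"],
        content_type=payload["content_type"],
    )
```

Failing fast on missing fields keeps downstream steps (such as a Create Site call built from the HTML) from operating on incomplete data.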

This skill enforces strict protections against server-side request forgery (SSRF) attacks. The following restrictions are applied before any request is made:

  • Private IP ranges are blocked — requests to 10.x.x.x, 172.16.x.x-172.31.x.x, 192.168.x.x, 127.x.x.x, and 169.254.x.x are rejected.
  • Internal hostnames are blocked — requests to localhost, 0.0.0.0, and any hostname that resolves to a private IP are rejected.
  • Protocol restriction — only http:// and https:// protocols are allowed. file://, ftp://, gopher://, and other schemes are rejected.
  • DNS rebinding protection — the resolved IP address is validated after DNS resolution to prevent DNS rebinding attacks.
  • Redirect limits — a maximum of 5 redirects are followed. Each redirect target is re-validated against the same SSRF rules.
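The host-validation rules above can be sketched with Python's ipaddress module. This is a simplified illustration, not the skill's actual implementation; note in particular that a real client must also pin the validated IP address for the subsequent request, because re-resolving the hostname between the check and the fetch reopens the DNS-rebinding window:

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_url(url: str) -> bool:
    """Reject URLs that could reach internal services (SSRF pre-check sketch)."""
    parsed = urlparse(url)
    # Protocol restriction: only http:// and https:// are allowed.
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname
    if host is None:
        return False
    try:
        # Resolve the hostname and validate every returned address.
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        # Blocks 10.x, 172.16-31.x, 192.168.x (private), 127.x (loopback),
        # 169.254.x (link-local), plus reserved and unspecified addresses.
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_unspecified):
            return False
    return True
```

Checking every address returned by resolution (not just the first) matters because an attacker-controlled DNS record can interleave public and private IPs.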

The returned HTML is sanitized to remove potentially dangerous elements before being passed to the agent:

  • Script removal — all <script> tags and their contents are stripped.
  • Event handler removal — inline event handlers (onclick, onerror, etc.) are removed from all elements.
  • Iframe removal — <iframe> and <frame> elements are stripped.
  • External resource preservation — <link>, <img>, and other resource references are preserved but not fetched. The agent sees the references but does not load them.
Beyond sanitization, the following operational limits apply:

  • Timeout — the skill enforces a 15-second timeout on the HTTP request. Sites that do not respond within this window return a timeout error.
  • Size limit — responses larger than 5 MB are truncated; the agent receives the first 5 MB of HTML content.
  • Non-HTML content — if the URL returns a non-HTML Content-Type (e.g., JSON, PDF, image), the skill returns an error indicating that only HTML pages are supported.
  • Authentication — the skill does not support authenticated requests. Only publicly accessible pages can be retrieved.
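The script, event-handler, and iframe stripping described above can be sketched with a small html.parser subclass. This is a simplified illustration of the technique, not the skill's actual sanitizer (a production version would also handle comments, entity references, and malformed markup):

```python
from html.parser import HTMLParser

# Elements removed entirely, including their contents.
STRIPPED_TAGS = {"script", "iframe", "frame"}


class Sanitizer(HTMLParser):
    """Rebuilds HTML while dropping stripped elements and on* handlers."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0  # > 0 while inside a stripped element

    def handle_starttag(self, tag, attrs):
        if tag in STRIPPED_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        # Drop inline event handlers (onclick, onerror, ...), keep the rest.
        kept = [(k, v) for k, v in attrs if not k.lower().startswith("on")]
        attr_str = "".join(
            f' {k}="{v}"' if v is not None else f" {k}" for k, v in kept
        )
        self.out.append(f"<{tag}{attr_str}>")

    def handle_endtag(self, tag):
        if tag in STRIPPED_TAGS:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)


def sanitize(html: str) -> str:
    parser = Sanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Note that resource references such as `<img src="...">` pass through untouched, matching the external-resource-preservation behavior above.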

POST /api/web/v1.0/retrieve-site

See the Skill Execution API for details on authentication and request format.
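A request to this endpoint carries the url parameter as a JSON body. The helper below only constructs the request pieces without sending them; the authentication header is a placeholder, since the real scheme is defined by the Skill Execution API:

```python
import json


def build_retrieve_site_request(url: str, token: str) -> tuple[str, dict, str]:
    """Builds (path, headers, body) for the Retrieve Site endpoint.

    The Bearer token header is a placeholder assumption; consult the
    Skill Execution API for the actual authentication format.
    """
    path = "/api/web/v1.0/retrieve-site"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",  # placeholder auth scheme
    }
    body = json.dumps({"url": url})
    return path, headers, body
```

Any HTTP client can then POST the body to the path with those headers and parse the JSON response described above.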


Related skills: Create Site | Update Site | List Sites