# Web fetch

# Web Scraper Node

Hiroshi OS provides a web scraper tool, `web_scrape`, that lets agents fetch an arbitrary URL, strip the raw HTML down to its content, and return it as readable Markdown.

## Execution tag format

Agents invoke the scraper with either a self-closing tag that passes the URL in the `url` attribute, or an explicit opening/closing tag pair that wraps the URL:

```xml theme={null}
<web_scrape url="https://example.com" />
```

Or:

```xml theme={null}
<web_scrape>https://example.com</web_scrape>
```
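Both forms carry a single URL. As an illustration only, a host process could pull the requested URLs out of an agent's output with two regular expressions, as in this Python sketch. The function name `extract_scrape_urls` and the exact patterns are hypothetical; the actual Hiroshi OS tag parser is not documented here and may accept more or fewer variations (for example, single-quoted attributes).

```python theme={null}
import re

# Hypothetical patterns for the two documented tag forms.
# Self-closing form: <web_scrape url="..." />
_SELF_CLOSING = re.compile(r'<web_scrape\s+url="([^"]+)"\s*/>')
# Explicit form: <web_scrape>...</web_scrape>, tolerating surrounding whitespace.
_EXPLICIT = re.compile(r"<web_scrape>\s*(.*?)\s*</web_scrape>", re.DOTALL)


def extract_scrape_urls(agent_output: str) -> list[str]:
    """Return every URL requested through either tag form, in document order."""
    found = []
    for pattern in (_SELF_CLOSING, _EXPLICIT):
        for match in pattern.finditer(agent_output):
            found.append((match.start(), match.group(1)))
    # Sort by offset so mixed tag styles come back in the order they were written.
    return [url for _, url in sorted(found)]


text = 'First <web_scrape url="https://example.com" /> then <web_scrape>https://example.org</web_scrape>.'
print(extract_scrape_urls(text))  # ['https://example.com', 'https://example.org']
```

Matching the two forms separately and then sorting by offset keeps the patterns simple while still preserving the order in which the agent requested pages.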

## Internal Workflow

1. **Authentication Check:** The tool looks for Firecrawl (`firecrawl_api_key`) and Exa (`exa_api_key`) API keys in the global configuration.
2. **Third-Party Routing:** If an API key is configured, the request is sent to Firecrawl's `/v1/scrape` endpoint or Exa's `/contents` endpoint.
3. **Local Fallback:** If no API key is configured, a local downloader fetches the raw HTML, strips `script` and `style` blocks, converts heading elements (`h1`-`h6`) to Markdown headings, renders bold/strong tags as `**` emphasis, and collapses paragraph blocks into clean Markdown.
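The local fallback in step 3 can be approximated with Python's standard-library `html.parser`. This is a minimal sketch, not the Hiroshi OS implementation: the names `fetch_html`, `html_to_markdown`, and `_MarkdownSketch`, the 10-second timeout, and the choice of `p`/`div` as paragraph boundaries are assumptions made for illustration.

```python theme={null}
import re
import urllib.request
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
PARAGRAPH_TAGS = {"p", "div"}  # assumption: treated as paragraph boundaries


class _MarkdownSketch(HTMLParser):
    """Collects text while translating a few tags into Markdown markers."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self.skipping = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skipping = True
        elif self.skipping:
            return
        elif tag in HEADINGS:
            # <h2> becomes "## " at the start of a new block.
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in PARAGRAPH_TAGS:
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skipping = False
        elif self.skipping:
            return
        elif tag in HEADINGS or tag in PARAGRAPH_TAGS:
            self.parts.append("\n\n")
        elif tag in ("b", "strong"):
            self.parts.append("**")

    def handle_data(self, data):
        # Script and style bodies arrive here too; drop them.
        if not self.skipping:
            self.parts.append(data)


def html_to_markdown(html: str) -> str:
    parser = _MarkdownSketch()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.parts)
    # Split on blank lines, then collapse runs of whitespace inside each block.
    blocks = (" ".join(block.split()) for block in re.split(r"\n\s*\n", raw))
    return "\n\n".join(block for block in blocks if block)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    # Download the page and decode it with the declared charset, defaulting to UTF-8.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

For example, `html_to_markdown("<h2>Intro</h2><p>Hello   <strong>world</strong></p>")` returns `"## Intro\n\nHello **world**"`. Collapsing whitespace per block is what keeps the output compact; a fuller converter would also need to handle links, lists, and tables, which this sketch ignores.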

## Configuration schema

Settings are stored in `AppConfig` under the `scraper` key:

```yaml theme={null}
scraper:
  enabled: true
  firecrawl_api_key: "fc-..."
  exa_api_key: "exa-..."
```
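To make the routing rules concrete, the following hypothetical sketch selects a backend from a parsed `scraper` section, shown as a plain dictionary rather than loaded from YAML. The `Route` type and `choose_route` function are illustrative names, and three behaviors are assumptions rather than documented facts: Firecrawl is preferred when both keys are present, `enabled: false` (or a missing `enabled` key) disables the tool entirely, and an empty key string counts as unconfigured.

```python theme={null}
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Route:
    backend: str             # "firecrawl", "exa", or "local"
    endpoint: Optional[str]  # remote endpoint path; None for the local fallback


def choose_route(config: Mapping) -> Optional[Route]:
    scraper = config.get("scraper", {})
    # Assumption: a false or missing `enabled` flag disables the tool entirely.
    if not scraper.get("enabled", False):
        return None
    # Assumption: Firecrawl takes precedence when both keys are set.
    if scraper.get("firecrawl_api_key"):
        return Route("firecrawl", "/v1/scrape")
    if scraper.get("exa_api_key"):
        return Route("exa", "/contents")
    # No usable key: use the local HTML-to-Markdown fallback.
    return Route("local", None)
```

For example, `choose_route({"scraper": {"enabled": True, "exa_api_key": "exa-..."}})` returns `Route(backend='exa', endpoint='/contents')`.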

## Performance

* **Latency:** The fallback HTML parser completes string sanitization in **\< 3 ms**.
* **Size reduction:** Stripping markup and formatting blocks reduces document size by **\~65-80%**.
