Web Scraper Node
Hiroshi OS features a dynamic web scraper tool (web_scrape) enabling agents to retrieve, clean, and convert raw HTML bodies from arbitrary target URLs into structured, readable markdown blocks.
Execution tag format
Agents call the scraper via self-closing or explicit tag layouts.
Internal Workflow
- Authentication Check: The tool checks for Firecrawl (firecrawl_api_key) and Exa (exa_api_key) API tokens in the global configuration.
- Third-Party Routing: If API keys are configured, it routes the payload to Firecrawl's /v1/scrape or Exa's /contents endpoint.
- Resilient Local Fallback: If no API keys are configured, a fallback downloader retrieves the raw HTML, strips script and style blocks, formats header elements (h1-h6), converts bold/strong tags to **, and compresses paragraph blocks into unified, clean markdown.
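The local fallback step can be sketched roughly as follows. This is an illustrative approximation using only the Python standard library, not the actual Hiroshi OS implementation: the function names fetch_html and html_to_markdown are hypothetical, and the regex-based approach is an assumption chosen for brevity.

```python
import html
import re
import urllib.request


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download the raw HTML body of a page (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def _heading(match: re.Match) -> str:
    """Turn an <hN>...</hN> match into a markdown heading line."""
    level = int(match.group(1))
    inner = re.sub(r"(?s)<[^>]+>", "", match.group(2)).strip()
    return f"\n\n{'#' * level} {inner}\n\n"


def html_to_markdown(raw_html: str) -> str:
    """Convert raw HTML to simplified markdown (assumed behavior)."""
    # 1. Drop <script> and <style> blocks together with their contents.
    text = re.sub(r"(?is)<(script|style)\b[^>]*>.*?</\1\s*>", "", raw_html)
    # 2. Format h1-h6 as markdown headings.
    text = re.sub(r"(?is)<h([1-6])\b[^>]*>(.*?)</h\1\s*>", _heading, text)
    # 3. Wrap bold/strong content in ** markers.
    text = re.sub(
        r"(?is)<(b|strong)\b[^>]*>(.*?)</\1\s*>",
        lambda m: f"**{m.group(2).strip()}**",
        text,
    )
    # 4. Treat paragraph ends and line breaks as paragraph boundaries.
    text = re.sub(r"(?i)</p\s*>|<br\s*/?>", "\n\n", text)
    # 5. Remove every remaining tag, then decode HTML entities.
    text = html.unescape(re.sub(r"(?s)<[^>]+>", "", text))
    # 6. Collapse whitespace inside paragraphs and drop empty ones.
    paragraphs = (" ".join(p.split()) for p in re.split(r"\n\s*\n", text))
    return "\n\n".join(p for p in paragraphs if p)


if __name__ == "__main__":
    sample = (
        "<html><head><style>p{color:red}</style></head><body>"
        "<h2>Title</h2><p>Hello <strong>world</strong>.</p></body></html>"
    )
    print(html_to_markdown(sample))
```

A regex pass like this is fast but fragile on malformed or deeply nested markup; a real parser would be the more robust choice if correctness matters more than latency.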
Configurations Schema
Configurations are stored in AppConfig under scraper.
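The schema itself is not reproduced here, so the following is only a hedged sketch of how the scraper section and the routing decision might look. Only the key names firecrawl_api_key and exa_api_key come from the text; the ScraperConfig class, the choose_backend function, the returned labels, and the choice to prefer Firecrawl over Exa when both keys are set are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScraperConfig:
    """Hypothetical shape of the scraper section of AppConfig."""
    firecrawl_api_key: Optional[str] = None
    exa_api_key: Optional[str] = None


def choose_backend(cfg: ScraperConfig) -> str:
    """Pick a scraping backend from the configured API keys.

    Assumption: Firecrawl takes precedence when both keys are present.
    """
    if cfg.firecrawl_api_key:
        return "firecrawl"       # would call Firecrawl's /v1/scrape
    if cfg.exa_api_key:
        return "exa"             # would call Exa's /contents
    return "local_fallback"      # no keys: use the built-in HTML parser


if __name__ == "__main__":
    print(choose_backend(ScraperConfig()))
    print(choose_backend(ScraperConfig(exa_api_key="example-key")))
```

Treating an empty string the same as a missing key keeps a blank config entry from being mistaken for a valid token.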
- Latency Footprint: Fallback HTML parser completes string sanitization in < 3ms.
- Memory Compression: Stripping formatting blocks compresses document footprint by ~65-80%.