Multi-Channel Visual Media Understanding
Hiroshi OS features a high-performance visual media ingestion pipeline designed to process visual attachments across external chat gateways (Telegram, Discord, Slack, etc.) and map them directly into multimodal token streams.
Inbound Lifecycle Architecture
Configuration & Parameter Bounds
Configurations are managed inside AppConfig under media:
- Max Image Payload Size: capped at 10 MB as a safety margin.
- Storage Directory: Visual files are cached under ~/.hiroshi/workspace/media/ and tracked by UUID/Epoch names.
- Multimodal Conversion footprint: Base64 transformation arrays consume less than 4 MB of memory.
- Ingestion latency: Disk writes and byte parsing execute in < 8ms.
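The size cap and storage layout above can be illustrated with a minimal sketch. It is not the Hiroshi OS implementation: `MediaConfig`, `PayloadTooLarge`, and `store_image` are hypothetical names, the 10 MB limit is assumed to mean 10 × 1024 × 1024 bytes, and the `<epoch>_<uuid>` file name is only one plausible reading of "UUID/Epoch names".

```python
import time
import uuid
from dataclasses import dataclass, field
from pathlib import Path

# Assumption: "10MB" is read as 10 * 1024 * 1024 bytes.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def _default_storage_dir() -> Path:
    # Storage directory named in the list above.
    return Path.home() / ".hiroshi" / "workspace" / "media"


@dataclass
class MediaConfig:
    """Hypothetical stand-in for the media section of AppConfig."""
    enabled: bool = True
    max_image_bytes: int = MAX_IMAGE_BYTES
    storage_dir: Path = field(default_factory=_default_storage_dir)


class PayloadTooLarge(ValueError):
    """Raised when an attachment exceeds the configured size cap."""


def store_image(data: bytes, extension: str, config: MediaConfig) -> Path:
    """Reject oversized payloads, then cache the bytes under storage_dir."""
    if len(data) > config.max_image_bytes:
        raise PayloadTooLarge(
            f"{len(data)} bytes exceeds cap of {config.max_image_bytes} bytes"
        )
    config.storage_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: epoch seconds plus a random UUID.
    name = f"{int(time.time())}_{uuid.uuid4().hex}.{extension.lstrip('.')}"
    path = config.storage_dir / name
    path.write_bytes(data)
    return path
```

Checking the size before touching the disk keeps rejected payloads from ever reaching the cache directory.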
Provider Integration
When media is enabled and active visual attachments are detected, the content array of the last user message turn is translated into base64 inline blocks:
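A rough sketch of that translation, under stated assumptions: the block shape (`type`, `source`, `media_type`, `data`) is a generic illustration and not a confirmed provider schema, and `attach_images` is a hypothetical helper name. It locates the last user turn, turns plain-string content into a content array, and appends one base64 block per cached image.

```python
import base64
import mimetypes
from pathlib import Path


def attach_images(messages, image_paths, media_enabled=True):
    """Return messages with images inlined into the last user turn.

    The input list is not mutated. If media is disabled, no images are
    given, or there is no user turn, the input is returned unchanged.
    """
    if not media_enabled or not image_paths:
        return messages

    # Walk backwards to find the index of the last user turn.
    last_user = None
    for i in range(len(messages) - 1, -1, -1):
        if messages[i].get("role") == "user":
            last_user = i
            break
    if last_user is None:
        return messages

    turn = messages[last_user]
    content = turn["content"]
    # Normalize plain-string content into a content array.
    if isinstance(content, str):
        blocks = [{"type": "text", "text": content}]
    else:
        blocks = list(content)

    for path in image_paths:
        path = Path(path)
        media_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        # Assumed block shape; adjust to the target provider's schema.
        blocks.append({
            "type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": encoded},
        })

    updated = list(messages)
    updated[last_user] = {**turn, "content": blocks}
    return updated
```

Only the final user turn is rewritten, so earlier turns in the history keep their original content.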