# Multi-Channel Visual Media Understanding

Hiroshi OS includes a visual media ingestion pipeline that processes image attachments arriving through external chat gateways (Telegram, Discord, Slack, and others) and converts them into multimodal content blocks for vision-capable models.

## Inbound Lifecycle Architecture

```text theme={null}
 [ Chat Message Event ] -> [ Inbound Ingestion Loop ] -> [ Temp Local Storage ] -> [ Multimodal Base64 Block ] -> [ Vision Model Stream ]
```

## Configuration & Parameter Bounds

Media settings live in the `media` section of `AppConfig`:

```yaml theme={null}
media:
  enabled: true
  max_file_size_bytes: 10485760 # 10 MiB (10 * 1024 * 1024 bytes), default cap
  allowed_mime_types:
    - image/png
    - image/jpeg
    - image/webp
```
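As a rough illustration of how these settings could gate an inbound attachment, here is a minimal Python sketch. `MediaConfig` and `is_accepted` are hypothetical names, not part of Hiroshi OS, and treating the size cap as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaConfig:
    """Hypothetical mirror of the `media` section shown above."""
    enabled: bool = True
    max_file_size_bytes: int = 10 * 1024 * 1024  # 10485760
    allowed_mime_types: tuple[str, ...] = ("image/png", "image/jpeg", "image/webp")


def is_accepted(cfg: MediaConfig, mime_type: str, size_bytes: int) -> bool:
    """Return True if an attachment passes the configured checks."""
    if not cfg.enabled:
        return False
    if mime_type.lower() not in cfg.allowed_mime_types:
        return False
    # Assumption: the cap is inclusive and empty files are rejected.
    return 0 < size_bytes <= cfg.max_file_size_bytes
```

With the defaults above, a 1 KiB `image/png` file passes, while an `image/gif` file or a PNG one byte over the cap is rejected.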

* **Max Image Payload Size:** attachments are capped at 10 MiB (10,485,760 bytes) by default, as set by `max_file_size_bytes`.
* **Storage Directory:** image files are cached under `~/.hiroshi/workspace/media/` with UUID/epoch-based file names.
* **Multimodal Conversion Footprint:** Base64 conversion buffers use **\< 4MB** of memory.
* **Ingestion Latency:** disk writes and byte parsing complete in **\< 8ms**.
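The caching step might look like the sketch below. The exact file-name scheme is not specified here, so the `<epoch>_<uuid>` pattern, the extension map, and the helper names `cache_path` and `cache_attachment` are all assumptions made for illustration.

```python
import time
import uuid
from pathlib import Path

# Directory named in the list above.
MEDIA_DIR = Path.home() / ".hiroshi" / "workspace" / "media"

# Assumed mapping from the allowed MIME types to file extensions.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def cache_path(mime_type: str, media_dir: Path = MEDIA_DIR) -> Path:
    """Build a unique file path; the `<epoch>_<uuid>` naming is hypothetical."""
    name = f"{int(time.time())}_{uuid.uuid4().hex}{EXTENSIONS[mime_type]}"
    return media_dir / name


def cache_attachment(data: bytes, mime_type: str, media_dir: Path = MEDIA_DIR) -> Path:
    """Write the attachment bytes to the cache directory and return the path."""
    path = cache_path(mime_type, media_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path
```

Passing `media_dir` explicitly keeps the helper easy to point at a temporary directory in tests.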

## Provider Integration

When `media` is enabled and a message carries image attachments, the content of the last user turn is rewritten to include the images as inline base64 data in the format the provider expects:

### OpenAI Vision Block

```json theme={null}
{
  "role": "user",
  "content": [
    { "type": "text", "text": "Observe the attached snapshot." },
    { "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }
  ]
}
```
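A message of this shape could be assembled roughly as follows. This is a sketch using only the Python standard library; `openai_vision_message` is a hypothetical helper name, not a Hiroshi OS function.

```python
import base64


def openai_vision_message(text: str, image_bytes: bytes, mime_type: str) -> dict:
    """Build a user message with one text part and one inline image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            # The image travels as a data URI: data:<mime>;base64,<payload>
            {"type": "image_url",
             "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }
```

The MIME type in the data URI should match the actual file type, which is one reason to validate it against `allowed_mime_types` before encoding.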

### Ollama Multimodal Array

```json theme={null}
{
  "role": "user",
  "content": "Observe the attached snapshot.",
  "images": ["/9j/4AAQSkZJRgABAQEA..."]
}
```
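The Ollama shape differs in that the text stays a plain string and each image is a bare base64 string with no `data:` prefix. A minimal sketch, with `ollama_message` as a hypothetical helper name:

```python
import base64


def ollama_message(text: str, images: list[bytes]) -> dict:
    """Build a user message with raw base64 image strings in `images`."""
    return {
        "role": "user",
        "content": text,
        # No data-URI prefix here, only the base64 payload itself.
        "images": [base64.b64encode(img).decode("ascii") for img in images],
    }
```

The `/9j/` prefix in the example above is what the first three bytes of a JPEG file (`FF D8 FF`) look like once base64-encoded.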
