Audio Ingestion & Transcription Filters

Hiroshi automatically transcribes inbound voice notes using a speech-to-text pipeline.

When a user sends an audio note with a supported MIME type (audio/ogg, audio/mp3, audio/wav), the ingestion layer:

1. Captures the binary stream buffer.
2. Dispatches it to Whisper/Deepgram endpoints.
3. Automatically replaces the raw audio file attachment with the generated text transcription before forwarding the message block to the prompt assembly engine.
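
The steps above can be sketched roughly as follows. This is an illustrative outline only, not Hiroshi's actual implementation: the `Attachment` and `Message` types, the `replace_audio_with_transcripts` function, and the `transcribe` callback are all hypothetical names introduced here. The callback stands in for whichever Whisper or Deepgram client is configured, and `fake_transcribe` is a placeholder so the example runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union

# MIME types listed in this section.
SUPPORTED_AUDIO_TYPES = {"audio/ogg", "audio/mp3", "audio/wav"}


@dataclass
class Attachment:  # hypothetical type, for illustration only
    mime_type: str
    data: bytes


@dataclass
class Message:  # hypothetical type, for illustration only
    parts: List[Union[str, Attachment]] = field(default_factory=list)


# Assumed shape of a transcription backend: (audio bytes, MIME type) -> text.
Transcriber = Callable[[bytes, str], str]


def replace_audio_with_transcripts(message: Message, transcribe: Transcriber) -> Message:
    """Return a copy of the message with supported audio replaced by text."""
    new_parts: List[Union[str, Attachment]] = []
    for part in message.parts:
        if isinstance(part, Attachment) and part.mime_type in SUPPORTED_AUDIO_TYPES:
            audio = bytes(part.data)                  # 1. capture the binary buffer
            text = transcribe(audio, part.mime_type)  # 2. dispatch to the backend
            new_parts.append(text)                    # 3. swap attachment for transcript
        else:
            # Text and unsupported attachments pass through unchanged.
            new_parts.append(part)
    return Message(parts=new_parts)


def fake_transcribe(audio: bytes, mime_type: str) -> str:
    # Placeholder backend; a real one would call a speech-to-text service.
    return f"[transcript of {len(audio)} bytes, {mime_type}]"


msg = Message(parts=[
    "hi",
    Attachment("audio/ogg", b"\x00\x01\x02"),
    Attachment("image/png", b"\x89PNG"),
])
out = replace_audio_with_transcripts(msg, fake_transcribe)
```

Passing the transcription backend in as a callback keeps the filter independent of any one provider, so the same replacement logic applies whether Whisper or Deepgram handles the audio.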
