voiacast

Bring-your-own-key cloud transcription

A cloud transcription mode where the user supplies their own API key for an AI provider, so the request goes from the user's Mac to the provider they pay, not through the dictation tool's servers.

What it is

Bring-your-own-key cloud transcription, usually shortened to BYOK, is a cloud transcription mode in which the audio is sent from the user’s Mac directly to an AI provider’s API endpoint, using a key the user owns. The dictation tool is the local glue: it captures the audio, sends it to the endpoint, receives the transcript, and types it into the focused field. Neither the audio nor the transcript passes through the tool vendor’s infrastructure.

The shape matters because the alternative, a “managed” cloud transcription that routes through the vendor’s servers, couples three things the user might prefer to separate: who pays for the compute, who sees the audio, and who manages the bill.

Why it matters

Three concrete changes from a managed cloud:

Who pays. With BYOK, the user pays the AI provider directly for the audio they send. There is no markup, no per-seat fee on top, no quota on the dictation tool’s side. The tool charges for the local software; the cloud cost is the user’s own bill.

Who sees the audio. The audio travels from the Mac to the AI provider’s endpoint. The dictation tool’s server is not in the path. For a user who is comfortable with a provider such as OpenAI or Groq seeing their dictation, but not with the dictation tool’s company seeing it, BYOK is the right fit.

Who manages the bill. BYOK uses the user’s existing AI provider account. Usage limits, billing alerts, organisation-level controls, and key rotation are all the user’s own — the same controls they use for other AI workflows.

What it does not change

BYOK is still cloud. The audio leaves the Mac. A user whose threat model says “audio must never leave the device” is not the audience for BYOK; that user stays on the on-device default and never turns BYOK on. BYOK is a useful third option for users who are happy with cloud transcription specifically through their existing AI provider, not a universal upgrade.

It also does not improve every workflow. For a quiet desk and clear speech, on-device transcription is already accurate enough that the extra round-trip is pure cost. BYOK earns its keep in the cases where the marginal accuracy of a frontier model actually matters:

  • Strong accents the model has heard less of.
  • Heavily noisy environments.
  • Multi-speaker or rapid-speech utterances.
  • Long-form sessions where the compounded accuracy difference adds up.

How it works in Voiacast

The Pro tier surfaces a BYOK toggle in Settings. The user pastes a Groq or OpenAI API key; the key is stored in the macOS Keychain on the local machine. When BYOK is on, the dictation pipeline routes audio to the provider’s transcription endpoint, receives the text, applies the local custom dictionary, and types into the focused field. When BYOK is off, the same pipeline uses the on-device model instead, and no audio leaves the Mac.
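
The routing described above can be sketched in a few lines. This is an illustration only, not Voiacast's implementation: the names `dictate`, `apply_dictionary`, and `Transcriber` are hypothetical, and the two backends are passed in as plain functions so the example runs without a microphone or a network connection.

```python
from typing import Callable, Dict

# Hypothetical type: anything that turns raw audio bytes into text.
Transcriber = Callable[[bytes], str]


def apply_dictionary(text: str, dictionary: Dict[str, str]) -> str:
    """Replace whole words found in the user's custom dictionary.

    Splitting on whitespace also collapses repeated spaces, which is
    acceptable for this sketch.
    """
    return " ".join(dictionary.get(word, word) for word in text.split())


def dictate(
    audio: bytes,
    byok_enabled: bool,
    local: Transcriber,
    cloud: Transcriber,
    dictionary: Dict[str, str],
    type_text: Callable[[str], None],
) -> str:
    """Pick one backend from the toggle, then run the shared local steps."""
    transcribe = cloud if byok_enabled else local
    text = apply_dictionary(transcribe(audio), dictionary)
    type_text(text)  # stands in for typing into the focused field
    return text
```

The point of the shape is that only the transcription step changes with the toggle; the custom dictionary and the typing step stay on the Mac in both modes.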

The key is stored only in Keychain. It is read at request time, attached to the outbound HTTPS request to the provider, and not written anywhere else. The user can rotate or revoke the key from their AI provider dashboard at any time; a revoked key stops working in the dictation tool on its next request.
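
A minimal sketch of "read at request time, attached to the outbound request" follows. Several things here are assumptions rather than details from this page: the endpoint URL and the `whisper-1` model name follow OpenAI's public transcription API as commonly documented and should be checked against the provider's current reference, `build_request` is a hypothetical helper, and `read_key` is a stand-in for a Keychain lookup. The function only builds the request; it does not send it.

```python
import urllib.request
import uuid
from typing import Callable

# Assumed endpoint; verify against the provider's current documentation.
OPENAI_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_request(
    audio: bytes,
    read_key: Callable[[], str],
    url: str = OPENAI_URL,
    model: str = "whisper-1",  # assumed model name
) -> urllib.request.Request:
    """Build a multipart POST; the key is fetched only here, per request."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()

    body = model_part + file_header + audio + closing
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    # read_key stands in for a Keychain lookup; its value is not cached here.
    request.add_header("Authorization", f"Bearer {read_key()}")
    return request
```

Because the key is looked up on every call rather than held in a variable, a key the user replaces in Keychain is picked up by the next request without restarting anything.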

When to turn it on

A useful heuristic: leave BYOK off for the first month. If you find yourself manually correcting the same kinds of mis-transcriptions — the model is mis-hearing words you can hear yourself say clearly — try BYOK with the larger provider model and see whether the correction rate falls. If it does, leave it on. If it does not, the local model is already close enough to its ceiling for your use, and the cloud round-trip is cost without benefit.
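
The heuristic can be made concrete by counting corrections. The sketch below assumes you keep a rough tally of corrected words and dictated words for a period with BYOK off and a period with it on; the function names and the `min_improvement` threshold are hypothetical and should be set to whatever difference you would actually notice.

```python
def correction_rate(corrected_words: int, dictated_words: int) -> float:
    """Fraction of dictated words that had to be fixed by hand."""
    if dictated_words <= 0:
        raise ValueError("dictated_words must be positive")
    return corrected_words / dictated_words


def keep_byok_on(
    local_rate: float,
    byok_rate: float,
    min_improvement: float = 0.01,  # hypothetical threshold: one word in a hundred
) -> bool:
    """True when BYOK lowers the correction rate by at least the threshold."""
    return local_rate - byok_rate >= min_improvement
```

For example, 40 corrections in 1,000 words locally against 20 in 1,000 with BYOK is a drop from 4% to 2%, which clears a one-point threshold; a drop from 4% to 3.8% does not.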
