On-device dictation, defined

What it means

On-device dictation is dictation that processes your voice on the machine in front of you. You hold a key and speak; the microphone captures the audio, a transcription model decodes it locally, and the text appears in the focused field. No audio leaves the Mac.
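The hold-to-talk flow can be sketched as a small pipeline. This is a minimal illustration, not Voiacast's implementation: `DictationSession`, `decode`, and `insert_text` are hypothetical names, and `decode` stands in for whatever local speech model is loaded.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DictationSession:
    """Hypothetical hold-to-talk session; audio only ever lives in memory."""

    decode: Callable[[bytes], str]       # local model: audio in, text out
    insert_text: Callable[[str], None]   # writes text into the focused field
    _chunks: List[bytes] = field(default_factory=list)

    def key_down(self) -> None:
        # Start a fresh recording when the hotkey is pressed.
        self._chunks.clear()

    def on_audio(self, chunk: bytes) -> None:
        # Buffer microphone audio while the key is held.
        self._chunks.append(chunk)

    def key_up(self) -> str:
        # Decode the buffered audio locally, then insert the result.
        audio = b"".join(self._chunks)
        self._chunks.clear()
        text = self.decode(audio).strip()
        if text:
            self.insert_text(text)
        return text
```

Nothing in the session takes a URL or a socket, which is the property the definition is describing: the audio goes from the buffer to the decoder to the text field.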

The phrase is useful because the alternative is so common it has stopped feeling like a choice. Most modern dictation tools upload audio to a server, run a heavier model there, and send the text back. The experience can be excellent. The privacy footprint is not the same.

Why it matters

Three things change when transcription stays on the machine.

The first is privacy. Audio captured for dictation tends to contain the most personal half of your working day — half-formed thoughts, names of clients, medical terms, login phrases. On-device processing means that audio never sits on a server’s disk, never gets cached for accuracy research, and never crosses a network you do not control.

The second is reliability. A flight, a coffee-shop captive portal, a corporate VPN that drops the connection mid-sentence — every one of these breaks cloud dictation. On-device dictation works in airplane mode.

The third is the cost shape. Cloud dictation has a per-minute cost that the provider has to absorb or pass on. On-device dictation has no per-minute cost and no quota: it runs on hardware you already own, and the only expense is the compute and battery your Mac spends while decoding.

How it works on a modern Mac

The Whisper family of speech models (open-source neural networks trained on a large multilingual speech dataset) can run efficiently on Apple silicon, including the Neural Engine that every M-series Mac has had since the M1. The smaller variants run in real time on a base-model laptop. The larger ones use more memory and are more accurate on noisy or heavily accented speech.
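"Real time" is commonly expressed as a real-time factor: decode time divided by audio duration, where a value below 1 means the model keeps up with speech. A minimal sketch of that measurement, assuming a hypothetical `decode` callable in place of an actual model:

```python
import time
from typing import Callable


def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """Decode time divided by audio duration; below 1.0 is faster than speech."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decode_seconds / audio_seconds


def measure_rtf(decode: Callable[[bytes], str], audio: bytes, audio_seconds: float) -> float:
    """Time one call to a (hypothetical) local decoder and return its factor."""
    start = time.perf_counter()
    decode(audio)
    return real_time_factor(time.perf_counter() - start, audio_seconds)
```

For example, a model that needs 5 seconds to decode a 10-second clip has a factor of 0.5.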

The trade-off is straightforward. Smaller models are fast and quiet on the fan. Larger models are slower and use more battery, in exchange for two or three percentage points of accuracy on the hard cases — accents the model has heard less, ambient noise, multi-speaker overlap.

Voiacast’s Free tier runs the small models on-device. Pro adds the larger ones, so the trade-off becomes a setting the user can flip.
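One way to picture that setting is a small selection function. The tier names mirror the text, but `pick_model`, `prefer_accuracy`, and the model labels are hypothetical placeholders, not actual product identifiers.

```python
from enum import Enum


class Tier(Enum):
    FREE = "free"
    PRO = "pro"


# Placeholder labels for illustration only.
SMALL_MODEL = "small"
LARGE_MODEL = "large"


def pick_model(tier: Tier, prefer_accuracy: bool) -> str:
    """Return the model to load for this tier and user preference."""
    # Only Pro users can trade speed and battery for accuracy.
    if tier is Tier.PRO and prefer_accuracy:
        return LARGE_MODEL
    return SMALL_MODEL
```

A Free user who asks for accuracy still gets the small model in this sketch; the larger model is reachable only when both conditions hold.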

What on-device dictation is not

On-device dictation is not “no network”. License-key validation, the update check, and the website itself all use the network. What stays on the device is the audio and the transcript.

It is also not the same as offline. Offline implies “works without a network at all”; on-device means “the heavy lifting happens locally regardless of the network state”. A Voiacast user with no internet connection can dictate; the rest of the app (auto-update, license re-validation) will pick up when the connection returns.
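The split between the two paths can be sketched as follows. This is an assumption about how such an app might be structured, not a description of Voiacast's code; `DeferredNetworkTasks` and `dictate` are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque


class DeferredNetworkTasks:
    """Holds network-only chores until a connection is available."""

    def __init__(self) -> None:
        self._pending: Deque[Callable[[], None]] = deque()

    def submit(self, task: Callable[[], None], online: bool) -> None:
        # Run immediately when online; otherwise queue for later.
        if online:
            task()
        else:
            self._pending.append(task)

    def connection_restored(self) -> int:
        # Drain the queue in submission order and report how many tasks ran.
        ran = 0
        while self._pending:
            self._pending.popleft()()
            ran += 1
        return ran


def dictate(audio: bytes, decode: Callable[[bytes], str]) -> str:
    """The dictation path takes no network state at all."""
    return decode(audio)
```

The update check and license re-validation would go through the queue, while `dictate` behaves the same whether or not a connection exists.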

Where the trade-offs live

The honest trade-off is between accuracy on the hardest cases and keeping everything local. A frontier-scale model running in a data center will, today, beat the largest on-device model on the toughest ten percent of utterances — strong accents, overlapping speakers, very noisy rooms.

For the typical workflow — a developer at a desk, a writer in a quiet home office — the gap is smaller than the marketing implies, and the local custom dictionary closes most of what remains. Voiacast leans on that observation: ship on-device by default, and let the user opt into a bring-your-own-key cloud path on Pro when they actually need the extra few percent.
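A local custom dictionary can be as simple as a post-processing pass over the transcript. The sketch below is one plausible approach, not Voiacast's actual mechanism: `apply_dictionary` and the sample entries are hypothetical, and matching is whole-word and case-insensitive.

```python
import re
from typing import Dict


def apply_dictionary(transcript: str, corrections: Dict[str, str]) -> str:
    """Replace misheard words or phrases with the user's preferred spelling."""
    if not corrections:
        return transcript
    lookup = {heard.lower(): wanted for heard, wanted in corrections.items()}
    # Longest entries first so a phrase wins over a shorter overlapping entry.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(a) for a in alternatives) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: lookup[m.group(0).lower()], transcript)


corrected = apply_dictionary(
    "ask voya cast about kubernetes",
    {"voya cast": "Voiacast", "kubernetes": "Kubernetes"},
)
# corrected == "ask Voiacast about Kubernetes"
```

Because the pass runs on text that is already on the machine, it corrects names and jargon without sending anything to a server.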