What it is
A custom dictation dictionary is a list of “from → to” replacements that runs after the speech model produces a transcript. The model hears “next js” or “kubernetes”; the dictionary turns those into “Next.js” and “Kubernetes”, the way you actually spell them. The replacement happens before the text is typed into the focused field, so what lands in your editor is the corrected form.
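The pass described above can be sketched in a few lines. The entries, the case-insensitive matching, and the word-boundary rule here are illustrative assumptions, not Voiacast's actual implementation:

```python
import re

# Hypothetical dictionary: raw transcript phrase -> canonical spelling.
REPLACEMENTS = {
    "next js": "Next.js",
    "kubernetes": "Kubernetes",
    "post gres": "Postgres",
}

def apply_dictionary(transcript: str) -> str:
    """Run the from->to pass on a raw transcript, longest entries first
    so multi-word phrases win over their sub-phrases."""
    for phrase in sorted(REPLACEMENTS, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        transcript = pattern.sub(REPLACEMENTS[phrase], transcript)
    return transcript

print(apply_dictionary("deploy the next js app to kubernetes"))
# -> deploy the Next.js app to Kubernetes
```

Sorting longest-first is the one non-obvious choice: without it, an entry like “next” could fire inside “next js” and block the multi-word replacement.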
It is the single highest-leverage feature for a developer or a domain-heavy professional. A speech model trained on general internet text does not know that your team’s service is called Hummingbird or that your client is Klüber AG. The dictionary tells it.
Why it works
General speech models are accurate on common words and wobble on proper nouns and jargon. The reason is statistical: the model picks the most probable transcription, and a frequent common phrase is more probable than the correct spelling of a low-frequency technical term. The dictionary sidesteps that contest entirely: when the model produces something close to the phonetic shape of a known entry, the dictionary forces the right spelling.
This is why the dictionary closes more accuracy gap than upgrading to a larger model in many real workflows. Larger models help on accents and on noisy rooms; the dictionary helps on the vocabulary the model has not seen.
What to put in it
Start with the names that mis-transcribe most often. For a developer:
- Service names. Hummingbird, Aurora, Penguin — whatever your team has named the systems you talk about every day.
- Frameworks and tools. Next.js, Postgres, Kubernetes, ECS, IAM, S3, CDK. Most of these have a canonical spelling the speech model guesses wrong.
- Acronyms, internal and standard. ARP, RFC, MR (if you use GitLab), PR (if you use GitHub).
- Client and partner names. Especially the ones with diacritics or non-English spellings.
For a domain professional — legal, medical, customer support, account management — replace “service” with “client” and “framework” with “product” and the same list applies. Names that mis-transcribe every day are the entries that earn the most leverage.
A useful exercise: review the last week of your dictated text for the five spellings you corrected by hand. Those five entries belong in the dictionary immediately.
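That review can be automated if you keep raw transcripts alongside the edited final text. A sketch, assuming word-level diffs are a good enough proxy for hand corrections (the sample history is invented):

```python
import difflib
from collections import Counter

def mine_corrections(pairs):
    """Count word-level replacements between raw dictation and the
    hand-edited final text; frequent pairs are dictionary candidates."""
    counts = Counter()
    for raw, final in pairs:
        a, b = raw.split(), final.split()
        matcher = difflib.SequenceMatcher(a=a, b=b)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "replace":
                counts[(" ".join(a[i1:i2]), " ".join(b[j1:j2]))] += 1
    return counts.most_common(5)

history = [
    ("deploy to kubernetes", "deploy to Kubernetes"),
    ("the kubernetes cluster", "the Kubernetes cluster"),
]
print(mine_corrections(history))
# -> [(('kubernetes', 'Kubernetes'), 2)]
```

The top pairs are exactly the “from → to” entries the exercise asks you to find by hand.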
Local vs cloud dictionaries
Some dictation tools ship a server-side dictionary attached to the user account. Some ship a local one. The trade-off is straightforward:
A server-side dictionary syncs across machines without effort. It also means the list of words you correct most often — which is a uniquely specific snapshot of what you work on — sits on the tool vendor’s server alongside your usage metadata.
A local dictionary is a file on the Mac. It does not sync without help. It also does not leak. For a developer or a legal or medical professional, the second property matters more than the first; the sync problem is solvable with manual export and import.
Voiacast ships a local dictionary. The Pro tier adds settings export so a team can share a curated dictionary as a file.
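As a sketch of what a shareable local dictionary can look like (the path and JSON layout here are assumptions for illustration, not Voiacast's documented format):

```python
import json
from pathlib import Path

# Hypothetical location of the local dictionary file.
DICT_PATH = Path.home() / ".voiacast" / "dictionary.json"

def load_dictionary(path: Path = DICT_PATH) -> dict:
    """Read the local from->to map, returning an empty dict if absent."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}

def export_dictionary(entries: dict, out_path: Path) -> None:
    """Write a shareable copy a teammate can import as-is."""
    out_path.write_text(
        json.dumps(entries, ensure_ascii=False, indent=2), encoding="utf-8"
    )
```

A plain file like this is the whole sync story: export, send, import. Nothing leaves the machine unless you move it yourself.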
How it interacts with the model
The dictionary is a post-processing pass, not a fine-tune. The model itself is unchanged; the dictionary edits the model’s output. That means dictionary changes take effect immediately — there is no training step, no waiting for a new model build, no per-user fine-tune. Add an entry; the next transcription uses it.
The honest limitation: a dictionary only works on words the model got phonetically close to. For a fully novel word the model has never heard, no amount of dictionary tuning helps. In practice this is rare, since the model has heard most of what you say, and the entries that matter (your service names) are typically close-enough hits.
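A minimal sketch of the “close enough” idea, using difflib string similarity as a stand-in for real phonetic distance; the entries and the 0.8 cutoff are invented for illustration:

```python
import difflib

# Hypothetical known entries: lowercased key -> canonical spelling.
CANONICAL = {
    "hummingbird": "Hummingbird",
    "kluber": "Klüber",
    "postgres": "Postgres",
}

def correct_token(token: str, cutoff: float = 0.8) -> str:
    """If the token is close to a known entry, force the canonical
    spelling; otherwise leave the model's output alone."""
    matches = difflib.get_close_matches(token.lower(), CANONICAL, n=1, cutoff=cutoff)
    return CANONICAL[matches[0]] if matches else token

print(correct_token("humingbird"))  # -> Hummingbird
print(correct_token("banana"))     # no entry is close; -> banana
```

The cutoff is the whole trade-off: too low and ordinary words get rewritten, too high and near-misses like “humingbird” slip through uncorrected.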
See also
- Voice coding — the developer workflow where the dictionary earns the most leverage.
- Push-to-talk hotkey — the input side of the flow.
- Whisper models explained — why even the largest model leaves a vocabulary gap the dictionary fills.
Last reviewed .