voiacast

Push-to-talk hotkey, defined

A keyboard combination you hold to record and release to transcribe: the input shape that makes a dictation tool feel like part of the keyboard.

What it is

A push-to-talk hotkey is a keyboard combination you hold down to record and release to stop. The dictation tool starts capturing audio when the key goes down and submits the captured audio for transcription when the key goes up. The transcribed text lands in the field your cursor is in.
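The hold-to-record behaviour described above can be modelled as a small state machine. This is a minimal sketch, not Voiacast's implementation: the class name `PushToTalkSession` and the `transcribe` and `insert_text` callables are hypothetical stand-ins for a real speech-to-text model and a real text-insertion step, and the audio here is just bytes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PushToTalkSession:
    """Hold-to-record state machine: key down starts capture, key up submits."""

    transcribe: Callable[[bytes], str]    # hypothetical speech-to-text callable
    insert_text: Callable[[str], None]    # hypothetical "type into focused field" callable
    recording: bool = False
    _chunks: List[bytes] = field(default_factory=list)

    def on_key_down(self) -> None:
        # A held key can deliver repeated "down" events; ignore them once recording.
        if self.recording:
            return
        self.recording = True
        self._chunks = []

    def on_audio(self, chunk: bytes) -> None:
        # Audio is kept only while the key is held.
        if self.recording:
            self._chunks.append(chunk)

    def on_key_up(self) -> None:
        # Releasing the key ends the session and submits the whole utterance at once.
        if not self.recording:
            return
        self.recording = False
        audio = b"".join(self._chunks)
        self._chunks = []
        if audio:
            self.insert_text(self.transcribe(audio))


# Demo with fake audio: bytes stand in for sound, decode() stands in for the model.
typed: List[str] = []
session = PushToTalkSession(transcribe=lambda audio: audio.decode(), insert_text=typed.append)
session.on_audio(b"ignored ")        # key not held yet: dropped
session.on_key_down()
session.on_audio(b"hello ")
session.on_key_down()                # auto-repeat: no effect
session.on_audio(b"world")
session.on_key_up()
session.on_audio(b" also ignored")   # key released: dropped
print(typed)  # ['hello world']
```

The point of the sketch is that the model receives one complete utterance per hold, and nothing outside the hold is ever buffered.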

Push-to-talk is one of two common shapes. The other is the toggle: one press to start, another press to stop. Both ship in modern dictation tools; the two feel very different in daily use.

Why push-to-talk wins for short bursts

A push-to-talk hotkey is the right shape when you dictate often and in short chunks — a sentence into a Slack reply, a paragraph into a commit message, a thought into a design doc. The reasons:

The session length is your finger. You hold the key for as long as the thought lasts and release when you are done. There is no need to remember whether dictation is on, and there is no “stuck listening” state.

Punctuation lands on release. The transcription model gets a complete utterance, with a natural pause at the start and end. Punctuation and capitalisation are usually more accurate on complete utterances than on streaming audio that the tool has to chop into chunks of its own.

There is no accidental-capture problem. A toggle left “on” while you type other things can pick up unrelated speech (a phone call across the room) and treat it as input. Push-to-talk records only while your finger is down.
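The difference between the two shapes can be illustrated by replaying one event timeline under both semantics. This is an illustrative sketch only: the event names (`"down"`, `"up"`, `"speech"`) and the `captured` function are made up for the example, and speech is represented as plain strings.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (kind, payload); kind is "down", "up", or "speech"


def captured(events: List[Event], mode: str) -> List[str]:
    """Return the speech captured under "hold" or "toggle" semantics."""
    if mode not in ("hold", "toggle"):
        raise ValueError(f"unknown mode: {mode!r}")
    recording = False
    heard: List[str] = []
    for kind, payload in events:
        if kind == "down":
            # hold: record while the key is down; toggle: each press flips the state.
            recording = True if mode == "hold" else not recording
        elif kind == "up":
            # Only hold mode reacts to the release.
            if mode == "hold":
                recording = False
        elif kind == "speech" and recording:
            heard.append(payload)
    return heard


timeline: List[Event] = [
    ("down", ""),
    ("speech", "ship it"),
    ("up", ""),
    # The user forgets the second press; someone takes a call across the room.
    ("speech", "unrelated phone call"),
]

print(captured(timeline, "hold"))    # ['ship it']
print(captured(timeline, "toggle"))  # ['ship it', 'unrelated phone call']
```

Under hold semantics the release ends the session by construction; under toggle semantics the session ends only if the user remembers the second press.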

Where toggle still has a place

For long-form dictation (half an hour of drafting a document, an extended call transcript), holding a key for thirty minutes is uncomfortable. A toggle wins for sessions measured in minutes rather than seconds.

Voiacast’s design assumes the short-burst case is the dominant one. The default is push-to-talk. A toggle-style mode is a reasonable future addition for long-form sessions and is not in the v1 surface.

Choosing a hotkey

Three rules earn most of the benefit.

Pick a key combination your editor and terminal do not already use. Hold-Option-Space is the default; it avoids the most common conflicts. If a particular editor binds it (Cursor’s command palette is a frequent example), bind another two-key combination — hold-Control-Space or hold-Option-Backtick are common alternatives.

Use a modifier-plus-key combination, not a single key. A single key dedicated to dictation is convenient until you fat-finger it while typing. Requiring a modifier makes accidental triggers rare.

Pick a key your thumb or pinky can reach without leaving home row. The hotkey is going to fire dozens of times a day. The ergonomics matter more than the cleverness of the combination.
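The first two rules are mechanical enough to check in code; the third is ergonomic and is left to the user. The sketch below is hypothetical throughout: the `Hotkey` type, the `"Option+Space"` string format, and the `taken` list are assumptions for illustration, and the list of bindings already in use would have to come from the user's own editor and terminal settings.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

# Modifier names as they appear on a macOS keyboard (assumed spelling for this sketch).
MODIFIERS = frozenset({"control", "option", "command", "shift"})


@dataclass(frozen=True)
class Hotkey:
    modifiers: FrozenSet[str]
    key: str


def parse(combo: str) -> Hotkey:
    """Parse a string such as "Option+Space" into modifiers plus one main key."""
    parts = [p.strip().lower() for p in combo.split("+")]
    modifiers = frozenset(p for p in parts if p in MODIFIERS)
    keys = [p for p in parts if p and p not in MODIFIERS]
    if len(keys) != 1:
        raise ValueError(f"expected exactly one non-modifier key in {combo!r}")
    return Hotkey(modifiers, keys[0])


def problems(combo: str, taken: Iterable[str]) -> List[str]:
    """Check a candidate hotkey against the first two rules."""
    hotkey = parse(combo)
    issues: List[str] = []
    if not hotkey.modifiers:
        issues.append("no modifier: easy to trigger while typing")
    if hotkey in {parse(t) for t in taken}:
        issues.append("already bound by another app")
    return issues


# `taken` is a made-up example list, not a statement about any real editor's defaults.
taken = ["Option+Space", "Command+P"]
print(problems("Option+Space", taken))   # ['already bound by another app']
print(problems("Control+Space", taken))  # []
print(problems("F5", taken))             # ['no modifier: easy to trigger while typing']
```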

What “push-to-talk” implies under the hood

The tool installs a global hotkey listener — on macOS, a low-level event tap or an NSEvent monitor — so the key registers regardless of which app has focus. While the key is held, audio capture runs. On release, the captured audio is passed to the transcription model. The text is then typed into whichever text field has keyboard focus, usually via the Accessibility API with synthetic key events as a fallback.
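The last step, inserting the text with a fallback, can be sketched independently of any operating system. The function and parameter names below (`deliver_text`, `insert_via_accessibility`, `type_synthetic_keys`) are hypothetical; in a real tool they would wrap platform calls, which are not shown here. The sketch assumes the first callable returns `True` only when the insert succeeded.

```python
from typing import Callable, List


def deliver_text(
    text: str,
    insert_via_accessibility: Callable[[str], bool],  # hypothetical: True if the insert succeeded
    type_synthetic_keys: Callable[[str], None],       # hypothetical: posts key events for each character
) -> str:
    """Insert text into the focused field and report which path was used."""
    if not text:
        return "skipped"  # nothing was transcribed, so nothing is typed
    if insert_via_accessibility(text):
        return "accessibility"
    # The focused field refused or does not support the direct insert: fall back.
    type_synthetic_keys(text)
    return "synthetic-keys"


# Demo with fake back ends that just record what they were asked to do.
inserted: List[str] = []
keyed: List[str] = []


def fake_accessibility_ok(text: str) -> bool:
    inserted.append(text)
    return True


def fake_accessibility_unavailable(text: str) -> bool:
    return False


print(deliver_text("hello", fake_accessibility_ok, keyed.append))           # accessibility
print(deliver_text("world", fake_accessibility_unavailable, keyed.append))  # synthetic-keys
print(inserted, keyed)  # ['hello'] ['world']
```

Keeping the fallback decision in one function means the rest of the pipeline does not need to know which insertion path a given app supports.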

The result feels native because the typing step is genuinely native: the focused field never knows there is a dictation tool in the loop. As far as the editor is concerned, a person typed the words.
