A week with voice coding
A field report from the first week of voice coding inside Cursor and the editor surface. What changed, what did not, and what the working setup looks like.
Jamie van der Pijll
- developer-workflow
- field-report
The first day went badly. I tried to dictate identifiers. I tried to spell out variable names with explicit camelCase. I tried to say “open paren” and “close paren” as if I were dictating Lisp. By mid-afternoon I had decided voice coding was a fantasy and went back to the keyboard.
The second day went better, because I changed the rule. Voice for the prose around the code, keyboard for the code itself. The rule is simple, the line is clear, and the trade-off between throughput and precision falls out cleanly: voice is faster for prose, keyboard is more precise for code, and there is almost no overlap in the middle.
By the end of the first week the workflow had settled. This is a field report on what it looks like.
The setup
The hardware is unremarkable. A Mac with Apple Silicon, an external microphone on a boom arm, a split keyboard. The microphone is the only piece of equipment I bought specifically for this; the built-in laptop microphone works, but the difference in transcription accuracy between the built-in and a desk microphone is noticeable enough that the microphone earns its desk space.
The software is similarly unremarkable. Voiacast is bound to hold-Option-Space. The dictionary has about thirty entries by the end of the week — the names of three projects, half a dozen technical terms that were mis-transcribed on day one, a few client names, and the canonical spellings of a few frameworks. The Pro tier is on; I use the larger model only in a noisy room and the small one the rest of the time.
Cursor is the editor. The Voiacast hotkey fires regardless of which app is focused, so the same key combination works in the editor, in the terminal pane inside the editor, in the browser inputs I have open in another window, and in Slack. The friction of switching apps is the same as the friction of moving the cursor across windows: I do not think about it.
What moved to voice
Five surfaces took most of the dictation through the week.
The AI prompt inside the editor. Cursor’s chat input is the place where I spend the largest fraction of my prose-shaped keystrokes. The prompts are longer than they would otherwise be — by maybe a factor of two — because typing length stopped being the cost. The longer prompts produce better completions; the longer completions need fewer follow-up prompts; the loop closes faster.
The commit message body. The convention “first line a short imperative, body explains the why” is precisely the kind of prose voice handles. By Friday the git history had bodies on every non-trivial commit. The body did not exist on most commits before.
The PR description. Same shape as the commit body, longer. The PR descriptions now carry the context the diff cannot. The reviewers are happier. I am happier.
The Slack reply. Anything longer than one line moved to voice. The inline GitHub PR comment in the editor moved to voice. The DM reply moved to voice. Slack has one edge case: after expanding a thread, the focused input leaves the cursor in the wrong place, and the fix is to click into the input before dictating. Easy to adapt to.
The design doc. I drafted one design doc this week — a small one, maybe 1500 words. About 80% of it was dictated. The edit pass was on the keyboard: editing there was faster than dictating the edits would have been.
What stayed on the keyboard
Three categories survived intact.
The code itself. Identifiers, brackets, semicolons, type annotations, shell commands, file paths. Every attempt to dictate them was slower and less accurate than typing them. The keyboard is the right tool for the language the computer parses.
Anything I needed to type fast and dirty. Quick exploratory hacks, one-off shell commands, edits to a config file I would discard in an hour. The dictation pipeline has a tiny round-trip latency — sub-second — but the hotkey-talk-release cadence is meaningfully slower than the keyboard for short bursts of code. The keyboard wins.
The git commit subject line. I dictate the body. The subject I type. The subject is a short imperative, has a precise format my team checks, and is shorter to type than to dictate-and-edit.
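The team's actual check is not in this post, so as a sketch only: a subject-line lint of this kind, with hypothetical rules (a length cap, no trailing period, a leading capital), might look like this in Python.

```python
import re

# Hypothetical rules for illustration; the real team check may differ.
MAX_LEN = 50
ENDS_IN_PERIOD = re.compile(r"\.$")

def check_subject(subject: str) -> list[str]:
    """Return a list of problems with a commit subject line (empty = ok)."""
    problems = []
    if len(subject) > MAX_LEN:
        problems.append(f"subject is {len(subject)} chars; limit is {MAX_LEN}")
    if ENDS_IN_PERIOD.search(subject):
        problems.append("subject ends with a period")
    if subject and not subject[0].isupper():
        problems.append("subject should start with a capital letter")
    return problems

print(check_subject("Add retry to sync worker"))        # → []
print(check_subject("added retry to the sync worker."))  # two problems
```

A check like this could run as a `commit-msg` hook, fed the first line of the message file; the point is only that a short, rigid format like this is quicker to type than to dictate and then correct.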
What surprised me
Three things I did not expect.
The dictionary grew fast. By Friday it had thirty entries; by the following Friday it had not grown much further. The dictionary captures the long tail of names I correct most often; the long tail is long, but the head of the distribution is short. Most of the value landed in the first thirty entries.
I started speaking in paragraphs. After a few days the press-talk-release cadence settled into one paragraph per press. The model handles a single paragraph better than two run-together paragraphs; releasing between them helps. The paragraphs felt naturally complete because the cadence imposed completeness.
My wrists felt different. By the end of the week the post-lunch-into-evening fatigue was meaningfully smaller than I am used to. I do not want to over-claim from a one-week sample, but the direction matches the load-shifting argument: moving prose to voice gives back the wrist budget I would have spent typing it.
The honest limitations
A few things did not work cleanly.
Dictation in a noisy environment is harder. I dictated from a coffee shop on Wednesday afternoon and the accuracy was meaningfully worse than at the desk. The larger model helped, but the noise floor in a public space is a real cost. For shared workspaces, the dictation flow needs a quieter window or a directional microphone.
Speaking out loud in a shared office is awkward. I work from home, so this is not a daily problem for me. For an open-office user, the honest answer is “dictate from a meeting room or a phone booth”. Dictation is loud by definition; there is no way around it.
Voice does not edit well. I tried, a few times, to dictate edits — “in the paragraph above, change is to was”. This is the wrong tool for the job. Edits stay on the keyboard. Voice is for the first draft.
The setup, summarised
Microphone on a boom arm. Hold-Option-Space bound to push-to-talk. Dictionary seeded as I went. Voice for prose, keyboard for code. One paragraph per press. Edit on the keyboard.
That is the working setup. After a week, it has settled into a shape I do not think about anymore. The hotkey is part of the keyboard. The dictionary is part of my Mac. The wrists are part of my Saturday.
If you want to try the same shape, the download page is the obvious next step.