doing
Voice and screenshot input for AI builders on Mac. On-device transcription with no cloud upload and no account. One-time $49 download.
What is doing?
doing is a Mac application for voice input and transcription that runs locally and is private by design. It listens while you hold a hotkey, transcribes your speech in real time, and pastes the resulting text at the active cursor location, so you can talk instead of typing in AI tools and any other text field.
The core purpose is on-device transcription with no cloud upload and no account. The product also supports attaching screenshots to a recording, and it offers configurable post-processing (“Skills”) before the transcript is pasted.
Key Features
- Hold-hotkey voice transcription: Start listening by holding a hotkey, then speak while text is transcribed in real time; release to paste at your cursor.
- Local & private audio handling: Designed so your voice never leaves your Mac—no cloud transcription, no account, and no audio uploaded.
- Screenshot capture attached to the transcript: While holding the hotkey, drag a rectangle anywhere on screen to capture screenshots that are linked to the same transcription session.
- System-level pasting to the active cursor: Works anywhere you can type (browser, editor, terminal, etc.), pasting into the current cursor location.
- YOLO Mode for rapid handoff to AI prompts: When enabled, doing presses Return after pasting the transcription to run your prompt without extra steps.
- Skills for transcript post-processing: Define actions that process the transcript before it’s pasted (examples shown include formalizing, summarizing, converting to a code prompt, or replacing text with emoji), with “app-aware” behavior based on where you paste.
- Engine options with benchmarks: Ships with an on-device engine (Parakeet) and can use bring-your-own API keys for multiple cloud engines; includes a benchmark tool to test providers on the same audio.
- Audio ducking during recording: Automatically fades music/audio down when recording starts and restores it after you stop.
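To make the app-aware Skills idea concrete, here is a minimal sketch of how a transcript might be routed through a different post-processing step depending on the app it is pasted into. This is not doing's actual implementation or configuration format: the names `formalize`, `summarize`, `SKILLS_BY_APP`, and `apply_skill` are hypothetical, and the two Skills are toy string transforms standing in for whatever processing the product performs.

```python
from typing import Callable, Dict

# A Skill is modeled here as a plain function from transcript to transcript.
Skill = Callable[[str], str]


def formalize(text: str) -> str:
    """Toy stand-in for a 'formalize' Skill: capitalize and end with punctuation."""
    text = text.strip()
    if not text:
        return text
    text = text[0].upper() + text[1:]
    return text if text[-1] in ".!?" else text + "."


def summarize(text: str) -> str:
    """Toy stand-in for a 'summarize' Skill: one bullet per sentence."""
    normalized = text.replace("?", ".").replace("!", ".")
    parts = [part.strip() for part in normalized.split(".")]
    return "\n".join(f"- {part}" for part in parts if part)


# Hypothetical mapping from the frontmost app to the Skill applied before pasting.
SKILLS_BY_APP: Dict[str, Skill] = {
    "Mail": formalize,
    "Notes": summarize,
}


def apply_skill(transcript: str, active_app: str) -> str:
    """Run the Skill registered for the active app, or pass the text through unchanged."""
    skill = SKILLS_BY_APP.get(active_app)
    return skill(transcript) if skill else transcript
```

In this sketch, dictating into an app with no registered Skill (a terminal, for example) pastes the raw transcript, while the same words dictated into "Mail" come out capitalized and punctuated.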
How to Use doing
- Download and install doing on a Mac (the page lists macOS 14+ on Apple Silicon as the requirement).
- In a text field, hold the configured hotkey (shown as fn Talk) to begin listening.
- Speak while the transcript updates in real time.
- Release the hotkey to paste the transcription at your cursor position.
- Optionally capture screenshots by dragging a rectangle while recording, and/or enable YOLO Mode to have doing press Return after pasting.
- If you want different transcription behavior, configure Skills and (where applicable) select the transcription engine—either the built-in on-device option or cloud engines via your own API key.
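The steps above amount to a small hold-to-talk state machine, which can be sketched as follows. This is an illustrative model only, assuming nothing about how doing is built: the `Session` class, its method names, and the string-based action list are all hypothetical, and real hotkey handling, transcription, and pasting are replaced by plain method calls.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    """Hypothetical model of one hold-to-talk recording session."""

    yolo_mode: bool = False
    listening: bool = False
    words: List[str] = field(default_factory=list)
    screenshots: List[str] = field(default_factory=list)

    def hotkey_down(self) -> None:
        # Holding the hotkey starts a fresh session.
        self.listening = True
        self.words.clear()
        self.screenshots.clear()

    def hear(self, chunk: str) -> None:
        # Transcribed text only accumulates while the hotkey is held.
        if self.listening:
            self.words.append(chunk)

    def capture(self, region: str) -> None:
        # Screenshots taken during the hold are tied to this session.
        if self.listening:
            self.screenshots.append(region)

    def hotkey_up(self) -> List[str]:
        # Releasing the hotkey ends the session and yields the resulting actions.
        self.listening = False
        actions = [f"paste:{' '.join(self.words)}"]
        actions += [f"attach:{shot}" for shot in self.screenshots]
        if self.yolo_mode:
            # YOLO Mode adds a Return key press after the paste.
            actions.append("key:Return")
        return actions
```

With `yolo_mode=False`, the returned list ends after the paste and any attachments, which leaves the pasted text in place for review before you send it yourself.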
Use Cases
- Talk to an AI coding assistant from your editor: Use voice transcription and system-level pasting so the transcript lands directly in the input box where you are working; with YOLO Mode enabled, doing then presses Return for you to send it.
- Prepare structured messages for different apps: Use app-aware Skills to rewrite or format your transcript for contexts like email (formalize) or productivity tools (summarize into bullet points).
- Describe bugs with visual context: While recording your voice, capture one or more screenshots so the visual details are attached to the transcription session.
- Generate code-oriented prompts from spoken intent: Use a code-prompt Skill to convert a spoken description into a technical instruction suitable for a coding assistant.
- Run side-by-side transcription tests: Use the built-in benchmark tool to compare the on-device engine against other available engines using the same audio sample, choosing based on speed/cost tradeoffs.
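The side-by-side comparison in the last use case can be illustrated with a small timing harness that runs several engines over the same audio sample. This is a sketch of the general technique, not doing's benchmark tool: the `Engine` type, the `benchmark` function, and the result fields are hypothetical, and each engine is assumed to be a callable that takes raw audio bytes and returns text.

```python
import time
from typing import Callable, Dict, List

# Assumption: an engine is any callable that turns audio bytes into a transcript.
Engine = Callable[[bytes], str]


def benchmark(engines: Dict[str, Engine], audio: bytes) -> List[dict]:
    """Time every engine on the same audio and return results, fastest first."""
    results = []
    for name, transcribe in engines.items():
        start = time.perf_counter()
        text = transcribe(audio)
        elapsed = time.perf_counter() - start
        results.append({"engine": name, "seconds": elapsed, "text": text})
    return sorted(results, key=lambda result: result["seconds"])
```

Because every engine receives identical input, differences in the `seconds` and `text` fields reflect the engines themselves rather than the sample. Cost per request would need to be tracked separately, since it depends on each provider's pricing.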
FAQ
- Does doing upload my audio to the cloud? No. The page states that doing transcribes locally, with no audio uploaded and no cloud transcription.
- Do I need an account to use doing? No account is required, per the page.
- What is YOLO Mode, and what does it change in the workflow? YOLO Mode pastes the transcription and then automatically presses Return, so the AI prompt can run immediately.
- Can doing work with screenshots and voice together? Yes. While holding the hotkey, you can drag a rectangle to capture screenshots that are attached to the transcript automatically.
- Can I choose different transcription engines? The page indicates doing ships with a local engine (Parakeet) and can use bring-your-own API keys for cloud engines; it also includes a benchmark tool to test engines on the same audio.
Alternatives
- On-device voice typing built into macOS (system dictation): Provides speech-to-text for general typing but doesn’t offer the same hotkey-driven transcription-to-cursor workflow, screenshot attachment, or post-processing “Skills” described for doing.
- Cloud transcription services/APIs: Typically require uploading audio and may involve accounts or per-use provider costs; doing is positioned as local with no audio upload, with cloud engines available only as an option through your own API keys.
- Other AI voice input tools that charge subscriptions: The page compares doing’s one-time $49 pricing against other tools that are described as charging $8–15 per month; alternatives may differ in privacy model (cloud vs local) and recurring cost.
- Browser/editor hotkey voice input extensions: Can reduce typing within specific apps, but doing is presented as system-level so it works wherever you can type (not limited to a single site or editor).
Speech to Text Converter Online
A free online tool that converts audio and video files into accurate text transcripts in over 45 languages. It supports numerous file formats and requires no downloads or sign-ups.
Dictato
Dictato is an offline voice-to-text dictation app for macOS that transcribes on-device and inserts into any app you type in. No cloud.
Memo AI
AI-powered transcription service that converts audio and video files into text.
Sanota
Sanota turns your voice into clear, beautiful text—capture memories and ideas easily, then start for free.
OpenAI Realtime API
Build low-latency, multimodal voice and realtime audio experiences with OpenAI Realtime API—browser voice agents and realtime transcription.
Pewbeam
Pewbeam listens as you preach, detects Bible verses in real time, and displays them instantly on screen—no typing or clicking for pastors.