Introducing Whispering, An Ergonomic Wrapper for OpenAI Whisper
The side project I actually use daily
Last night, I found myself diving into a project at 1 AM, working on a wrapper for the OpenAI Whisper API. The result? Whispering, an efficient and user-friendly way to record your voice, transcribe it, and copy the text directly to your clipboard so you can paste it into whatever you are writing.
Access Whispering at https://whispering.bradenwong.com or view its source code on GitHub here.
What Does Whispering Do?
Whispering lets you record your voice and instantly copies the resulting transcription to your clipboard.
Before developing Whispering, I relied heavily on dictation to write the articles on this blog. My previous workflow was to go to the official OpenAI website at playground.openai.com, but this turned out to be far more manual than I expected, as I often had to click multiple buttons each time I recorded something. Given the volume of articles I was trying to produce, those extra clicks were costing me a lot of time and mental energy.
Overcoming Development Challenges
Developing the project wasn’t straightforward, given that the OpenAI documentation was in Python and cURL rather than TypeScript. Fortunately, ChatGPT made it much easier to translate those examples into TypeScript.
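For reference, here is a minimal sketch of what the cURL example from the OpenAI docs might look like in TypeScript. It is not Whispering's actual code: the helper names (buildTranscriptionForm, transcribeAudio), the recording.webm filename, and the injectable fetchImpl parameter are assumptions made for illustration. It posts the audio as multipart form data to the /v1/audio/transcriptions endpoint with the whisper-1 model and reads the text field from the JSON response.

```typescript
// Endpoint for OpenAI's speech-to-text (Whisper) API.
const TRANSCRIPTION_ENDPOINT = "https://api.openai.com/v1/audio/transcriptions";

// Hypothetical helper: packs a recording into the multipart form the endpoint expects.
// The default filename is an assumption; it should match the real audio format.
function buildTranscriptionForm(audio: Blob, filename = "recording.webm"): FormData {
  const form = new FormData();
  form.append("file", audio, filename);
  form.append("model", "whisper-1");
  return form;
}

// Hypothetical helper: sends the recording and returns the transcribed text.
// fetchImpl is injectable so the function can be exercised without a network call.
async function transcribeAudio(
  audio: Blob,
  apiKey: string,
  fetchImpl: typeof fetch = fetch,
): Promise<string> {
  if (audio.size === 0) {
    throw new Error("Refusing to send an empty recording");
  }
  const response = await fetchImpl(TRANSCRIPTION_ENDPOINT, {
    method: "POST",
    // No Content-Type header here: fetch adds the multipart boundary itself.
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildTranscriptionForm(audio),
  });
  if (!response.ok) {
    throw new Error(`Transcription failed with status ${response.status}`);
  }
  const data = (await response.json()) as { text?: unknown };
  if (typeof data.text !== "string") {
    throw new Error("Response did not contain a text field");
  }
  return data.text;
}
```

In a browser, the audio argument would be the Blob produced by the recorder, and the API key would come from wherever the app stores it.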
Although the application took a night to finish, I could have completed it much faster had it not been for the challenges with Tauri. I had hoped to build a desktop application with Tauri so that it could always listen for global keyboard shortcuts. However, I discovered that Tauri didn’t provide a reliable way to access a machine’s microphone, even though the browser's native Navigator API (navigator.mediaDevices) does, so I kept running into errors such as the program attempting to process an empty Blob. Eventually, I realized the microphone worked only intermittently, and I couldn’t reproduce the failure reliably.
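To make the empty Blob problem concrete, here is a rough sketch of browser microphone capture using navigator.mediaDevices.getUserMedia and MediaRecorder, with a guard that returns null instead of passing an empty recording along. The function names (assembleRecording, startRecording) are hypothetical, and this illustrates the general technique rather than the code Whispering ships.

```typescript
// Hypothetical helper: joins recorded chunks, or returns null if nothing was captured.
function assembleRecording(chunks: Blob[], mimeType: string): Blob | null {
  const nonEmpty = chunks.filter((chunk) => chunk.size > 0);
  if (nonEmpty.length === 0) {
    return null; // The microphone produced no data, so there is nothing to transcribe.
  }
  return new Blob(nonEmpty, { type: mimeType });
}

// Hypothetical helper: starts recording and returns a function that stops it.
// Browser-only: relies on navigator.mediaDevices and MediaRecorder.
async function startRecording(): Promise<() => Promise<Blob | null>> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];

  recorder.addEventListener("dataavailable", (event) => {
    chunks.push(event.data);
  });
  recorder.start();

  return () =>
    new Promise<Blob | null>((resolve) => {
      recorder.addEventListener(
        "stop",
        () => {
          // Release the microphone so the browser's recording indicator turns off.
          stream.getTracks().forEach((track) => track.stop());
          resolve(assembleRecording(chunks, recorder.mimeType));
        },
        { once: true },
      );
      recorder.stop();
    });
}
```

A caller would check for null before sending anything to the transcription API, which turns an intermittent microphone failure into a visible "nothing recorded" state rather than a confusing downstream error.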
Since the microphone problem was exclusive to Tauri, I decided to create a website instead. Tauri allowed for an easy transition with minimal modifications, as it essentially functions as a website + Rust bridge that interacts with your file system. My hope is to eventually use another client, perhaps Electron, to resolve the microphone issue.
My Improved Workflow Using Whispering
Now that it works as a website, I have Whispering installed as a shortcut, plus a Keysmith shortcut that opens the application and starts recording with Space, all in one keystroke.
As I am typing, I’ll press the shortcut to start recording (the equivalent of opening the website and pressing Space), then talk into the microphone, then press Space again to stop, and then ⌘ + V to paste the output into my text.
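The Space-to-start, Space-to-stop flow described above can be sketched as a small toggle. This is an assumption-laden illustration, not Whispering's source: createSpaceToggle, bindSpaceKey, and the ToggleDeps interface are hypothetical names, and the recording, transcription, and clipboard steps are passed in as functions so the logic stays independent of the browser.

```typescript
type ToggleState = "idle" | "recording" | "busy";

// Hypothetical dependencies: how to start recording, how to stop and
// transcribe, and how to copy text (e.g. navigator.clipboard.writeText).
interface ToggleDeps {
  start(): Promise<void>;
  stopAndTranscribe(): Promise<string>;
  copy(text: string): Promise<void>;
}

// Returns a function to call on every Space press; it resolves to the new state.
function createSpaceToggle(deps: ToggleDeps): () => Promise<ToggleState> {
  let state: ToggleState = "idle";

  return async function toggle(): Promise<ToggleState> {
    if (state === "busy") {
      return state; // Ignore presses while starting or transcribing.
    }
    if (state === "idle") {
      state = "busy";
      try {
        await deps.start();
        state = "recording";
      } catch (error) {
        state = "idle";
        throw error;
      }
      return state;
    }
    // state === "recording": stop, transcribe, then copy the result.
    state = "busy";
    try {
      const text = await deps.stopAndTranscribe();
      await deps.copy(text);
    } finally {
      state = "idle";
    }
    return state;
  };
}

// Browser-only wiring: run the toggle whenever Space is pressed.
function bindSpaceKey(toggle: () => Promise<ToggleState>): void {
  document.addEventListener("keydown", (event) => {
    if (event.code === "Space") {
      event.preventDefault();
      void toggle();
    }
  });
}
```

Once the second Space press resolves, the transcription is on the clipboard and ⌘ + V pastes it as usual.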
In the last few days, I’ve found Whispering incredibly useful. It has greatly increased my writing output, especially when crafting writing prompts for ChatGPT.