BLOG

Why VOMO Chose Deepgram for Speech-to-Text

Mar 16, 2024

The idea for VOMO was inspired by the release of OpenAI's Whisper model, which brought a significant improvement in speech-to-text accuracy. From the start, I envisioned several key features: precise speech-to-text conversion, real-time transcription, the ability to refine transcribed text with GPT, and a question-answering function over vectorized notes.

As I researched the products on the market, including OpenAI's Whisper, AssemblyAI, Google's and Microsoft's speech-to-text services, and Deepgram, I found that each had its own strengths and weaknesses. Whisper was the most powerful, but OpenAI's offering lacked two features I considered essential: real-time speech-to-text, and support for audio files larger than 25MB without splitting them manually.

Google's and Microsoft's real-time speech-to-text offerings were not accurate enough for our needs, and if transcriptions aren't precise, users are unlikely to keep using the service.

AssemblyAI, at least initially, seemed too expensive.

Then I discovered Deepgram, which met most of my requirements. It offered a cloud-hosted Whisper model that could transcribe long recordings with the same level of accuracy, and its real-time speech-to-text pricing was acceptable (although I later removed that feature from VOMO). For meeting recordings, Deepgram also supported automatic speaker identification (diarization) and formatting. We needed all of these features.

Later, I added a bulk speech-to-text feature, allowing users to select dozens of audio files from Apple's Voice Memos and import them into VOMO for batch transcription.

However, I discovered that Deepgram's hosted Whisper model had concurrency limits, which became a problem for batch transcription, so we switched to Deepgram's Nova-2 model. In my opinion, its accuracy is comparable to Whisper's, and it processes audio faster.

We still use Deepgram's Nova-2 model today.

In summary, third-party services like Deepgram can significantly reduce the engineering workload for products like VOMO. Most of the speech-related features we wanted to build were already available through Deepgram.

Ready to Transcribe Your Voice Memos to Text?

Download VOMO today and start your 7-day free trial