Why VOMO Chose Deepgram for Speech-to-Text

VOMO chose Deepgram for speech-to-text because it offers an accurate cloud-hosted Whisper model, transcription of extended recordings, real-time transcription, and, for meeting recordings, speaker identification and automatic formatting. Deepgram's Nova-2 model meets VOMO's requirements for accuracy and speed, and relying on Deepgram significantly reduces development effort.

Mar 16, 2024

[Image: Deepgram homepage]

The idea for VOMO was inspired by the release of OpenAI's Whisper model, which showed a significant improvement in speech-to-text accuracy. At the time, I envisioned several key features: precise speech-to-text conversion, real-time transcription, the ability to refine transcribed text using GPT, and the integration of vectorized notes with a question-answering function.

As I researched the products on the market, including OpenAI's Whisper, AssemblyAI, Google's and Microsoft's speech-to-text services, and Deepgram, I found that each had its own strengths and weaknesses. Whisper was the most powerful, but it lacked two essential features I needed: real-time speech-to-text and support for audio files larger than 25MB without manual segmentation.
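To make the manual segmentation problem concrete, here is a minimal sketch of the arithmetic involved in planning cut points for a long recording. It is not VOMO's code: the function name, the constant-bitrate assumption, the reading of 25MB as 25 * 1024 * 1024 bytes, and the 5% headroom for container overhead are all assumptions made for illustration.

```python
import math

# Assumption: "25MB" is read as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def plan_segments(duration_s: float, bitrate_kbps: float,
                  max_bytes: int = MAX_UPLOAD_BYTES) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds so that each segment stays
    under max_bytes, assuming a roughly constant bitrate."""
    if duration_s <= 0 or bitrate_kbps <= 0:
        raise ValueError("duration_s and bitrate_kbps must be positive")

    bytes_per_second = bitrate_kbps * 1000 / 8
    # Keep 5% headroom for container and metadata overhead (an assumption).
    max_segment_s = (max_bytes * 0.95) / bytes_per_second

    # Use equal-length segments rather than one short leftover at the end.
    count = math.ceil(duration_s / max_segment_s)
    segment_s = duration_s / count
    return [(i * segment_s, min((i + 1) * segment_s, duration_s))
            for i in range(count)]


if __name__ == "__main__":
    # A two-hour recording at 128 kbps needs several segments.
    for start, end in plan_segments(7200, 128):
        print(f"{start:8.1f}s -> {end:8.1f}s")
```

Even with the cut points planned, each segment still has to be exported, uploaded, and stitched back together, and a cut in the middle of a word can hurt accuracy at the boundary. That is the work a service that accepts long recordings directly takes off your hands.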

Google's and Microsoft's real-time speech-to-text offerings were not accurate enough for our needs. If transcriptions are not precise, users might stop using our service.

Initially, I found AssemblyAI's pricing to be too high.

Then I discovered Deepgram, which met many of my requirements. They offered a cloud-hosted Whisper model that could support transcription of extended recordings with the same level of accuracy, and their real-time speech-to-text pricing was acceptable (although I later removed this feature). Additionally, for recording meetings, Deepgram could support automatic speaker identification and formatting. These were all features we needed.
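As an illustration of how speaker identification and formatting are requested, here is a minimal sketch that builds (but does not send) a request to Deepgram's pre-recorded transcription endpoint using only the Python standard library. The `diarize` and `smart_format` query parameters and the `Token` authorization scheme follow Deepgram's public API documentation as I understand it; the function name, the default MIME type, and the placeholder key are assumptions for the example and not VOMO's actual code.

```python
from urllib.parse import urlencode
from urllib.request import Request

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_transcription_request(audio: bytes, api_key: str,
                                model: str = "nova-2",
                                mimetype: str = "audio/mp4") -> Request:
    """Build a POST request asking for speaker labels and formatted text."""
    params = urlencode({
        "model": model,          # e.g. "nova-2"
        "diarize": "true",       # label who is speaking
        "smart_format": "true",  # punctuation, paragraphs, number formatting
    })
    return Request(
        f"{DEEPGRAM_LISTEN_URL}?{params}",
        data=audio,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": mimetype,  # assumption: m4a voice memo audio
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_transcription_request(b"\x00\x01", api_key="YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req), then parse the JSON body.
```

The point of the sketch is that diarization and formatting are switches on a single request rather than separate pipelines to build and maintain.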

Later, I added a bulk speech-to-text feature, allowing users to select dozens of audio files from Apple's Voice Memos and import them into VOMO for batch transcription.

However, I discovered that Deepgram's Whisper model had concurrency limitations, which became a bottleneck for batch transcription, so we switched to the Nova-2 model. In my opinion, its transcription accuracy is comparable to Whisper's, and it processes audio faster.
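One common way to stay within a provider's concurrency limit during batch transcription is to cap the number of in-flight requests on the client. The sketch below shows that pattern with `asyncio.Semaphore`. It is a generic illustration, not VOMO's implementation: `transcribe_one` is a hypothetical coroutine standing in for the real API call, and the default limit of 3 is an arbitrary value, not a figure from Deepgram.

```python
import asyncio
from typing import Awaitable, Callable


async def transcribe_batch(
    paths: list[str],
    transcribe_one: Callable[[str], Awaitable[str]],  # hypothetical API wrapper
    max_concurrent: int = 3,                          # arbitrary example limit
) -> dict[str, str]:
    """Transcribe many files while never running more than
    max_concurrent transcriptions at the same time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(path: str) -> tuple[str, str]:
        # Wait here until one of the max_concurrent slots is free.
        async with semaphore:
            return path, await transcribe_one(path)

    results = await asyncio.gather(*(worker(p) for p in paths))
    return dict(results)


if __name__ == "__main__":
    async def fake_transcribe(path: str) -> str:
        await asyncio.sleep(0.1)  # stand-in for network latency
        return f"transcript of {path}"

    memos = [f"memo_{i}.m4a" for i in range(10)]
    print(asyncio.run(transcribe_batch(memos, fake_transcribe)))
```

With a low cap, a batch of dozens of files queues up behind a few slots, so a model with a higher concurrency allowance or faster processing shortens the whole batch.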

As a result, we continue to use Deepgram's Nova-2 model.

In summary, third-party services like Deepgram can significantly reduce the workload for products like VOMO. Most of the speech-related features we wanted to implement were already available through Deepgram.
