Why VOMO Chose Deepgram for Speech-to-Text

The idea for VOMO was inspired by the release of OpenAI’s Whisper model, which showed a significant improvement in the accuracy of speech-to-text technology. At the time, I envisioned several key features: precise speech-to-text conversion, real-time transcription, the ability to refine transcribed text using GPT, and the integration of vectorized notes with a question-answering function.

As I began researching various products on the market, including OpenAI’s Whisper, Assembly, Google and Microsoft’s speech-to-text services, and Deepgram, I discovered that each had its own strengths and weaknesses. Whisper was the most powerful, but it lacked two essential features I needed: real-time speech-to-text and support for audio files larger than 25MB without manual segmentation.

Google and Microsoft’s real-time speech-to-text AI models were not accurate enough for our needs. If the transcriptions were not precise, users might not continue using our service.

Initially, I found Assembly’s pricing to be too high.

Then I discovered Deepgram, which met many of my requirements. They offered a cloud-hosted Whisper model that could support transcription of extended recordings with the same level of accuracy, and their real-time speech-to-text pricing was acceptable (although I later removed this feature). Additionally, for recording meetings, Deepgram could support automatic speaker identification and formatting. These were all features we needed.
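To make the speaker identification and formatting options concrete, here is a minimal sketch of how a request to Deepgram's pre-recorded transcription endpoint might be assembled. It is not VOMO's actual code: the helper names `build_listen_url` and `build_headers` are hypothetical, and the model name is left to the caller. The `diarize` and `smart_format` query parameters and the `Token` authorization scheme follow Deepgram's public API.

```python
from urllib.parse import urlencode

# Deepgram's pre-recorded audio endpoint.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(model: str, diarize: bool = True, smart_format: bool = True) -> str:
    """Build the request URL for a pre-recorded transcription (hypothetical helper)."""
    params = {
        "model": model,                             # chosen by the caller
        "diarize": str(diarize).lower(),            # automatic speaker identification
        "smart_format": str(smart_format).lower(),  # punctuation and readable formatting
    }
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


def build_headers(api_key: str, content_type: str = "audio/wav") -> dict:
    """Build the HTTP headers for uploading raw audio bytes (hypothetical helper)."""
    return {
        "Authorization": f"Token {api_key}",
        "Content-Type": content_type,
    }
```

The audio bytes would then be sent as the body of a POST request to that URL with any HTTP client.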

Later, I added a bulk speech-to-text feature, allowing users to select dozens of audio files from Apple’s Voice Memos and import them into VOMO for batch transcription.

However, I discovered that Deepgram’s Whisper model had concurrency limitations, so we switched to the Nova-2 model. In my opinion, its transcription accuracy is comparable to Whisper’s, but it processes audio faster.
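For context, a concurrency limit matters most when dozens of files are submitted at once. We solved it by switching models, but a different, general technique for staying under such a limit is to cap the number of in-flight requests on the client side. The sketch below illustrates that with `asyncio.Semaphore`. It rests on assumptions: `transcribe_one` is a hypothetical coroutine supplied by the caller, and the default of 5 is an arbitrary placeholder, not Deepgram's actual limit.

```python
import asyncio


async def transcribe_all(paths, transcribe_one, max_concurrent=5):
    """Transcribe many files while keeping at most `max_concurrent` in flight.

    `transcribe_one` is a hypothetical coroutine that takes a file path and
    returns its transcript; `max_concurrent` is a placeholder value.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(path):
        # Each task waits here until one of the slots is free.
        async with semaphore:
            return await transcribe_one(path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(limited(p) for p in paths))
```

Calling `asyncio.run(transcribe_all(paths, transcribe_one))` returns the transcripts in the same order as the input files.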

As a result, we continue to use Deepgram’s Nova-2 model.

In summary, third-party services like Deepgram can significantly reduce the workload for products like VOMO. Most of the speech-related features we wanted to implement were already available through Deepgram.
