Whisper: The ChatGPT Of Speech Recognition

Speak at full speed, whisper amidst ambient noise, shout louder than the crowd… Whisper analyzes and transcribes as it should. Once again, OpenAI, the creator of ChatGPT, raises the bar.

Speech recognition… This is an area that has benefited from decades of research. Programs like Nuance’s Dragon or Express Scribe pride themselves on doing this with elegance. And you probably use Siri on iPhone, Google Assistant on Android, or Cortana on Windows.

Smarter than Siri or Google Assistant

Let’s face it: a voice recognition system like Siri is very approximate. Dictated texts are usually full of errors, and if you don’t bother to reread them, you run the risk of annoying the person you are writing to.

Once again OpenAI, creator of the famous ChatGPT but also of the AI image generator DALL·E 2, stands out for its surprising quality. The example speech on the OpenAI blog speaks for itself: it is spoken at high speed and is really difficult to decipher by ear. Whisper still manages to decode it. Note also that, unlike Siri or Google’s tools, Whisper incorporates smart punctuation into its transcriptions.

Let’s be clear: this tool produces a better rendering than the one used by YouTube to generate video subtitles. In addition, Whisper also provides timestamps for what was said, which then only need to be uploaded to YouTube.
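To give an idea of how those timestamps might become subtitles, here is a minimal Python sketch. It assumes the transcription is available as a list of segments, each with a "start" and "end" time in seconds and a "text" field, which is the shape returned by the open-source whisper Python package. The helper names format_timestamp and segments_to_srt are hypothetical, and SubRip (.srt) is only one of the subtitle formats YouTube accepts.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to the SubRip form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SubRip text from segments shaped like {"start", "end", "text"}.

    Assumption: times are in seconds, as in Whisper's segment output.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hand-written sample segments, not real Whisper output.
    sample = [
        {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
        {"start": 2.5, "end": 5.0, "text": " Today we talk about speech."},
    ]
    print(segments_to_srt(sample))
```

The resulting text can be saved to a .srt file and attached to a video as a subtitle track.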

Whisper, an automatic speech recognition tool, has been trained on vast amounts of information, similar to ChatGPT. In this case, it’s 680,000 hours of multilingual data found on the internet. An important detail: Whisper can also transcribe speech in several languages, even if for now it performs best in English.

Only for geeks

Unfortunately, if you want to try Whisper, you will have to wait unless you have certain technical skills, since using it requires working with the Python language.

However, there is a relatively accessible solution. Google happens to provide a platform, Google Colab, that makes it easy to run Python commands. If you read English well, just follow the instructions on this page. You don’t need to program in Python at all, as the procedure has been simplified as much as possible. You can analyze an MP3 voice file that you previously placed on your Google Drive.
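For the curious, here is roughly what such a notebook might run behind the scenes. This is a sketch, not the procedure from the page mentioned above: it assumes the open-source whisper package is installed (pip install openai-whisper, which also needs ffmpeg) and that Google Drive has been mounted at /content/drive in Colab. The file name interview.mp3, the "base" model size, and the helpers drive_path and transcribe_file are illustrative choices.

```python
def drive_path(filename, root="/content/drive/MyDrive"):
    """Build the path of a file stored at the top level of a mounted Drive.

    Assumption: Drive was mounted in Colab beforehand with
    google.colab.drive.mount("/content/drive").
    """
    return f"{root}/{filename}"


def transcribe_file(audio_path, model_name="base"):
    """Transcribe an audio file and return (text, detected language code)."""
    # Imported here so the rest of the file loads even without the package.
    import whisper

    model = whisper.load_model(model_name)   # downloads the model on first use
    result = model.transcribe(audio_path)    # the language is detected automatically
    return result["text"].strip(), result["language"]


if __name__ == "__main__":
    # "interview.mp3" is a hypothetical file placed on Google Drive.
    text, language = transcribe_file(drive_path("interview.mp3"))
    print(f"Detected language: {language}")
    print(text)
```

Larger models generally transcribe more accurately but run more slowly, so a small model is a reasonable starting point for a first try.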

We tested Whisper on an interview in French with multiple speakers and it worked very well. The result could be used almost as is to produce subtitles on YouTube. In short, Whisper shows great promise and could open a new chapter in the history of computer speech recognition!