I used OpenAI’s new technology to transcribe audio on my laptop

OpenAI, the company behind the image- and meme-generation program DALL-E and the powerful text autocomplete engine GPT-3, has launched a new open-source neural network for transcribing audio to text (via TechCrunch). It’s called Whisper, and the company says it approaches human-level robustness and accuracy in English speech recognition. It can also automatically recognize, transcribe, and translate other languages, such as Spanish, Italian, and Japanese.

As someone who regularly records and transcribes interviews, I was immediately thrilled by this news. I thought I could write my own app to safely transcribe audio on my own computer. Cloud-based services such as Otter and Trint work most of the time and are relatively secure, but there are some interviews where I, or my sources, would feel more comfortable if the audio file never touched the internet.

Using it was easier than I imagined. My computer is already set up with Python and various developer tools, so installing Whisper was as easy as running a single terminal command. Within 15 minutes, I was using Whisper to transcribe a test audio clip I had recorded. For a relatively tech-savvy person who hasn’t yet set up Python, FFmpeg, Xcode, and Homebrew, it would probably take closer to an hour or two. However, there are already people working to make this process simpler and easier to use, which I’ll get to in a moment.
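For readers curious what this looks like in practice, here is a minimal sketch rather than a record of my exact setup. It assumes Whisper was installed with pip (the project’s README documents `pip install -U openai-whisper`) and that FFmpeg is available. The helper name `build_whisper_command`, the file name `interview.mp3`, the `base` model size, and the `transcripts` output folder are all illustrative choices. The helper only assembles the command for the `whisper` command-line tool; actually running it is left commented out.

```python
import shlex
from pathlib import Path


def build_whisper_command(audio_path, model="base", language=None,
                          task="transcribe", output_dir="transcripts"):
    """Assemble an argument list for the `whisper` command-line tool.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    cmd = ["whisper", str(Path(audio_path)),
           "--model", model,
           "--task", task,
           "--output_dir", str(output_dir)]
    # Let Whisper detect the language itself unless one is given.
    if language is not None:
        cmd += ["--language", language]
    return cmd


if __name__ == "__main__":
    cmd = build_whisper_command("interview.mp3", language="en")
    # Show the command as it would be typed in a terminal.
    print(shlex.join(cmd))
    # To run it locally (requires Whisper and FFmpeg to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing `task="translate"` asks Whisper to produce an English translation instead of a same-language transcript. Larger model sizes generally trade speed for accuracy.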

Command-line apps obviously aren’t for everyone, but for something that does a relatively complex job, Whisper is very easy to use.

While OpenAI definitely saw this use case as a possibility, it’s clear that the company is primarily targeting researchers and developers with this release. In the blog post announcing Whisper, the team said its code could help “build useful applications and serve as a foundation for further research on robust speech processing,” adding that it hopes Whisper’s high accuracy and ease of use will let developers add voice interfaces to a much wider set of applications. The approach is still noteworthy, however: the company has restricted access to some of its most popular machine learning projects, such as DALL-E and GPT-3, citing a desire to keep iterating on its safety systems as it learns more about real-world use.

Image showing a text file with transcribed lyrics for Yung Gravy's song

The text files Whisper produces aren’t especially easy to read if you’re using them to write an article.

There’s also the fact that installing Whisper isn’t a user-friendly process for most people. However, journalist Peter Sterne has teamed up with GitHub developer advocate Christina Warren to try to fix that, announcing that they are creating a “free, secure, and easy-to-use transcription app for journalists” based on Whisper’s machine learning model. When I spoke with Sterne, he said he decided that the program, called Stage Whisper, should exist after he ran several interviews through Whisper and judged it to be the best transcription he had ever used, with the exception of human transcribers.

I compared the transcripts produced by Whisper to those produced by Otter and Trint for the same file, and they were relatively comparable. There were enough errors in all of them that I would never copy and paste a quote into an article without double-checking the audio (which is best practice anyway, regardless of what service you’re using). But the Whisper version definitely works for me: I can search it to find the sections I need and then double-check them manually. In theory, Stage Whisper should perform exactly the same, because it is just a GUI wrapped around the same model.
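That search-then-verify workflow is easier when the timestamps are kept alongside the text. Whisper’s Python API returns a list of timestamped segments in addition to the plain transcript. The sketch below assumes that shape (dictionaries with `start`, `end`, and `text` keys) and searches it for a phrase. The helper names and the sample segments are made up for illustration and are not taken from my interviews.

```python
def format_timestamp(seconds):
    """Render seconds as H:MM:SS for jumping to a spot in the audio."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def find_segments(segments, phrase):
    """Return (timestamp, text) pairs for segments containing the phrase."""
    needle = phrase.casefold()
    return [
        (format_timestamp(seg["start"]), seg["text"].strip())
        for seg in segments
        if needle in seg["text"].casefold()
    ]


# Made-up data in the assumed shape of model.transcribe(path)["segments"].
sample_segments = [
    {"start": 0.0, "end": 4.2, "text": " Thanks for making time today."},
    {"start": 4.2, "end": 9.8, "text": " Let's talk about the budget."},
    {"start": 3725.5, "end": 3731.0, "text": " The budget was cut twice."},
]

for stamp, text in find_segments(sample_segments, "budget"):
    # Each hit gives a place to scrub to in the recording and re-listen.
    print(stamp, text)
```

The match is a plain case-insensitive substring test, so a misheard word in the transcript will still be missed. It narrows down where to listen and does not replace checking the audio.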

Sterne acknowledged that tech from Apple and Google could make Stage Whisper obsolete within a few years. The Pixel’s voice recorder app has been capable of offline transcription for years, and that feature is rolling out to select other Android devices. Apple has offline dictation built into iOS, though there is currently no good way to use it to transcribe an audio file. “But we can’t wait that long,” Sterne said. “Today, journalists like us need a good auto-transcription app.” He hopes to have a minimal version of the Whisper-based app ready within two weeks.

To be clear, Whisper won’t make cloud-based services like Otter or Trint completely obsolete. For example, OpenAI’s model lacks one of the biggest features of traditional transcription services: the ability to label who said what. Sterne said Stage Whisper probably won’t support this feature. “We don’t develop our own machine learning models,” he said.

The cloud is just someone else’s computer — which means the cloud is significantly faster

While you get the advantages of local processing, there are also disadvantages. The main one is that your laptop is almost certainly significantly less capable than the computers a professional transcription service uses. For example, I fed an audio file to Whisper running on my M1 MacBook Pro, and it took about 52 minutes to transcribe the entire file. (Yes, I made sure I was using the Apple Silicon version of Python, not the Intel one.) Otter spit out its transcript in less than eight minutes.

However, OpenAI’s technology has one big advantage: the price. Cloud-based subscription services will almost certainly cost you money if you use them professionally (Otter has a free tier, but upcoming changes will make it less useful for those who transcribe frequently), and the transcription capabilities built into platforms such as Microsoft Word or the Pixel require you to pay for separate software or hardware. Stage Whisper (and Whisper itself) is free and runs on the computer you already have.

Again, OpenAI has higher hopes for Whisper than being the foundation of a secure transcription app. I’m very excited about what researchers will eventually do with it, or what they will learn from studying the machine learning model, which was trained on “680,000 hours of multilingual and multitask supervised data collected from the web.” But the fact that it also happens to have a real, practical use today makes it all the more exciting.
