1. Best-in-class English transcriptions!
It achieves human-level robustness and accuracy on English speech recognition.
Trained on 680k hours of multilingual data collected from the web, it is robust to accents, background noise, and technical language.
3. Open source
OpenAI open-sourced the audio transcription models and the inference code, which will serve as a foundation for building useful applications and for further research on robust speech processing.
The model "Whisper" is available in five different variants:
- tiny (39M parameters)
- base (74M parameters)
- small (244M parameters)
- medium (769M parameters)
- large (1550M parameters)
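The trade-off between the variants is size versus accuracy: smaller checkpoints run faster on modest hardware, larger ones transcribe better. As a sketch, here is a hypothetical helper (not part of Whisper itself) that picks the largest variant fitting a parameter budget, using the sizes listed above:

```python
# Parameter counts (in millions) for the five Whisper variants, as listed above.
WHISPER_VARIANTS = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
}

def pick_variant(max_params_m):
    """Return the largest Whisper variant within a parameter budget (in millions),
    or None if even `tiny` does not fit."""
    fitting = [name for name, size in WHISPER_VARIANTS.items() if size <= max_params_m]
    return max(fitting, key=WHISPER_VARIANTS.get) if fitting else None

print(pick_variant(300))  # -> small
```

With the chosen name in hand, the open-source package loads a checkpoint via `whisper.load_model("small")` and transcribes with `model.transcribe(path)`.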
Check out the model card here: github.com
You can read the research paper, "Robust Speech Recognition via Large-Scale Weak Supervision," to understand how the model works: cdn.openai.com
Finally, combine Whisper, which can understand almost any kind of audio the way humans do, with GPT-3, which can generate human-like text, to build innovative products.
To understand the bigger picture, check out my GPT-3 book by @OReillyMedia!
That's a wrap!
Stay tuned for the follow-up content on combining GPT-3 with other ML models to build innovative AI products.
If you liked this thread, consider following me @Saboo_Shubham_!