In Sept 2022, @OpenAI released Whisper, the world's most accurate automatic speech recognition (ASR) model, which can transcribe and translate speech audio from 97 languages!
But it had a few shortcomings!
Here are the top 7 community enhancements on top of Whisper that you should know about:
🧵
1/7 WhisperX - Word-level time stamps with Whisper
Whisper's transcription accuracy was great, but it lacked fine-grained word-level timestamps.
This repo combines Whisper with phoneme-based ASR to deliver word-level timestamps using forced alignment!
github.com
2/7 Buzz - Transcribe and translate audio on your personal computer using OpenAI's Whisper
Supports real-time transcription and translation from your computer's microphones by splitting the incoming audio into time chunks and passing each chunk to Whisper!
github.com
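To make the time-chunking idea concrete, here is a minimal sketch. It is not Buzz's actual code: `chunk_audio`, `transcribe_stream`, and the 5-second default are hypothetical names and values picked for illustration, and `transcribe` stands in for whatever Whisper call you use. The 16 kHz sample rate is the rate Whisper models expect.

```python
from typing import Callable, Iterator, List, Sequence

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio


def chunk_audio(
    samples: Sequence[float],
    chunk_seconds: float = 5.0,  # assumed chunk length, not taken from Buzz
    sample_rate: int = SAMPLE_RATE,
) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-length chunks; the last one may be shorter."""
    chunk_len = int(chunk_seconds * sample_rate)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds * sample_rate must be at least 1 sample")
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


def transcribe_stream(
    samples: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],  # placeholder for a Whisper call
    chunk_seconds: float = 5.0,
) -> List[str]:
    """Run the supplied transcribe function on each chunk, in order."""
    return [transcribe(chunk) for chunk in chunk_audio(samples, chunk_seconds)]
```

Shorter chunks mean lower latency but give Whisper less context per call, so the chunk length is a trade-off you would tune.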
3/7 Fine-Tune Whisper For Multilingual ASR with HuggingFace Transformers
Whisper is great, but can you further improve its accuracy by fine-tuning on your custom dataset?
This blog post has you covered!
huggingface.co
4/7 Speaker Diarization - Identify different speakers and their audio segments
Whisper doesn't support speaker identification out of the box.
Can you generate speaker embeddings and use clustering to identify the speaker for each segment?
This ->
huggingface.co
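A rough sketch of the "embeddings + clustering" idea, using toy vectors. This is not the linked demo's code: it swaps in a simple greedy cosine-similarity clustering, and `assign_speakers` and the 0.8 threshold are hypothetical choices for illustration. A real pipeline would get one embedding per Whisper segment from a speaker-embedding model.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_speakers(
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.8,  # assumed similarity cutoff, tune per model
) -> List[int]:
    """Greedily label each segment embedding with a speaker index."""
    centroids: List[List[float]] = []  # running mean embedding per speaker
    counts: List[int] = []             # segments assigned to each speaker
    labels: List[int] = []
    for emb in embeddings:
        sims = [cosine_similarity(emb, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            # Close enough to a known speaker: update that running mean.
            n = counts[best]
            centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)]
            counts[best] += 1
        else:
            # No close match: start a new speaker.
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        labels.append(best)
    return labels
```

Each label can then be attached to the matching Whisper segment, so the transcript reads as "speaker 0: ...", "speaker 1: ...".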
5/7 Port of OpenAI's Whisper model in C/C++
Want to run lightweight Whisper inference on mobile and offline devices?
This C/C++ port of Whisper has got you covered!
github.com
6/7 Can you run the Whisper large model on low-memory GPUs?
Yes, by loading the encoder on one GPU and the decoder on the other!
github.com (#discussioncomment-4156445)
7/7 Audio classification using OpenAI's Whisper
ZAC (Zero-shot Audio Classification using Whisper) allows you to assign audio files to ANY class you want without training.
github.com