adding whisper-large-v3 as an available model option #2
Pull Request Summary 🚀
What does this PR do? 📝
Adds the whisper-large-v3 model as an option for transcription models.
Adds instructions on how to include new available models.
Why is this change needed? 🤔
In addition to the smaller Whisper models, we want access to Whisper Large V3 (the turbo version was already included in the script).
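For reference, registering a new model likely amounts to mapping a short CLI name to a Hugging Face model ID. A minimal sketch, assuming a dictionary-based `AVAILABLE_MODELS` lookup (names and structure are illustrative; the actual code in `src/transcribe_audio.py` may differ):

```python
# Sketch of adding whisper-large-v3 to an AVAILABLE_MODELS mapping.
# The dictionary structure and resolve_model helper are assumptions,
# not the actual contents of src/transcribe_audio.py.
AVAILABLE_MODELS = {
    "whisper-tiny": "openai/whisper-tiny",
    "whisper-large-v3-turbo": "openai/whisper-large-v3-turbo",  # already present
    "whisper-large-v3": "openai/whisper-large-v3",              # added by this PR
}

def resolve_model(name: str) -> str:
    """Map a short CLI model name to its Hugging Face model ID."""
    try:
        return AVAILABLE_MODELS[name]
    except KeyError:
        valid = ", ".join(sorted(AVAILABLE_MODELS))
        raise ValueError(f"Unknown model {name!r}; choose one of: {valid}")

print(resolve_model("whisper-large-v3"))  # openai/whisper-large-v3
```

Keeping the mapping in one place means adding a model is a one-line change, and the CLI can report the valid choices when given an unknown name.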
How was this implemented? 🛠️
Updated the `AVAILABLE_MODELS` section of `src/transcribe_audio.py` with the Hugging Face model ID for `whisper-large-v3`.

How to test or reproduce? 🧪
There are two audio files in `tests/assets/audio`; you can use them to test transcription input and output.

First, use `whisper-tiny` (output is terrible, but it's fast for testing):

```shell
uv run python src/transcribe_audio.py --input-path tests/assets/audio/ --output-path output/ --model whisper-tiny --format csv
```

Then use `whisper-large-v3` (warning: this may be slow to download the model and to run; do not run unless you have ample bandwidth and memory). Note that `--all-audio` ensures the transcriptions are re-run, which is important if you previously ran transcription with other models:

```shell
uv run python src/transcribe_audio.py --input-path tests/assets/audio/ --output-path output/ --model whisper-large-v3 --format csv --all-audio
```

Screenshots (if applicable) 📷
Checklist ✅
Reviewer Emoji Legend
:smiley: :+1: :100: ...and I want the author to know it! This is a way to highlight positive parts of a code review.
:star: :star: :star: And I am providing reasons why it needs to be addressed, as well as suggested improvements.
:star: :star: And I am providing suggestions where it could be improved, either in this PR or later.
:star: ...and consider this a suggestion, not a requirement.
:question: This should be a fully formed question with sufficient information and context that requires a response.
:memo: :pick: This does not require any changes and is often better left unsaid. This may include stylistic, formatting, or organization suggestions, and should likely be prevented/enforced by linting if they really matter.
:recycle: Should include enough context to be actionable and not be considered a nitpick.