feat: Add support for quantized Whisper model and update audio transcription workflow #1508

Draft: wants to merge 1 commit into base: main

Conversation

EzraEllette (Contributor):

  • Integrate whisper-rs library for improved audio transcription
  • Add WhisperLargeV3TurboQuantized transcription engine
  • Modify STT processing to use whisper-rs context and state
  • Update Cargo.toml to include whisper-rs with GPU support
  • Refactor transcription methods to work with new whisper-rs workflow
  • Add download function for quantized Whisper model
  • Update CLI and core audio transcription engine to support new quantized model


description


related issue: #587

how to test


  1. run accuracy example
  2. run screenpipe with -a whisper-large-v3-turbo-quantized
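
For reviewers, the flag value in step 2 presumably maps onto the new engine variant roughly as sketched below. This is a minimal illustration, not the code from this PR: the enum name `AudioTranscriptionEngine`, the `FromStr` implementation, and the non-quantized `whisper-large-v3-turbo` string are assumptions. Only `WhisperLargeV3TurboQuantized` and `whisper-large-v3-turbo-quantized` come from the description above.

```rust
use std::str::FromStr;

/// Hypothetical engine enum; only the quantized variant is named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AudioTranscriptionEngine {
    // Assumed pre-existing, non-quantized engine (illustrative only).
    WhisperLargeV3Turbo,
    // Engine added by this PR.
    WhisperLargeV3TurboQuantized,
}

impl FromStr for AudioTranscriptionEngine {
    type Err = String;

    /// Parse the value passed to `-a` into an engine variant.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "whisper-large-v3-turbo" => Ok(Self::WhisperLargeV3Turbo),
            "whisper-large-v3-turbo-quantized" => Ok(Self::WhisperLargeV3TurboQuantized),
            other => Err(format!("unknown audio transcription engine: {other}")),
        }
    }
}
```

With this shape, `"whisper-large-v3-turbo-quantized".parse::<AudioTranscriptionEngine>()` yields the quantized variant, and any unrecognized string returns an error instead of silently falling back to a default engine.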



vercel bot commented Feb 28, 2025

@EzraEllette is attempting to deploy a commit to the louis030195's projects Team on Vercel.

A member of the Team first needs to authorize it.

// Enable translation.
params.set_translate(true);
// Set the language to English.
params.set_language(Some("en"));

Collaborator:

multi language?

Contributor Author:

I need to remove this line. Below it is the language setting. This was quick and dirty:

whisper_model.pcm_to_mel(audio, 2)?;
let (_, lang_tokens) = whisper_model.lang_detect(0, 4)?;
let lang_token = get_lang_token(lang_tokens, languages)?;
params.set_language(get_lang_str(lang_token));
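
The selection step in the snippet above can be sketched in isolation. This is a hypothetical stand-in, not the PR's `get_lang_token`: `pick_language`, its signature, and the `(code, probability)` input shape are assumptions made for illustration. It picks the most probable detected language, restricted to a user-configured list when that list is non-empty.

```rust
/// Hypothetical helper: choose a language code from detection results.
///
/// `detected` holds (language code, probability) pairs from a language
/// detection pass. `allowed` is the user-configured language list; an empty
/// list is treated here as "accept any language" (an assumption).
fn pick_language<'a>(detected: &[(&'a str, f32)], allowed: &[&str]) -> Option<&'a str> {
    detected
        .iter()
        // Keep only languages the user allows, or everything if no list is set.
        .filter(|(code, _)| allowed.is_empty() || allowed.iter().any(|a| a == code))
        // Take the candidate with the highest probability.
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(code, _)| *code)
}
```

Returning `None` when no allowed language is detected leaves the caller free to fall back to auto-detection instead of forcing English, which is what the hardcoded `set_language(Some("en"))` line did.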

Collaborator:

oh

louis030195 (Collaborator):

cool

AntonIXO commented Feb 28, 2025

whisper-rs should support Vulkan hardware acceleration? Would you include Vulkan, hipblas and other features?
Waiting for merge!

EzraEllette (Contributor, Author):

> whisper-rs should support Vulkan hardware acceleration? Would you include Vulkan, hipblas and other features?
> Waiting for merge!

Yes, will you collaborate with me to test on Linux?

AntonIXO commented Feb 28, 2025

> > whisper-rs should support Vulkan hardware acceleration? Would you include Vulkan, hipblas and other features?
> > Waiting for merge!
>
> Yes, will you collaborate with me to test on Linux?

Of course!
Which Linux distribution are you running? Do you use X.org or Wayland, and do all screenpipe features work fine for you?
I am on Arch with KDE on Wayland, and I could only use screen recording (no app recognition) with this patch: #1496

3 participants