refactor: optimize audio recording in record_and_transcribe #1492

Draft: wants to merge 22 commits into `main`

EzraEllette
Contributor

#DRAFT

Features

  • audio manager API
    • start audio processing pipeline
    • stop pipeline
    • change settings without restarting the program (WIP)
    • start/stop device recording
    • more
  • Device Manager
    • track all device connections (WIP)
    • manage running states for audio devices
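
As a rough illustration of what "manage running states for audio devices" could look like, here is a minimal sketch. `DeviceRegistry`, `DeviceState`, and every method name are hypothetical and chosen for this example only; they are not the types introduced by this PR.

```rust
use std::collections::HashMap;

/// Hypothetical running state for one audio device.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeviceState {
    Stopped,
    Running,
}

/// Hypothetical registry of known devices; not the PR's `DeviceManager`.
#[derive(Debug, Default)]
pub struct DeviceRegistry {
    states: HashMap<String, DeviceState>,
}

impl DeviceRegistry {
    /// Record a newly connected device as stopped (no-op if already known).
    pub fn connect(&mut self, name: &str) {
        self.states
            .entry(name.to_string())
            .or_insert(DeviceState::Stopped);
    }

    /// Mark a known device as running. Fails if unknown or already running.
    pub fn start(&mut self, name: &str) -> Result<(), String> {
        match self.states.get_mut(name) {
            None => Err(format!("unknown device: {}", name)),
            Some(DeviceState::Running) => Err(format!("already running: {}", name)),
            Some(state) => {
                *state = DeviceState::Running;
                Ok(())
            }
        }
    }

    /// Mark a known device as stopped. Fails if unknown or already stopped.
    pub fn stop(&mut self, name: &str) -> Result<(), String> {
        match self.states.get_mut(name) {
            None => Err(format!("unknown device: {}", name)),
            Some(DeviceState::Stopped) => Err(format!("already stopped: {}", name)),
            Some(state) => {
                *state = DeviceState::Stopped;
                Ok(())
            }
        }
    }

    /// True only if the device is known and currently running.
    pub fn is_running(&self, name: &str) -> bool {
        self.states.get(name) == Some(&DeviceState::Running)
    }
}
```

Keeping the state in one map makes redundant start/stop requests easy to reject before any stream is touched.
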

This commit introduces several improvements to the audio transcription and processing modules:

- Restructured whisper transcription code for better modularity
- Extracted language constants to a separate module
- Enhanced language detection and token processing logic
- Simplified audio stream and device handling
- Improved error handling and code readability
- Extracted utility functions for better separation of concerns

This commit introduces significant refactoring of the project's database and type management:

- Created a new `screenpipe-db` module to centralize database-related types
- Removed redundant `db_types.rs` and `db.rs` from `screenpipe-server`
- Added conversion traits between different module types
- Updated imports and references across multiple modules
- Simplified type conversions and reduced code duplication
- Improved overall project structure and modularity
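
The "conversion traits between different module types" could be expressed with the standard `From`/`TryFrom` traits. The sketch below assumes two made-up types, `AudioDeviceInfo` and `DbAudioDevice`, standing in for an audio-crate type and its `screenpipe-db` counterpart; the real field names are not shown in this PR description.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a type owned by the audio crate.
#[derive(Debug, Clone, PartialEq)]
pub struct AudioDeviceInfo {
    pub name: String,
    pub is_input: bool,
}

/// Hypothetical stand-in for the database-side representation.
#[derive(Debug, Clone, PartialEq)]
pub struct DbAudioDevice {
    pub name: String,
    pub device_type: String,
}

/// Audio -> DB is infallible, so `From` is enough.
impl From<AudioDeviceInfo> for DbAudioDevice {
    fn from(d: AudioDeviceInfo) -> Self {
        let device_type = if d.is_input { "input" } else { "output" };
        DbAudioDevice {
            name: d.name,
            device_type: device_type.to_string(),
        }
    }
}

/// DB -> audio can fail on unexpected strings, so use `TryFrom`.
impl TryFrom<DbAudioDevice> for AudioDeviceInfo {
    type Error = String;

    fn try_from(d: DbAudioDevice) -> Result<Self, Self::Error> {
        let is_input = match d.device_type.as_str() {
            "input" => true,
            "output" => false,
            other => return Err(format!("unknown device type: {}", other)),
        };
        Ok(AudioDeviceInfo {
            name: d.name,
            is_input,
        })
    }
}
```

With impls like these, call sites can write `let row: DbAudioDevice = info.into();` instead of hand-copying fields in each module.
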

This commit introduces new management modules in the `screenpipe-audio` crate:
- Added audio_manager module
- Introduced device_manager module
- Created segmentation_manager module
- Updated core module imports and structure

The changes improve the organization and modularity of audio-related functionality.

This commit introduces several improvements to the audio recording and processing workflow:
- Refactored audio segment collection to use a more memory-efficient approach
- Added dynamic buffer management with overlap handling
- Moved segmentation manager to a dedicated module
- Updated embedding extractor to defer session creation
- Improved error handling in audio stream processing

The changes enhance the robustness and memory efficiency of audio recording and segmentation.
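
One way to picture "dynamic buffer management with overlap handling" is a buffer that emits fixed-size chunks and carries the tail of each chunk into the next. This is a sketch under assumptions: `OverlapBuffer`, `chunk_len`, and `overlap` are hypothetical names, and the actual chunk sizes and overlap policy used by this PR are not stated here.

```rust
/// Hypothetical chunker: accumulates samples and emits fixed-size chunks,
/// keeping the last `overlap` samples of each chunk as the start of the next.
pub struct OverlapBuffer {
    buf: Vec<f32>,
    chunk_len: usize,
    overlap: usize,
}

impl OverlapBuffer {
    pub fn new(chunk_len: usize, overlap: usize) -> Self {
        assert!(overlap < chunk_len, "overlap must be smaller than chunk_len");
        Self {
            buf: Vec::with_capacity(chunk_len),
            chunk_len,
            overlap,
        }
    }

    /// Append samples and return every complete chunk that is now available.
    pub fn push(&mut self, samples: &[f32]) -> Vec<Vec<f32>> {
        self.buf.extend_from_slice(samples);
        let mut chunks = Vec::new();
        while self.buf.len() >= self.chunk_len {
            chunks.push(self.buf[..self.chunk_len].to_vec());
            // Drop the emitted chunk except for its overlap tail.
            self.buf.drain(..self.chunk_len - self.overlap);
        }
        chunks
    }

    /// Number of samples waiting for the next chunk (including the overlap).
    pub fn pending(&self) -> usize {
        self.buf.len()
    }
}
```

Because emitted samples are drained as soon as a chunk is produced, memory stays bounded by roughly one chunk plus whatever arrived in the latest `push`.
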

This commit introduces several improvements to the speech-to-text and embedding modules:
- Converted STT functions to async to improve concurrency
- Updated embedding extractor to cache ONNX session
- Fixed audio overlap buffer calculation in core module
- Enhanced error handling and async processing in transcription logic
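
Deferring session creation and then caching the result can be sketched with `std::sync::OnceLock` (Rust 1.70+). `Session` and `LazyExtractor` below are hypothetical placeholders; the real extractor wraps an ONNX runtime session, which is deliberately not modeled here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for an expensive-to-build inference session.
pub struct Session {
    pub model_path: String,
}

/// Hypothetical extractor that builds its session lazily and only once.
pub struct LazyExtractor {
    model_path: String,
    session: OnceLock<Session>,
    builds: AtomicUsize, // counts real builds, for illustration only
}

impl LazyExtractor {
    /// Cheap constructor: no session is created here.
    pub fn new(model_path: &str) -> Self {
        Self {
            model_path: model_path.to_string(),
            session: OnceLock::new(),
            builds: AtomicUsize::new(0),
        }
    }

    /// Build the session on first use, then return the cached one.
    pub fn session(&self) -> &Session {
        self.session.get_or_init(|| {
            self.builds.fetch_add(1, Ordering::SeqCst);
            Session {
                model_path: self.model_path.clone(),
            }
        })
    }

    /// How many times a session was actually constructed.
    pub fn build_count(&self) -> usize {
        self.builds.load(Ordering::SeqCst)
    }
}
```

`get_or_init` takes `&self`, so the extractor can be shared across tasks without a mutable borrow while still paying the construction cost at most once.
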

This commit introduces several improvements to audio processing modules:
- Converted `pcm_to_mel` and related functions to async for better concurrency
- Enhanced error handling in segment preparation
- Updated thread-based processing to use tokio async tasks
- Improved error propagation in audio segment processing

This commit introduces several improvements to the AudioManager and server integration:
- Refactored AudioManager to use Arc and improve thread safety
- Added device disabling functionality
- Simplified audio device handling in server startup
- Updated CLI and server initialization to work with new AudioManager
- Removed redundant device control mechanisms
- Improved error handling for audio device management

vercel bot commented Feb 26, 2025

@EzraEllette is attempting to deploy a commit to the louis030195's projects Team on Vercel.

A member of the Team first needs to authorize it.

This commit introduces several improvements to audio device handling:
- Updated DeviceManager to dynamically list and start audio devices
- Modified AudioManagerBuilder to automatically select default devices
- Added server endpoint for starting audio devices dynamically
- Improved error handling and device initialization in AudioManager
- Refactored device management to be more flexible and robust

- Updated `stop_device` method in AudioManager to be immutable
- Added new `/audio/device/stop` endpoint for stopping audio recording devices
- Implemented error handling and response formatting for device stop operation
- Added TODO comment for future device start method refactoring

- Implemented `/audio/start` and `/audio/stop` endpoints for global audio processing
- Updated AudioManager to support starting and stopping audio processing
- Added status checks to prevent redundant start/stop operations
- Improved error handling and response formatting for audio processing endpoints
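
The status checks that prevent redundant start/stop operations, together with methods that take `&self` rather than `&mut self`, could look roughly like this. `PipelineControl`, `ControlError`, `start_response`, the message strings, and the choice of HTTP 409 for a redundant request are all assumptions made for this sketch, not details taken from the PR.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical error for redundant start/stop requests.
#[derive(Debug, PartialEq, Eq)]
pub enum ControlError {
    AlreadyRunning,
    AlreadyStopped,
}

/// Hypothetical pipeline switch with interior mutability, so that
/// `start` and `stop` only need `&self`.
#[derive(Debug, Default)]
pub struct PipelineControl {
    running: AtomicBool,
}

impl PipelineControl {
    /// Atomically flip false -> true; fail if it was already true.
    pub fn start(&self) -> Result<(), ControlError> {
        self.running
            .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
            .map(|_| ())
            .map_err(|_| ControlError::AlreadyRunning)
    }

    /// Atomically flip true -> false; fail if it was already false.
    pub fn stop(&self) -> Result<(), ControlError> {
        self.running
            .compare_exchange(true, false, Ordering::SeqCst, Ordering::SeqCst)
            .map(|_| ())
            .map_err(|_| ControlError::AlreadyStopped)
    }

    pub fn is_running(&self) -> bool {
        self.running.load(Ordering::SeqCst)
    }
}

/// Hypothetical mapping from a start request to (HTTP status, message).
pub fn start_response(ctl: &PipelineControl) -> (u16, &'static str) {
    match ctl.start() {
        Ok(()) => (200, "audio processing started"),
        Err(_) => (409, "audio processing already running"),
    }
}
```

Using a single compare-and-exchange means two concurrent `/audio/start` requests cannot both succeed, which a separate check followed by a write would not guarantee.
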