Continuous transmission causes delays with unstable connection. #6354

MarioPL98 · 2024-03-03T15:12:22Z

Description

When using continuous transmission over an unstable connection, a delay of up to 5-10 seconds accumulates over time. Muting does seem to remove the delay for a short while, BUT when using "deafen yourself", the person's icon stays blue for the duration of the delay and only after that turns green (muted). Switching channels while there is a delay also causes the delayed audio to play after the channel is switched. Example:

Versions affected:
tested 1.3.x and 1.4.x on Windows with same server versions

Probably related to:
#1617

I've noticed a few times that the bug also happens when the computer running Mumble has high CPU usage or interrupts causing short system freezes. Just like with an unstable connection, continuous mode accumulates delay.

Steps to reproduce

Join server with a good connection.
Make someone join the same server with a bad connection or simulated bad (ping variation 50-500ms, 0-2% packet loss).
Turn on continuous transmission mode.
Wait 10 minutes.

Mumble version

1.4.287

Mumble component

Both

OS

Windows

Reproducible?

Yes, very. Always happens with the same users with weaker connection.

Additional information

Possible solution:
Check amount of audio in buffer / queue and either skip delay or speed up the sound (maybe let user decide in settings).
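
A minimal sketch of that idea, assuming the client can estimate how many milliseconds of audio are queued for a speaker. The names `CatchUpAction`, `CatchUpPolicy` and `chooseAction` are hypothetical and not part of Mumble; the two thresholds stand in for the user setting suggested above.

```cpp
#include <cassert>

// Hypothetical names for illustration; none of this is existing Mumble code.
enum class CatchUpAction { PlayNormally, SpeedUp, SkipAhead };

struct CatchUpPolicy {
    double speedUpAboveMs; // queued audio beyond this is played slightly faster
    double skipAboveMs;    // queued audio beyond this is skipped outright
};

// Decide how to treat the queue based only on how much audio is waiting.
CatchUpAction chooseAction(double queuedMs, const CatchUpPolicy &policy) {
    assert(policy.speedUpAboveMs <= policy.skipAboveMs);
    if (queuedMs > policy.skipAboveMs) {
        return CatchUpAction::SkipAhead;
    }
    if (queuedMs > policy.speedUpAboveMs) {
        return CatchUpAction::SpeedUp;
    }
    return CatchUpAction::PlayNormally;
}
```

With thresholds of, say, 200 ms and 1000 ms, a small backlog would be played back faster while a multi-second backlog would be skipped.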

MarioPL98 · 2024-03-07T17:11:58Z

@Krzmbrzl I can try to look into the issue but can you give me any ideas, where to start?

loliconRoot · 2024-10-22T00:41:15Z

I am also experiencing this problem; all client and server versions are 1.5.634. I forward audio through VB-Audio Cable to share one device's audio output with other devices via Mumble, so the input side transmits continuously. Under certain circumstances on the output side (e.g. system lag due to high memory load), playback falls unusually far behind and the audio streams on the two sides drift out of sync (i.e. the problem described above). Stopping the output on the input side or the input on the output side resolves it, but it recurs regularly. I can confirm that all devices are on a WAN with a latency of no more than 30 ms.

jaggzh · 2024-11-03T17:45:08Z

Okay that's really interesting. I've been trying to figure out this long delay, which doesn't occur if I restart mumble. In my case, restarting either client 'resets' it. One system is on a potentially unstable or laggy wifi connection. (I don't know if it started at a certain version -- it's been a year or years that I've had to deal with it.)

I wonder if we can put some code in here somewhere that if the delay gets over such and such ms it restarts ... hmm. Or maybe a periodic [console?] log of the built up buffer? Which side is doing that buffering? I'm working from a local github build.

At present, when I use push-to-talk, I hear my voice echo back from the remote, and this can build to a substantial number of seconds.

jaggzh · 2024-11-20T23:11:01Z

@Hartmnt @mkrautz
You guys have any guidance for this? I am willing to work on a fix/pr for it, but I'm not sure where to start.

Hartmnt · 2024-11-21T17:00:32Z

> @Hartmnt @mkrautz You guys have any guidance for this? I am willing to work on a fix/pr for it, but I'm not sure where to start.

AudioOutput.cpp is probably the main file to look at. Specifically the mix method.

qmOutputs contains a map User -> AudioOutputBuffer
AudioOutputBuffer is the parent class of multiple classes such as AudioOutputSpeech which will probably be most interesting to you.

I assume you will have to change some aspects of AudioOutputSpeech. But the most interesting question is: How will you solve the actual issue. Sure you could drop data, if the buffer becomes too large but that will lead to information loss. Speed up the replay of the audio at a certain buffer length? Start skipping every other frame? Try to determine when continuous transmitting users send silence on your client side and skip that?

There are some possibilities which need to be explored. It would be nice to do a short analysis of the possible solutions and debate them. I personally do not have a favorite, yet.
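
As one input to such an analysis, the silence-skipping option could be sketched as follows. This is only an illustration under stated assumptions: frames are taken to be decoded float samples in [-1, 1], and `frameRms`, `canSkipFrame` and all thresholds are hypothetical names and values, not existing members of AudioOutputSpeech.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Root-mean-square level of one decoded frame (samples assumed in [-1, 1]).
double frameRms(const std::vector<float> &samples) {
    assert(!samples.empty());
    double sumSquares = 0.0;
    for (float s : samples) {
        sumSquares += static_cast<double>(s) * static_cast<double>(s);
    }
    return std::sqrt(sumSquares / static_cast<double>(samples.size()));
}

// Hypothetical rule: skip a frame only when a backlog exists AND the frame
// is quiet, so that speech is never discarded to catch up.
bool canSkipFrame(const std::vector<float> &samples, double queuedMs,
                  double maxLagMs, double silenceRms) {
    return queuedMs > maxLagMs && frameRms(samples) < silenceRms;
}
```

The trade-off is that a stream with constant background sound may never fall below the silence threshold, in which case this option alone would not drain the backlog.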

jaggzh · 2024-12-26T13:02:51Z

(I was going through the code and got overwhelmed by the amount of potential areas that might need to be modified to handle it -- things I'm completely unfamiliar with).

"Fixes" that might help other users for now:
1. Voice Activity mode: If you're able to, switch to voice activity mode. If the levels even periodically drop below the transmission threshold, the buffer can 'unfill'.
2. Noise reduction: This can reduce the network transmission time, letting your buffer stay unfull [possibly]. It uses CPU though (and it ruins my stuff, see my personal use-case next).

In my situation, I am dealing with my wife, on a ventilator. The sound of the vent and her voice are all necessary to hear, and noise reduction degrades both. So... for my use-case, and many others (maybe they're monitoring a baby breathing), noise reduction is not a good solution. Or maybe they just want continuous full audio.

@Hartmnt Regarding your question on which solutions...
My own project approach might differ from yours and mumble's, but I usually side with providing options. HOWEVER, I do this in a way that I can, hopefully, implement the simplest, and the foundations are there for me or others to add more capabilities if anyone has the know-how (or time) [that possibly I don't].

For instance, currently something is already being done, since the buffer doesn't grow indefinitely -- fixed audio backend buffer it seems. The simplest initial approach, if possible, is to let the user set that, if the selected audio system allows it.

I included that, and a new ability, to do a progressively-increasing frame drop, in my pseudocode for your review:

New UI options:
- Audio Output -> Audio Output:
  - Audio System Buffer Size:
    - Non-editable field showing current audio system’s buffer size (in seconds and in kb?), if available.
  - Output Buffer Size Override [float]:
    - -1 = Use system default
    - Sets the audio system output audio buffer length.
      - (Dev note: For example, it seems like 10s or so right now on my system; dropping that to .5s would likely solve my issues, limiting the amount that can build up before data is dropped (since that must already be happening).)
  - Full-Buffer Handling [pulldown]:
    - {Tooltip: Slow or bad connections can lead to a full output buffer, delayed audio, and lost data.}
    - Options (in order of suspected ease-of-implementation):
      - [ Default ]:
        
        {Tooltip: No special handling. The buffer inherently limits the length, and audio is just lost if it’s over-filled, but audio lag can therefore be built up to the audio system’s buffer size.}
      - [ Frame-drop ]:
        
        {Tooltip: Drops frames to continually reduce audio to aim for minimal lag.}
        
        Settings:
        
        Max tolerable lag time [float]:
        
        {Tooltip: If the buffer has this amount or less, we don’t do anything (if Output Buffer Size Override is set, this value should be less than that for it to have any effect)}
        
        Max drop-frame-rate (frames/s) [int]:
        
        {Tooltip: As buffer goes from Max lag time to full, the dropped frame count will increase.}
        
        See code example below
      - [ Silence-dropping ]:
        
        (Dev note: This is effectively Voice Activity mode, isn't it? So if that's not working for the user, well.. we don't need to implement it here again. :))

In conclusion, if the audio system buffer size can be limited by us, and that's an easy option -- that might be the easiest initial solution for many users. What do you think about that vs. frame-dropping (or both)?

Final notes on frame-dropping being based on frames/sec:
Instead of keeping the time/state, we might be able to offer "Max drop frame fraction" INSTEAD of "Max drop frame rate". This one could let us have a stateless thing that just calculates dropping based on how full the buffer is, but I've not worked out the logic for it yet.

// Frame-drop code example (idea):
	// Time of the previous check, in ms (double keeps precision for large values);
	// static for now, or maybe we store this somewhere
	static double last_drop_delta_ms = time_ms();
	double now_ms = time_ms();
	double delta_ms = now_ms - last_drop_delta_ms;
	last_drop_delta_ms = now_ms;
	float delta_s = static_cast<float>(delta_ms / 1000.0);
	// ratio is 0 while lag is tolerable and rises to 1 as the buffer fills up
	float overage = cur_buf_used_time - max_lag_time;
	float ratio = overage / (buf_size - max_lag_time);
	ratio = std::max(0.0f, std::min(ratio, 1.0f));
	float drop_amt = ratio * (max_drop_frame_rate * delta_s);
	static float drop_accum = 0.0f; // or, like last_drop_delta_ms, maybe we store this somewhere too?
	drop_accum += drop_amt;
	bool drop_frame = false; // must be initialized, otherwise the check below reads garbage
	if (drop_accum >= 1.0f) {
		drop_frame = true;
		drop_accum -= 1.0f;
	}
	if (drop_frame) {
		// ...
	} else {
		// ...
	}
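
One way the stateless "Max drop frame fraction" variant mentioned above might work is sketched here. It assumes every frame carries a sequential frame number; `dropFraction` and `shouldDropFrame` are hypothetical helper names. The decision depends only on the frame number and the current fraction, so no timer or accumulator is kept between calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Fraction of frames to drop: 0 while lag is tolerable, rising linearly to
// maxDropFraction when the buffer is full. All times are in seconds.
double dropFraction(double bufUsedS, double maxLagS, double bufSizeS,
                    double maxDropFraction) {
    assert(bufSizeS > maxLagS);
    assert(maxDropFraction >= 0.0 && maxDropFraction <= 1.0);
    const double ratio = (bufUsedS - maxLagS) / (bufSizeS - maxLagS);
    return std::max(0.0, std::min(ratio, 1.0)) * maxDropFraction;
}

// Drop frame n exactly when the running total floor(n * fraction) increases.
// For a constant fraction, floor(N * fraction) of the first N frames are
// dropped, spread evenly over the stream.
bool shouldDropFrame(std::uint64_t frameNumber, double fraction) {
    const double n = static_cast<double>(frameNumber);
    return std::floor((n + 1.0) * fraction) > std::floor(n * fraction);
}
```

For example, a fraction of 0.5 drops every second frame, and a fraction of 0 never drops anything.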

Krzmbrzl · 2025-01-11T18:39:28Z

@jaggzh are you sure that the accumulation is happening in the audio backend's audio buffer? I would rather suspect that such buildup queues would happen in the jitter buffer. But that's just a gut feeling 🤷

Implementing some dropping mechanism (in whatever form) would in general be fine for me. However, before we do anything like that, I believe that it would be essential to first understand the cause of this issue. Dropping frames is a cure for the symptom but not for the disease...

If you have a setup that can reproduce this issue, I think it would already be very helpful if you could instrument Mumble with a couple of logging messages that log things like when an audio packet is received (and whether it has been received via TCP or UDP), what timestamp it carries and how much ms of audio data it contains.
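
The kind of log line described above could look roughly like this. It is a sketch only: `PacketLogEntry`, `Transport` and `formatLogEntry` are hypothetical names, and where in Mumble's receive path such a call would go is left open.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

enum class Transport { UDP, TCP };

// One record per received audio packet (hypothetical structure).
struct PacketLogEntry {
    std::int64_t arrivalMs;    // local time at which the packet was received
    Transport transport;       // whether it arrived via UDP or TCP
    std::uint64_t frameNumber; // sequence number / timestamp the packet carries
    int audioMs;               // how many ms of audio the packet contains
};

// Render the record as a single greppable line.
std::string formatLogEntry(const PacketLogEntry &entry) {
    assert(entry.audioMs >= 0);
    std::ostringstream out;
    out << "rx t=" << entry.arrivalMs << "ms"
        << " via=" << (entry.transport == Transport::UDP ? "UDP" : "TCP")
        << " frame=" << entry.frameNumber
        << " audio=" << entry.audioMs << "ms";
    return out.str();
}
```

Comparing the arrival times of consecutive lines against the audio duration they carry would show whether packets arrive more slowly than they are played back.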

Something that I could imagine is that client A is continuously transmitting data. Thus, the sent audio packets represent a continuous stream of audio data (i.e. no gaps in between). However, the packets are sent via the internet, where each packet will take a (slightly) different amount of time to travel from client A to client B. If client B now plays audio packet 1 but hasn't received audio packet 2 at the time it is finished with packet 1, there will be a gap (let's assume that packets 3+ are also not yet there so the audio backend can't skip ahead and just consider packet 2 missing). At the time packet 2 is received and played back, there has been some gap in the playback. However, this gap did not exist on the recording site of client A. That is, the gap is inserted into the real-time audio stream, shifting its end "into the future". In other words, this has now created lag.
If we assume we have N audio packets at a given time in the real-time audio stream, we can have (N-1) gaps. This means that playing back those N packets will take longer than it has taken to record them. However, since this is a continuous stream, this extra time has nowhere to go and therefore necessarily causes further incoming packets to jam up.
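
The hypothesis above can be illustrated with a small model. It assumes that playback never skips or speeds up and that each packet starts playing as soon as it has arrived and the previous packet has finished. This is a simplification for reasoning about the idea, not a description of Mumble's actual jitter buffer, and `accumulatedLagMs` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// For each packet, return how far (in ms) its playback start has slipped
// behind the gap-free schedule that began with the first packet.
std::vector<double> accumulatedLagMs(const std::vector<double> &arrivalMs,
                                     double packetMs) {
    assert(packetMs > 0.0);
    std::vector<double> lag;
    if (arrivalMs.empty()) {
        return lag;
    }
    const double firstStart = arrivalMs.front();
    double playStart = firstStart;
    for (std::size_t i = 0; i < arrivalMs.size(); ++i) {
        if (i > 0) {
            // Wait for the packet to arrive and for the previous one to finish.
            playStart = std::max(arrivalMs[i], playStart + packetMs);
        }
        lag.push_back(playStart - (firstStart + static_cast<double>(i) * packetMs));
    }
    return lag;
}
```

With 20 ms packets arriving at 0, 20, 70, 80 and 90 ms, the third packet is 30 ms late. In this model every later packet stays 30 ms behind even though those packets arrived in time, because nothing ever removes the inserted gap.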

Krzmbrzl · 2025-01-14T18:07:50Z

I believe that if my above explanation is indeed correct, the correct (and also rather simple) thing to do would be to change the value of the frame_number argument in a UDP voice packet from being a random sequential integer to be an actual timestamp (UTC in order to avoid timezone issues) as this is what receiving clients will use it for (in the internal JitterBuffer). Then only the processing in AudioOutputSpeech needs to be adapted to compare to the current timestamp of the local client and decide based on that whether the frame should be played or dropped.
We might want to always accept the first packet and establish the time difference between set timestamp and current timestamp as a zero baseline for evaluating whether or not a given packet is still on-time or not.

davidebeatrici · 2025-01-14T18:16:21Z

There is a problem: the time/clock is not guaranteed to be synced between clients, quite the opposite actually.

Krzmbrzl · 2025-01-14T18:21:54Z

That's why I suggest to use the first packet to approximate the time shift between client clocks. There might still be an issue with regards to clocks drifting apart but I guess we could just repeatedly update our baseline (or set it to something like a moving median or something).
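
A minimal sketch of such a baseline, assuming packets carried a sender-side timestamp in milliseconds (which they currently do not). `ClockOffsetBaseline` is a hypothetical class: it keeps the most recent offsets between local and sender time and uses their median as the zero point, so a few badly delayed packets do not shift the baseline.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

class ClockOffsetBaseline {
public:
    explicit ClockOffsetBaseline(std::size_t windowSize) : m_windowSize(windowSize) {
        assert(windowSize > 0);
    }

    // Record the offset (local receive time minus sender timestamp) of a packet.
    void addSample(std::int64_t senderMs, std::int64_t localMs) {
        m_offsets.push_back(localMs - senderMs);
        if (m_offsets.size() > m_windowSize) {
            m_offsets.pop_front();
        }
    }

    // Median offset over the window (upper median for an even count).
    std::int64_t baselineMs() const {
        assert(!m_offsets.empty());
        std::vector<std::int64_t> sorted(m_offsets.begin(), m_offsets.end());
        auto mid = sorted.begin() + static_cast<std::ptrdiff_t>(sorted.size() / 2);
        std::nth_element(sorted.begin(), mid, sorted.end());
        return *mid;
    }

    // How much later than the baseline a packet arrived; may be negative.
    std::int64_t latenessMs(std::int64_t senderMs, std::int64_t localMs) const {
        return (localMs - senderMs) - baselineMs();
    }

private:
    std::size_t m_windowSize;
    std::deque<std::int64_t> m_offsets;
};
```

A receiver could then drop a frame whose lateness exceeds some tolerance. Because the offset includes the one-way network delay, the baseline reflects typical delay plus clock difference, and clock synchronization between clients is not required.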

MarioPL98 · 2025-01-15T02:55:51Z

Honestly, I'm not sure these ideas would work well. Please keep in mind that the OS that mumble runs on is not a real-time OS. I don't think any time synchronization would work at all. I propose the simplest solution: If there are more than X (some value) packets in the queue, then just drop most of them, without any time sync. The delay might be caused not only by slow/unstable network but also by slow/unstable OS. Eg. when there is some interrupt that freezes system for 1-2 seconds, like the famous Ryzen fTPM stutter issue, or any other short freeze.

Also, the issue happens not only with UDP but also with TCP mode, I forgot to mention in the main ticket.
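
The queue-length cap proposed above could be sketched like this, assuming the backlog is available as a queue of packets ordered oldest first. `trimBacklog` and its parameters are hypothetical; the X from the proposal corresponds to `maxPackets`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// If more than maxPackets are queued, discard the oldest ones so that only
// the newest keepPackets remain. Returns the number of packets dropped.
template <typename Packet>
std::size_t trimBacklog(std::deque<Packet> &queue, std::size_t maxPackets,
                        std::size_t keepPackets) {
    assert(keepPackets <= maxPackets);
    if (queue.size() <= maxPackets) {
        return 0;
    }
    const std::size_t dropped = queue.size() - keepPackets;
    queue.erase(queue.begin(), queue.begin() + static_cast<std::ptrdiff_t>(dropped));
    return dropped;
}
```

This needs no timestamps or clock synchronization and behaves the same whether the backlog was caused by the network or by a local freeze, at the cost of an audible jump whenever the cap is hit.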

MarioPL98 · 2025-01-15T03:08:09Z

As for testing purposes, I think writing something like kernel driver with very high interrupt frequency to "freeze" system a bit would be very good test. We could also try to implement simple resource starvation test tool that creates many cpu-hog threads, it should have similar effect but with less repeatability.

These tools with combination of some latency spikes of the network (maybe https://github.com/jagt/clumsy) would be a good baseline for debugging the issue.

jaggzh · 2025-02-27T07:15:29Z

My two computers do it constantly, so it's a good one to test the issue on (at least the one that's affecting these ones). Probably figuring out how to get logging in to the suspected areas (likely starting with Krzmbrzl's gut feeling) [in a way that presents information on the issue :) ...]
I'm looking into it, but if I can't get very far .. as annoying as it is, this part of the codebase is uncharted territory for me. :)

MarioPL98 added the bug and triage labels on Mar 3, 2024

Hartmnt added the client audio labels and removed the triage label on May 31, 2024
