
Audible "tick" is produced at the beginning (or end maybe?) of each note #37

Open
kevenwyld opened this issue Oct 21, 2024 · 9 comments

@kevenwyld

Describe the bug

Definitely not a big deal, but figured I'd report it since I noticed it. An audible "tick" is produced at the beginning (or end maybe?) of each note. You can see it on a spectrogram, but you can also hear it depending on the pitch, distortion, etc.

To reproduce

  1. Run linuxwave -n 27 -o bass.wav
  2. Run ffmpeg -i bass.wav -lavfi showspectrumpic=s=1920x1080:mode=separate spectrogram.png
  3. View spectrogram.png

Expected behavior

A smooth transition between tones or notes

Screenshots / Logs

spectrogram

Software information

  • Operating system: ArchLinux 6.6.57-1-lts
  • Zig version: I don't know; I installed this from the Arch extra repository
  • Project version: 2.0

Additional context

This is a totally awesome project! Thank you for making it!

It's possible this is unique to my machine. I have only tested on one device.

@kevenwyld kevenwyld added the bug Something isn't working label Oct 21, 2024
@kevenwyld
Author

As far as I can tell this seems to be caused by harsh transitions between each note because the waveforms intersect abruptly rather than decaying and overlapping with the next note.

Screenshot_2024-10-21_09-15-44
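
For illustration, the simplest fix for that kind of click is to fade each note in and out over a few milliseconds so adjacent notes meet near zero. A minimal Zig sketch of the idea (the helper name and the assumption that each note sits in its own f32 buffer are hypothetical, not taken from linuxwave):

// Hypothetical helper: linearly ramp the first and last `ramp_len` samples
// of one note so consecutive notes meet near zero instead of jumping.
fn fadeEdges(note: []f32, ramp_len: usize) void {
    const n = @min(ramp_len, note.len / 2);
    for (0..n) |i| {
        const gain = @as(f32, @floatFromInt(i)) / @as(f32, @floatFromInt(n));
        note[i] *= gain; // fade in at the start of the note
        note[note.len - 1 - i] *= gain; // fade out at the end of the note
    }
}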

@orhun orhun changed the title audible "tick" is produced at the beginning (or end maybe?) of each note Audible "tick" is produced at the beginning (or end maybe?) of each note Oct 21, 2024
@orhun
Owner

orhun commented Oct 21, 2024

Hello, thanks for the report and kind words!

ffmpeg -i bass.wav -lavfi showspectrumpic=s=1920x1080:mode=separate spectrogram.png

This is quite cool, I didn't know ffmpeg could do that!

It's possible this is unique to my machine. I have only tested on one device.

Would love to hear it and compare it with my results if you can share it :)

As far as I can tell this seems to be caused by harsh transitions between each note because the waveforms intersect abruptly rather than decaying and overlapping with the next note.

Yup, that sounds correct. Something is going on in the wav.zig module...

@kevenwyld
Author

Would love to hear it and compare it with my results if you can share it :)

Here's a file I generated. There was some post-processing on this one to resample it and get rid of the DC offset. The original is included in the zip too:

bass_44100.wav.zip

Post-processing steps:

ffmpeg -i bass.wav -ac 1 -ar 44100 bass_44100.wav
wavegain -y bass_44100.wav

The tick is in the original too, but none of my equipment plays nice with the DC offset, so I have to post-process the files to play them.

My friend sent me an interesting video about this issue and creating a window function to deal with it. I was messing around with it but I don't quite have the understanding to implement anything yet. https://youtu.be/PjKlMXhxtTM?si=JQPNJWQybmlZVTY5&t=742

@orhun
Owner

orhun commented Oct 24, 2024

bass_44100.wav.zip

Yeah, I see. That's an issue I was aware of but haven't been able to fix so far :/ It's probably related to the encoding of the samples...

My friend sent me an interesting video about this issue and creating a window function to deal with it. I was messing around with it but I don't quite have the understanding to implement anything yet.

That's a good reference - your friend has a big brain!

I tried implementing the Hann window function in #38 - it makes things a bit better. But I think it can be improved. Can you take a look? 🙂
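
For reference, a per-note Hann window looks roughly like the sketch below. This is only an illustration of the window itself (the helper name and f32 note buffer are assumptions), not a copy of what #38 actually does, and it assumes the samples are already centered around zero:

const std = @import("std");

// Multiply one note's samples by a Hann (raised-cosine) window so the
// note starts and ends at zero amplitude.
fn applyHann(note: []f32) void {
    if (note.len < 2) return;
    const last: f32 = @floatFromInt(note.len - 1);
    for (note, 0..) |*sample, i| {
        const x: f32 = @floatFromInt(i);
        const w = 0.5 * (1.0 - @cos(2.0 * std.math.pi * x / last));
        sample.* *= w;
    }
}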

@kevenwyld
Author

Thanks! I built that PR and tried it out. It does get rid of the tick by fading down to zero between each note. Unfortunately this has two impacts on the audio that may not be desirable:

  1. It really changes the "vibe" lol. The original sounds pretty cool, like you'd expect a robot from a 90s sci-fi movie to sound if you asked it to play you a song. With the Hann window function it has a much different tone, something more mellow, and I tend to fall into each pause. This is subjective of course, so it's only my opinion; it was just the first thing I noticed.
  2. Both of the normalizing filters I've tried on the Hann version of the output fail to remove the DC offset. They move the whole waveform down, but because the windowed output itself is still offset, it just becomes "differently offset". See the screenshots:

Here they are, not normalized:
Screenshot_2024-10-28_00-31-47

These have both been normalized using Tenacity's built-in normalizing filter with DC offset removal selected.
Screenshot_2024-10-28_00-28-54

It may be that in order to implement a window function the offset will need to be removed first. Sorry that this ended up being so complicated.

I tried playing with some of the constants in the function but couldn't find anything that improves the situation.

@orhun
Owner

orhun commented Oct 29, 2024

Thanks for sharing your findings. I 100% agree that the window function that I applied changes the vibe a lot. It definitely needs some tweaking and playing around.

It may be that in order to implement a window function the offset will need to be removed first. Sorry that this ended up being so complicated.

I'm not sure if I understood what you mean about DC offset fully. I'm not sure how that should be possible 🤔

@kevenwyld
Author

kevenwyld commented Oct 29, 2024

I'm not sure if I understood what you mean about DC offset fully. I'm not sure how that should be possible 🤔

DC offset may be something that needs its own GitHub issue, since it's not really related to this ticking sound; I just wasn't sure until now that it affects the solution to the ticking. I will try, with my limited understanding, to explain what I think is causing this and what its impacts are:

DC offset or DC bias in audio means that the mean value of the waveform is either above or below zero. Since the term comes from analog circuits, and we are talking about sound from a speaker, we can think of the speaker at "rest" as zero. When a correctly DC-balanced sine wave (mean 0) is reproduced, the speaker moves outward from the zero "rest" position to the maximum positive peak, and then inward to the (negative) minimum trough of the sine wave, creating the pressure changes we perceive as sound.

The audio produced by this program has an entirely positive DC bias, so the entire waveform, both the peak and the trough of the sine wave, is above zero. So when played over a speaker the speaker is forced outwards at all times, effectively moving the zero position to half the wave height, and requiring the amplifier to hold the speaker in an outward position at all times, never passing the rest position.

The reason I think this is an issue here is that the Hann function tapers the wave to zero, but because zero is actually the trough of the wave, the mean still ends up positive. So when you apply a zero-mean correction (a normalize filter with DC offset correction), the whole waveform shifts down so that the mean is zero; but each note is shaped like a positive hump, or hill, so that kind of simple correction doesn't work correctly and part of the wave is still offset for each note.
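
As a rough illustration of "remove the offset first", a per-note DC correction could be as simple as subtracting the mean before the window is applied. This is a hypothetical Zig helper assuming f32 note buffers, not how gen.zig currently works:

// Subtract the mean of one note's samples so the waveform is centered
// on zero before any window function is applied.
fn removeDcOffset(note: []f32) void {
    if (note.len == 0) return;
    var sum: f64 = 0;
    for (note) |sample| sum += sample;
    const mean: f32 = @floatCast(sum / @as(f64, @floatFromInt(note.len)));
    for (note) |*sample| sample.* -= mean;
}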

Here I've taken a screenshot of the Hann function on the top, the output of the stable version of the program in the middle, and a waveform with no DC offset on the bottom. You can see on the scale on the left that the rest position is offset by the wave height for the first two.

Screenshot_2024-10-29_11-48-45

Now, I'm pretty sure this is happening because when each note is written out, the variable that stores the waveform somehow only ends up with positive numbers. I have stared at the gen.zig Generator function for longer than I care to admit and I can't figure it out, but this type of programming, as well as this language specifically, is not my area of expertise.

I hope you don't mind that this got a little long. I had a lot of fun researching all this, and I'd really like to keep trying to understand how to improve it. Thanks for reading! =] And no worries if you have other priorities and don't want to continue diving into this.

EDIT: To clarify, the speaker example is oversimplified. I think most amplifiers don't reproduce a DC offset like this; it gets filtered out somewhere in the signal path. How effective that filtering is, and how things sound after it, depends on the amplifier. It turns out that the DAC+amp I use for headphones on my desk is horrifically bad at this and will even shut off if I play these files too loud.

@orhun
Owner

orhun commented Nov 3, 2024

So when played over a speaker the speaker is forced outwards at all times, effectively moving the zero position to half the wave height, and requiring the amplifier to hold the speaker in an outward position at all times, never passing the rest position.

That's super interesting. I always thought there was something wrong with the generated file when I played it on a speaker, but I wasn't able to pinpoint it. Maybe I felt that happening somehow 🤔

Now, I'm pretty sure this is happening because when each note is written out, the variable that stores the waveform somehow only ends up with positive numbers. I have stared at the gen.zig Generator function for longer than I care to admit and I can't figure it out, but this type of programming, as well as this language specifically, is not my area of expertise.

Yes, IIRC we always end up with positive values and encode them as notes using equal temperament. The generate function in gen.zig definitely needs more explanation, and I'm happy to walk you through the code if you have any specific questions. I like the rabbit hole you're digging there :)
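
For anyone following along, the standard twelve-tone equal temperament mapping from a note number to a frequency looks roughly like the sketch below (a general formula; the exact reference note and constants used in gen.zig may differ):

const std = @import("std");

// Twelve-tone equal temperament: note 69 is A4 = 440 Hz and each
// semitone multiplies the frequency by the twelfth root of two.
fn noteToFrequency(note_number: i32) f64 {
    const semitones_from_a4: f64 = @floatFromInt(note_number - 69);
    return 440.0 * std.math.pow(f64, 2.0, semitones_from_a4 / 12.0);
}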

I hope you don't mind that this got a little long. I had a lot of fun researching all this, and I'd really like to keep trying to understand how to improve it. Thanks for reading! =] And no worries if you have other priorities and don't want to continue diving into this.

No worries at all! I hope you don't mind my disappearing at irregular intervals, though.

Btw do you have a blog? I would love to read more about it if you go ahead and put up a deep dive article there. Maybe something in this format.

As for fixing this issue, I think we need to dive a bit deeper into the generate function and figure out what's happening in there.

@arda-guler

screenshot

Don't mind the professional artwork.

If the program knew slightly ahead of time where a note transition would be placed, a short low-pass filter with a high cutoff frequency (filtering only the tick) could soften the transition. Ideally, the cutoff frequency should fall (or attenuation should rise), reach a minimum at the note transition (or attenuation should reach a maximum), and then rise again (or attenuation should fall) until the waveform passes through unchanged, all within a few milliseconds.

I have no idea in a bloody spiraling hell how one would implement this, however... and it feels more like mathematical perversion than problem solving. But! The computational cost shouldn't be high (don't quote me on that), since digital audio workstation software can run several low-pass filters in near-real-time.
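
For what it's worth, the filter itself is cheap: a one-pole low-pass is just a running weighted average. A rough Zig sketch that smooths only a short region around a known transition index (all names here are hypothetical, and scheduling when to apply it is the hard part described above):

// One-pole low-pass applied only to a short region around a note
// transition. `alpha` near 0 smooths hard; `alpha` near 1 is close
// to a pass-through.
fn smoothTransition(samples: []f32, transition: usize, radius: usize, alpha: f32) void {
    const start = if (transition > radius) transition - radius else 0;
    const end = @min(transition + radius, samples.len);
    if (start >= end) return;
    var y = samples[start];
    for (samples[start..end]) |*sample| {
        y = alpha * sample.* + (1.0 - alpha) * y;
        sample.* = y;
    }
}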
