Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Probably problem with SSO for std::string. #36

Closed
itwasntme-mk opened this issue Sep 3, 2020 · 6 comments
Closed

Probably problem with SSO for std::string. #36

itwasntme-mk opened this issue Sep 3, 2020 · 6 comments
Labels

Comments

@itwasntme-mk
Copy link

Code:

    std::string str("~~~~~~~~~~~~~~~");

    for (int i = 0; i != 1000000; ++i)
      g_log.info("Info line: %s @%d", str, i);

Result in:

munmap_chunk(): invalid pointer
Aborted

Randomly I saw also:

munmap_chunk(): invalid size

If str has more than 15 characters problem doesn't appear. @mattiasflodin Could you take a look at that? Probably it is similar to #35 (also short string).

@mattiasflodin
Copy link
Owner

Hi, just wanted to let you know I've reproduced this and I'll have a look at it. Thanks for the report.

@serpent7776
Copy link
Contributor

Is there any progress on this?

I noticed that increasing input_buffer_capacity when creating basic_log instance reduces chance of getting crash.

@mattiasflodin Are you sure using relaxed atomics is ok here? https://github.com/mattiasflodin/reckless/blob/v3.0.2/reckless/include/reckless/detail/platform.hpp#L76
Wouldn't it be better to use std::atomic?
The TODO note there is in particular concerning.

@mopleen
Copy link

mopleen commented Dec 9, 2020

Maybe this demo will be useful https://github.com/mopleen/reckless_demo_crash
What's interesting is that the crash does not happen on the system running libc 2.17

Also, logging only long strings does not crash either. Short string is what's triggering the problem.

@mattiasflodin
Copy link
Owner

mattiasflodin commented Apr 10, 2021

@serpent7776 @mopleen Thank you for your patience with this bug and your efforts to provide reproduction demos and inspecting the code. I have been busy with some other projects so this had to be put on hold for a while. Anyway, I believe I have figured out what is going wrong. The usual suspect would of course be multithreading and incorrect use of atomics, but in fact this error is not related to that.

The problem lies in the ring buffer, which is implemented by mapping multiple virtual addresses to the same physical memory in an attempt to avoid some wraparound checks. basic_string in libstdc++ has an optimization for strings of 16 characters or less, in which it stores the data in a local array instead of allocating memory. When the string is destroyed, it performs this comparison to determine whether the memory should be deallocated or not:

_M_dispose()
{
    if (!_M_is_local())
        _M_destroy(_M_allocated_capacity);
}

_M_is_local() operates by comparing the buffer pointer to the address of the internal array. If they don't match, it assumes that the memory was allocated and attempts to deallocate it.

For some reason that I'm currently looking into, the same virtual address is not being used when the object is destroyed as when it is created (even though they refer to the same physical memory), which causes _M_is_local to return false and basic_string then attempts to free a buffer that was never allocated.

It may turn out that my ring buffer is too clever for its own good and that I need to change its implementation. I hope not, because that could make the performance of pushing very large objects unintuitive. But so far I believe there is just a bug in some pointer arithmetic related to the buffer.

mattiasflodin pushed a commit that referenced this issue Apr 10, 2021
basic_log::output_worker calculates frame positions by taking the
"front" address (current read pointer) from mpsc_ring_buffer and then
advancing that until it reaches the end of the batch. However, if the
batch size is large enough to go past the end of the ring buffer then no
wrap around is applied to the frame pointer.

This isn't strictly a buffer overrun because, due to the nature of the
ring buffer implementation, the same physical memory is mapped in a
second time at the end and we are in fact meant to go past the end in
situations where objects overlap the end. However, we can run into
trouble when an object was constructed in the first virtual memory-map
and then accessed through the second virtual-memory map. For plain old
data it works fine, but if the object has code that depends on the
object's this-pointer staying consistent (a reasonable expectation) then
formatting or destroying the object can lead to undefined behavior.

This surfaced as github issue #36 in which the libstdc++ implementation
of std::string had optimizations that would make use of the address of a
class member to determine which deallocation strategy must be used.

The fix is fairly simple: instead of using the frame pointer as loop
invariant we use the ring buffer's 64-bit position to determine when the
end is reached, and whenever we need to use the frame pointer we make
sure that it is consistently wrapped to always start in the first
virtual-memory block.
mattiasflodin added a commit that referenced this issue Apr 10, 2021
basic_log::output_worker calculates frame positions by taking the
"front" address (current read pointer) from mpsc_ring_buffer and then
advancing that until it reaches the end of the batch. However, if the
batch size is large enough to go past the end of the ring buffer then no
wrap around is applied to the frame pointer.

This isn't strictly a buffer overrun because, due to the nature of the
ring buffer implementation, the same physical memory is mapped in a
second time at the end and we are in fact meant to go past the end in
situations where objects overlap the end. However, we can run into
trouble when an object was constructed in the first virtual memory-map
and then accessed through the second virtual-memory map. For plain old
data it works fine, but if the object has code that depends on the
object's this-pointer staying consistent (a reasonable expectation) then
formatting or destroying the object can lead to undefined behavior.

This surfaced as github issue #36 in which the libstdc++ implementation
of std::string had optimizations that would make use of the address of a
class member to determine which deallocation strategy must be used.

The fix is fairly simple: instead of using the frame pointer as loop
invariant we use the ring buffer's 64-bit position to determine when the
end is reached, and whenever we need to use the frame pointer we make
sure that it is consistently wrapped to always start in the first
virtual-memory block.
@mattiasflodin
Copy link
Owner

Resolved in v3.0.3.

@serpent7776
Copy link
Contributor

Seems it's working fine now, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants