bgpd: rfapi: fix mem leak when killed #18045

paulzlabn · 2025-02-06T16:38:46Z

Something changed in the bgpd termination sequence, leading to rfapi objects not being freed via their usual timed expiration pathway. This change frees rfapi peer BPIs immediately instead of expiring via timer when bgp is terminating.

…ng timer Signed-off-by: G. Paul Ziemba <[email protected]>

donaldsharp · 2025-02-06T16:51:48Z

as a note for something that needs exploration in the rfapi code -> There is a pattern of setting the vnc.import.timer thread pointer to NULL and then calling event_add_timer.

This pattern has caused problems in the past where memory allocated and pointed to by the event/thread get's leaked or worse when that event hasn't expired but the data structures that are used by that event/thread get yanked out from underneath it(say some form of no command) and the underlying code has no way to get ahold of that event/thread since the pointer to it in it's space is gone. This will cause crashes! I don't necessarily think this should hold up this memory leak problem but it is a pattern that the rfapi code should be aware of and fix.

donaldsharp · 2025-02-06T16:52:12Z

	assert(bpi->extra);
	if (lifetime > UINT32_MAX / 1001) {
		/* sub-optimal case, but will probably never happen */
		bpi->extra->vnc->vnc.import.timer = NULL;
		event_add_timer(bm->master, timer_service_func, wcb, lifetime,
				&bpi->extra->vnc->vnc.import.timer);
	} else {
		static uint32_t jitter;
		uint32_t lifetime_msec;

		/*
		 * the goal here is to spread out the timers so they are
		 * sortable in the skip list
		 */
		if (++jitter >= 1000)
			jitter = 0;

		lifetime_msec = (lifetime * 1000) + jitter;

		bpi->extra->vnc->vnc.import.timer = NULL;
		event_add_timer_msec(bm->master, timer_service_func, wcb,
				     lifetime_msec,
				     &bpi->extra->vnc->vnc.import.timer);
	}

here is the code I am talking about

paulzlabn · 2025-02-06T17:50:33Z

setting the vnc.import.timer thread pointer to NULL and then calling event_add_timer.

Interesting. Even the original code from 2009 had an equivalent pattern import.timer = thread_add_timer(...).
It would probably be good to replace setting-to-NULL with an assert and then see what falls out. I'll have a look at that.

donaldsharp · 2025-02-06T18:06:21Z

I've definately fixed a few bugs from code using that pattern from taht time period.

donaldsharp · 2025-02-06T18:29:56Z

==3115346==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7f27348b4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x7f27342910a9 in qcalloc lib/memory.c:106
    #2 0x55a4670295b7 in rfapiRibStartTimer bgpd/rfapi/rfapi_rib.c:328
    #3 0x55a467031168 in process_pending_node bgpd/rfapi/rfapi_rib.c:1371
    #4 0x55a467031c4f in rib_do_callback_onepass bgpd/rfapi/rfapi_rib.c:1465
    #5 0x55a467032123 in rfapiRibDoQueuedCallback bgpd/rfapi/rfapi_rib.c:1507
    #6 0x7f27343cc2aa in work_queue_run lib/workqueue.c:282
    #7 0x7f27343a3ad9 in event_call lib/event.c:1984
    #8 0x7f2734262141 in frr_run lib/libfrr.c:1246
    #9 0x55a466c3ce19 in main bgpd/bgp_main.c:557
    #10 0x7f2733629d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

SUMMARY: AddressSanitizer: 40 byte(s) leaked in 1 allocation(s).
::::::::::::::
./bgp_rfapi_basic_sanity.test_bgp_rfapi_basic_sanity/r3.asan.bgpd.3094674
::::::::::::::

=================================================================
==3094674==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7f0ba40b4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x7f0ba3a910a9 in qcalloc lib/memory.c:106
    #2 0x5575d77325b7 in rfapiRibStartTimer bgpd/rfapi/rfapi_rib.c:328
    #3 0x5575d773a168 in process_pending_node bgpd/rfapi/rfapi_rib.c:1371
    #4 0x5575d773ac4f in rib_do_callback_onepass bgpd/rfapi/rfapi_rib.c:1465
    #5 0x5575d773b123 in rfapiRibDoQueuedCallback bgpd/rfapi/rfapi_rib.c:1507
    #6 0x7f0ba3bcc2aa in work_queue_run lib/workqueue.c:282
    #7 0x7f0ba3ba3ad9 in event_call lib/event.c:1984
    #8 0x7f0ba3a62141 in frr_run lib/libfrr.c:1246
    #9 0x5575d7345e19 in main bgpd/bgp_main.c:557
    #10 0x7f0ba2e29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

does this need the same fix?

paulzlabn · 2025-02-06T18:47:32Z

#2 0x5575d77325b7 in rfapiRibStartTimer bgpd/rfapi/rfapi_rib.c:328
#3 0x5575d773a168 in process_pending_node bgpd/rfapi/rfapi_rib.c:1371
#4 0x5575d773ac4f in rib_do_callback_onepass bgpd/rfapi/rfapi_rib.c:1465
#5 0x5575d773b123 in rfapiRibDoQueuedCallback bgpd/rfapi/rfapi_rib.c:1507


does this need the same fix?

I think this will be a different fix, because the allocation here didn't occur while shutting down. But I suspect that both this case as well as the case addressed by this PR were triggered by some change in the ordering of FRR infrastructure cleanup during shutdown (which revealed some latent omission in the rfapi cleanup routine).

paulzlabn · 2025-02-06T22:24:53Z

ci:rerun

frrbot bot added the bgp label Feb 6, 2025

bgpd: rfapi: if BGP is terminating, free peer BPIs now instead of usi…

125a480

…ng timer Signed-off-by: G. Paul Ziemba <[email protected]>

paulzlabn force-pushed the ziemba/250205-rfapi-mem-cleanup branch from d8bea53 to 125a480 Compare February 6, 2025 16:46

github-actions bot added size/S master labels Feb 6, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bgpd: rfapi: fix mem leak when killed #18045

bgpd: rfapi: fix mem leak when killed #18045

paulzlabn commented Feb 6, 2025

donaldsharp commented Feb 6, 2025

donaldsharp commented Feb 6, 2025

paulzlabn commented Feb 6, 2025

donaldsharp commented Feb 6, 2025

donaldsharp commented Feb 6, 2025

paulzlabn commented Feb 6, 2025

paulzlabn commented Feb 6, 2025

bgpd: rfapi: fix mem leak when killed #18045

Are you sure you want to change the base?

bgpd: rfapi: fix mem leak when killed #18045

Conversation

paulzlabn commented Feb 6, 2025

donaldsharp commented Feb 6, 2025

donaldsharp commented Feb 6, 2025

paulzlabn commented Feb 6, 2025

donaldsharp commented Feb 6, 2025

donaldsharp commented Feb 6, 2025

paulzlabn commented Feb 6, 2025

paulzlabn commented Feb 6, 2025