As discovered by the CI, RClusterPool::WaitFor() can fail an R__ASSERT when it receives a null pointer from the cluster future. This can happen through the following sequence:
1. The main thread triggers cluster $k$ for background loading.
2. Another request to GetCluster() removes cluster $k$ from the provides set. If the I/O thread has not yet finished loading $k$, this sets the fIsExpired flag on the cluster.
3. The I/O thread, upon having loaded cluster $k$, sees the fIsExpired flag (under lock of the work queue). Consequently, it stops short of decompressing the cluster and sets the cluster promise to null (not under lock).
4. Another call to GetCluster() can be scheduled just between the test for fIsExpired and setting the cluster promise to null in the I/O thread. In this case, GetCluster() mistakenly assumes that the requested cluster will be provided by the I/O thread (whereas in fact the I/O thread returns null).
The fix seems to be to perform both steps under the same lock: the test for fIsExpired and setting the cluster promise to null.
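To make the proposed fix concrete, here is a minimal sketch of the idea. It is not the actual RClusterPool code: the types `Cluster`, `InFlight`, and `Pool`, the extra `fIsResolved` flag, and the methods `Expire()`, `TryReuse()`, and `FinishLoad()` are all hypothetical stand-ins. The only point it illustrates is that the I/O side tests `fIsExpired` and fulfils the promise inside one critical section, so a requester holding the same lock sees either "not yet resolved" or "already resolved to null", never the window in between.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical, simplified stand-ins; not the real ROOT classes.
struct Cluster {
   int fId = 0;
};

struct InFlight {
   bool fIsExpired = false;  // requester no longer wants this cluster
   bool fIsResolved = false; // promise has been fulfilled (assumed helper flag)
   std::promise<std::unique_ptr<Cluster>> fPromise;
};

class Pool {
   std::mutex fLock; // plays the role of the work-queue lock

public:
   // Requester side: mark an in-flight cluster as no longer needed.
   void Expire(InFlight &item)
   {
      std::lock_guard<std::mutex> guard(fLock);
      if (!item.fIsResolved)
         item.fIsExpired = true;
   }

   // Requester side: returns true if the caller may wait on the item's
   // future, false if it has to schedule a fresh load.
   bool TryReuse(InFlight &item)
   {
      std::lock_guard<std::mutex> guard(fLock);
      if (item.fIsResolved && item.fIsExpired)
         return false;         // already resolved to null, cannot be reused
      item.fIsExpired = false; // revive the request before the I/O side tests it
      return true;
   }

   // I/O thread side: the test for fIsExpired and the fulfilment of the
   // promise happen in the same critical section.
   void FinishLoad(InFlight &item, std::unique_ptr<Cluster> loaded)
   {
      std::lock_guard<std::mutex> guard(fLock);
      assert(!item.fIsResolved);
      if (item.fIsExpired)
         loaded.reset(); // discard: the future will yield a null pointer
      item.fPromise.set_value(std::move(loaded));
      item.fIsResolved = true;
   }
};
```

In this sketch, a requester that calls `TryReuse()` before the I/O side runs gets a valid cluster, and one that calls it afterwards is told to reload, so it never waits on a future that silently yields null. Whether holding the lock while fulfilling the promise is acceptable in the real code is a separate question that the sketch does not answer.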
Reproducer
The RandomAccess unit test sometimes triggers the race.
ROOT version
master
Installation method
n/a
Operating system
n/a
Additional context
No response
We can consider dropping the "discard/expire" signal entirely. Since the decompression does not take place anymore in the I/O thread, it is questionable if the optimization gains much.