Fix parallel access crashes and misbehavior #136

jmid · 2024-04-16T14:50:44Z

Parallel usage is memory unsafe (read: may crash) as documented in #120, ocaml/ocaml#11607, and ocaml/ocaml#13046.

This PR goes for the simplest possible fix: a single global lock. It dusts off the first commit of https://github.com/dra27/flexdll/tree/sledgehammer, with suitable rebasing, renaming, and error handling.
Author credit thus goes to @dra27; any errors are mine.
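For readers unfamiliar with the pattern, here is a minimal sketch of a single global lock that is created lazily and installed without a race. It is a portable analogue, not flexdll's code. flexdll uses Win32 mutexes (`WaitForSingleObject` on `units_mutex`, per the review hunks), while this sketch swaps in POSIX threads and C11 atomics so it builds anywhere. The names `units_lock`, `get_units_lock`, `register_unit` and `loaded_units` are hypothetical, and the compare-and-swap install step is an assumption suggested by the `goto again` retry visible in the first review hunk.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical single global lock, created lazily on first use. */
static _Atomic(pthread_mutex_t *) units_lock = NULL;

/* Hypothetical shared state that the lock protects. */
static int loaded_units = 0;

/* Return the global lock, creating it on first use; NULL on failure. */
static pthread_mutex_t *get_units_lock(void)
{
  pthread_mutex_t *m = atomic_load(&units_lock);
  if (m != NULL)
    return m;

  pthread_mutex_t *fresh = malloc(sizeof *fresh);
  if (fresh == NULL)
    return NULL;
  if (pthread_mutex_init(fresh, NULL) != 0) {
    free(fresh);
    return NULL;
  }

  /* Publish our lock only if no other thread has published one yet. */
  pthread_mutex_t *expected = NULL;
  if (atomic_compare_exchange_strong(&units_lock, &expected, fresh))
    return fresh;

  /* Lost the race: the failed CAS stored the winner's lock in `expected`. */
  assert(expected != NULL);
  pthread_mutex_destroy(fresh);
  free(fresh);
  return expected;
}

/* Every operation on shared state takes the same lock. Returns 0 or -1. */
static int register_unit(void)
{
  pthread_mutex_t *m = get_units_lock();
  if (m == NULL || pthread_mutex_lock(m) != 0)
    return -1; /* report failure rather than touch shared state unlocked */
  loaded_units++;
  pthread_mutex_unlock(m);
  return 0;
}
```

A thread that loses the creation race destroys its own mutex and uses the winner's, so all callers end up serialized on the same lock. The trade-off of a single lock is lost parallelism inside the loader, which is acceptable when correctness is the goal.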

For the error handling, I've tried to make it fit with @shym's TLS-based error handling from #112.
I'm unsure how to test these error code paths, though, without explicitly tampering with the source code to create an invalid lock handle.
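The idea behind TLS-based error handling is that each thread keeps its own last-error record, so a failure reported on one thread cannot overwrite the message another thread is about to read. The sketch below is a hypothetical illustration using C11 `_Thread_local`; it is not the code from #112, and `error_state`, `set_error` and `last_error` are invented names. Note that flexdll's own lookup (`get_tls_error`, seen in a later review hunk) can return NULL, so the real code must handle that case; a `_Thread_local` variable has no such failure mode.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-thread error record. */
typedef struct {
  int code;           /* 0 means "no error" */
  char message[256];
} error_state;

/* _Thread_local gives every thread its own copy of this variable. */
static _Thread_local error_state tls_error = { 0, "" };

/* Record an error for the calling thread only. */
static void set_error(int code, const char *msg)
{
  assert(msg != NULL);
  size_t n = strlen(msg);
  if (n >= sizeof tls_error.message)
    n = sizeof tls_error.message - 1; /* truncate overly long messages */
  memcpy(tls_error.message, msg, n);
  tls_error.message[n] = '\0';
  tls_error.code = code;
}

/* dlerror-style query: this thread's last message, or NULL if none. */
static const char *last_error(void)
{
  return tls_error.code != 0 ? tls_error.message : NULL;
}
```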

With the fix:

- the ocaml/ocaml testsuite passes, including the tests/lib-dynlink-domains test that was disabled in ocaml/ocaml#11607 ("Temporarily disable the lib-dynlink-domains test on Windows")
- the reproducer from ocaml/ocaml#13046 ("Parallel Dynlink usage under Cygwin+MinGW is unsafe") also passes
- the Dynlink stress test from multicoretests passes

(all of these were tested under MinGW in a Cygwin shell)

MisterDA · 2024-04-17T20:15:12Z

flexdll.c

+      goto again;
+    }
+  } else {
+    if (WaitForSingleObject(units_mutex, INFINITE) == WAIT_FAILED) {


Suggested change, replacing

if (WaitForSingleObject(units_mutex, INFINITE) == WAIT_FAILED) {

with

if (WaitForSingleObject(units_mutex, INFINITE) != WAIT_OBJECT_0) {

This would cover the improbable (impossible?) case of the mutex having been abandoned.
https://learn.microsoft.com/en-us/windows/win32/sync/using-mutex-objects

I ended up reverting this one.

WAIT_ABANDONED is not reported through GetLastError, so I started adding an explicit error message for that case, only to then hit it during testing.

I therefore suppose these two checks were written to treat both WAIT_OBJECT_0 and WAIT_ABANDONED as success (WAIT_TIMEOUT should not happen with INFINITE).

I think my reasoning was that WAIT_ABANDONED should be treated like success.

(i.e. I agree with @jmid's assessment!)
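The outcome of this thread can be captured in a small helper that treats both `WAIT_OBJECT_0` and `WAIT_ABANDONED` as "the caller now owns the mutex" (per the Win32 documentation, an abandoned mutex is granted to the waiting thread). This is a sketch, not the merged code, whose check in the hunks above compares against `WAIT_FAILED` only; `mutex_acquired` is a hypothetical name. Off Windows, the sketch defines the documented numeric values of the wait constants itself, as an assumption, so the logic can be compiled and exercised.

```c
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#else
/* Assumption for non-Windows builds: the documented values from <windows.h>. */
#include <stdint.h>
typedef uint32_t DWORD;
#define WAIT_OBJECT_0  ((DWORD)0x00000000L)
#define WAIT_ABANDONED ((DWORD)0x00000080L)
#define WAIT_TIMEOUT   ((DWORD)0x00000102L)
#define WAIT_FAILED    ((DWORD)0xFFFFFFFF)
#endif

/* DWORD is a 32-bit unsigned integer on Windows. */
static_assert(sizeof(DWORD) == 4, "DWORD must be 32 bits");

/* Hypothetical helper: nonzero if WaitForSingleObject left the caller
   owning the mutex. */
static int mutex_acquired(DWORD wait_result)
{
  switch (wait_result) {
  case WAIT_OBJECT_0:  /* normal acquisition */
  case WAIT_ABANDONED: /* previous owner exited without releasing it; ownership
                          passes to us, but the protected data may be inconsistent */
    return 1;
  default:             /* WAIT_FAILED (details via GetLastError) or WAIT_TIMEOUT */
    return 0;
  }
}
```

With `INFINITE`, `WAIT_TIMEOUT` cannot occur, so this is equivalent in practice to the `== WAIT_FAILED` check; the helper only makes the abandoned case explicit.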

MisterDA · 2024-04-17T20:15:42Z

flexdll.c

+  err = get_tls_error(TLS_ERROR_NOP);
+  if(err == NULL) return NULL;
+
+  if (WaitForSingleObject(units_mutex, INFINITE) == WAIT_FAILED) {


Suggested change, replacing

if (WaitForSingleObject(units_mutex, INFINITE) == WAIT_FAILED) {

with

if (WaitForSingleObject(units_mutex, INFINITE) != WAIT_OBJECT_0) {

(same for this one)

MisterDA

I've tested and finally understood why the code is correct, thanks Jan! I think this is good to go.
One minor suggestion remains.

jmid · 2024-05-16T12:43:42Z

I've addressed the last minor comment, added a CHANGES entry, and rebased on master.

dra27 · 2024-06-26T08:22:45Z

Thank you both for your work on this, and sorry for taking quite so long to merge it!

MisterDA reviewed Apr 17, 2024

MisterDA reviewed Apr 18, 2024

jmid force-pushed the single-global-lock-par-fix branch from 217db5d to 67efa7d on April 18, 2024 17:06

MisterDA approved these changes May 16, 2024

dra27 and others added 9 commits May 16, 2024 14:31

- Single global lock (6a7cdfb)
- Rename mutex (ed96937)
- Add mutex error handling (aad987e)
- Update comment to reflect the change (9e23f46)
- Review: spacing (80a9790)
  Co-authored-by: Antonin Décimo
- Review: add CreateMutex error handling (439fe51)
- Review: remove volatile (ffb8ce2)
- Review: avoid needless block (a824e52)
- Add CHANGES entry (4950924)

jmid force-pushed the single-global-lock-par-fix branch from 47480bc to 4950924 on May 16, 2024 12:35

MisterDA approved these changes May 16, 2024

dra27 merged commit 5719f5a into ocaml:master Jun 26, 2024
1 check passed

jmid deleted the single-global-lock-par-fix branch June 26, 2024 08:53

jmid mentioned this pull request on Dec 13, 2024 in ocaml-multicore/multicoretests#488, "Reenable Dynlink test under Windows" (open)
