MinGW fails with access violation writing error #1818

Closed
StrikerRUS opened this issue Nov 2, 2018 · 16 comments

Comments

@StrikerRUS
Collaborator

Just moving my comment from the old PR to a separate issue, for discussion and to make it easier for users to find.

While working on that PR, I found a weird bug under MinGW. The same symptoms were described in #1475. When I tried to run all tests with pytest %APPVEYOR_BUILD_FOLDER%\tests instead of pytest %APPVEYOR_BUILD_FOLDER%\tests\python_package_test, the two C_API tests passed successfully, but all other tests failed with OSError: exception: access violation writing 0xFFFFFFFF93400000. At present the bug is avoided by running only the Python tests (without C_API). I suppose it is somehow connected to the same lib_lightgbm.dll being loaded twice from different places (from the installation directory for the C_API tests and from the python-package directory for the Python tests).
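The double-loading hypothesis above was never confirmed in this thread, but it is cheap to check. Below is a minimal Python sketch, assuming you already have a list of module paths loaded into the test process (how you collect them, for example from a process inspector, is up to you). The helper name `find_duplicate_libraries` and the sample paths in the usage note are hypothetical and not part of LightGBM.

```python
import ntpath
from collections import defaultdict


def find_duplicate_libraries(module_paths):
    """Group Windows module paths by file name and return the names
    that were loaded from more than one location.

    ntpath is used explicitly so the function treats its input as
    Windows paths (case-insensitive, either slash style) on any OS.
    """
    by_name = defaultdict(set)
    for path in module_paths:
        # Normalize case and separators so the same file is not counted twice.
        norm = ntpath.normcase(ntpath.normpath(path))
        by_name[ntpath.basename(norm)].add(norm)
    # Keep only libraries that appear under two or more distinct paths.
    return {
        name: sorted(paths)
        for name, paths in by_name.items()
        if len(paths) > 1
    }
```

For example, passing `[r"C:\projects\lightgbm\lib_lightgbm.dll", r"C:\Python\Lib\site-packages\lightgbm\lib_lightgbm.dll"]` (made-up locations) would report `lib_lightgbm.dll` with both paths, which is the situation suspected here.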

Environment info

Operating System: Windows

Compiler: any MinGW-w64

Error message

E       OSError: exception: access violation writing 0xFFFFFFFF93160000

Full log: https://ci.appveyor.com/project/guolinke/lightgbm/builds/20010549/job/ewn044xw9dja8g9y

Steps to reproduce

  1. Modify the test_script section of the .appveyor.yml file so that pytest runs the whole tests directory:
test_script:
- - pytest %APPVEYOR_BUILD_FOLDER%\tests\python_package_test
+ - pytest %APPVEYOR_BUILD_FOLDER%\tests

@guolinke
Collaborator

guolinke commented Nov 6, 2018

@StrikerRUS I am not sure about the root cause. Maybe some build flags for MinGW are wrong?

@StrikerRUS
Collaborator Author

StrikerRUS commented Nov 6, 2018

@guolinke To be honest, I'm not familiar with this. I only remember that I added static linking to avoid confusion with conda's libraries:

https://github.com/Microsoft/LightGBM/blob/a0efb07bb8ef9d8a80f246a933c623e253a8a0a3/CMakeLists.txt#L119-L121

Maybe there should be more static linking?

@StrikerRUS
Collaborator Author

StrikerRUS commented Nov 8, 2018

All tests pass when the -O3 flag is removed.

diff --git a/.appveyor.yml b/.appveyor.yml
index 6ea6da0..6f8daf2 100644
--- a/.appveyor.yml
+++ b/.appveyor.yml
@@ -42,7 +42,7 @@ build_script:
     python setup.py install)

 test_script:
-  - pytest %APPVEYOR_BUILD_FOLDER%\tests\python_package_test
+  - pytest %APPVEYOR_BUILD_FOLDER%\tests
   - cd %APPVEYOR_BUILD_FOLDER%\examples\python-guide
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 057302b..1540cd4 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -113,7 +113,7 @@ if(USE_HDFS)
 endif(USE_HDFS)

 if(UNIX OR MINGW OR CYGWIN)
-    SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -pthread -O3 -Wextra -Wall -Wno-ignored-attributes -Wno-unknown-pragmas -Wno-return-type")
+    SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -pthread -Wextra -Wall -Wno-ignored-attributes -Wno-unknown-pragmas -Wno-return-type")
 endif()

Log: https://ci.appveyor.com/project/guolinke/lightgbm/builds/20151191/job/5gi6ebjpevehk06n

UPD:
Replacing -O3 with -O2 makes no difference: the same error occurs.

Log: https://ci.appveyor.com/project/guolinke/lightgbm/builds/20152421/job/vhauwfo88v7322gq

UPD2:
Replacing -O3 with -O1 makes no difference either.

Log: https://ci.appveyor.com/project/guolinke/lightgbm/builds/20154024/job/38212pryi7mhwt2s
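The experiments above (no -O flag, then -O2, then -O1) were done by editing CMAKE_CXX_FLAGS by hand. A small Python sketch of how the flag-string variants could be generated for such a sweep is shown here; the helper name `with_opt_level` is hypothetical, and feeding the result back into CMake is left out.

```python
import re

# Matches a GCC optimization-level token such as -O, -O0..-O3, -Os, -Og, -Ofast.
_OPT_LEVEL = re.compile(r"^-O(?:[0-3sg]|fast)?$")


def with_opt_level(flags, level=None):
    """Return `flags` with any existing -O* token removed and,
    if `level` is given (e.g. "1", "2", "3"), a single -O<level> appended.
    """
    kept = [tok for tok in flags.split() if not _OPT_LEVEL.match(tok)]
    if level is not None:
        kept.append("-O" + level)
    return " ".join(kept)


# The flag string from the CMakeLists.txt diff above.
BASE_FLAGS = (
    "-std=c++11 -pthread -O3 -Wextra -Wall "
    "-Wno-ignored-attributes -Wno-unknown-pragmas -Wno-return-type"
)

# One variant per experiment: no -O flag, then -O1, -O2, -O3.
VARIANTS = [with_opt_level(BASE_FLAGS, lvl) for lvl in (None, "1", "2", "3")]
```

Only the no-optimization variant passed all tests in the runs reported in this thread.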

@StrikerRUS
Collaborator Author

Linking #1588.

@StrikerRUS
Collaborator Author

StrikerRUS commented Nov 10, 2018

The flag that causes the crash is -fipa-reference-addressable (https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fipa-reference-addressable).

@guolinke
Collaborator

Thanks so much, @StrikerRUS. Is this problem solved?

@StrikerRUS
Collaborator Author

StrikerRUS commented Nov 13, 2018

@guolinke Happy to help!

Unfortunately, I can't find any information about the -fipa-reference-addressable flag beyond the official one-line description. Also, I'm not sure, but it seems there is no way to get -O3 behavior other than by passing that single flag...
My intuition is that we should find the variable(s) that trigger the error and protect them from the optimization.
Maybe you know what could be done next?

@StrikerRUS
Collaborator Author

StrikerRUS commented Nov 24, 2018

I was wrong about the link above: it points to the documentation for the latest GCC master. So -fipa-reference-addressable is not the cause of the crashes.

I installed MinGW-w64 with gcc version 8.1.0 locally and found that I can reproduce the issue.
Then I found the right list of flags for version 8.1.0 (https://gcc.gnu.org/onlinedocs/gcc-8.1.0/gcc/Optimize-Options.html) and tried to specify them directly.
The result was that -O1 leads to a crash, but passing all the flags that the docs list for -O1 individually, without -O1 itself, lets all tests pass.
Also, I found these (the GCC FAQ entry says that an -O level is not equivalent to the individual -f options): https://stackoverflow.com/a/1782219, https://gcc.gnu.org/wiki/FAQ#Is_-O1_.28-O2.2C-O3.2C_-Os_or_-Og.29_equivalent_to_individual_-foptimization_options.3F.

UPD:
Tried this answer: https://stackoverflow.com/a/6454659

touch empty.c
gcc -O1 -S -fverbose-asm empty.c
cat empty.s

This produces the following file on my machine:

	.file	"empty.c"
 # GNU C17 (x86_64-posix-seh-rev0, Built by MinGW-W64 project) version 8.1.0 (x86_64-w64-mingw32)
 #	compiled by GNU C version 8.1.0, GMP version 6.1.2, MPFR version 4.0.1, MPC version 1.1.0, isl version isl-0.18-GMP

 # GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
 # options passed: 
 # -iprefix D:/Program Files/mingw-w64/x86_64-8.1.0-posix-seh-rt_v6-rev0/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.1.0/
 # -D_REENTRANT empty.c -mtune=core2 -march=nocona -O1 -fverbose-asm
 # options enabled:  -faggressive-loop-optimizations
 # -fasynchronous-unwind-tables -fauto-inc-dec -fbranch-count-reg
 # -fchkp-check-incomplete-type -fchkp-check-read -fchkp-check-write
 # -fchkp-instrument-calls -fchkp-narrow-bounds -fchkp-optimize
 # -fchkp-store-bounds -fchkp-use-static-bounds
 # -fchkp-use-static-const-bounds -fchkp-use-wrappers
 # -fcombine-stack-adjustments -fcommon -fcompare-elim -fcprop-registers
 # -fdefer-pop -fdelete-null-pointer-checks -fdwarf2-cfi-asm
 # -fearly-inlining -feliminate-unused-debug-types -fforward-propagate
 # -ffp-int-builtin-inexact -ffunction-cse -fgcse-lm -fgnu-runtime
 # -fgnu-unique -fguess-branch-probability -fident -fif-conversion
 # -fif-conversion2 -finline -finline-atomics
 # -finline-functions-called-once -fipa-profile -fipa-pure-const
 # -fipa-reference -fira-hoist-pressure -fira-share-save-slots
 # -fira-share-spill-slots -fivopts -fkeep-inline-dllexport
 # -fkeep-static-consts -fleading-underscore -flifetime-dse
 # -flto-odr-type-merging -fmath-errno -fmerge-constants
 # -fmerge-debug-strings -fmove-loop-invariants -fomit-frame-pointer
 # -fpeephole -fpic -fplt -fprefetch-loop-arrays -freg-struct-return
 # -freorder-blocks -fsched-critical-path-heuristic
 # -fsched-dep-count-heuristic -fsched-group-heuristic -fsched-interblock
 # -fsched-last-insn-heuristic -fsched-rank-heuristic -fsched-spec
 # -fsched-spec-insn-heuristic -fsched-stalled-insns-dep -fschedule-fusion
 # -fsemantic-interposition -fset-stack-executable -fshow-column
 # -fshrink-wrap -fshrink-wrap-separate -fsigned-zeros
 # -fsplit-ivs-in-unroller -fsplit-wide-types -fssa-backprop -fssa-phiopt
 # -fstdarg-opt -fstrict-volatile-bitfields -fsync-libcalls
 # -ftoplevel-reorder -ftrapping-math -ftree-bit-ccp
 # -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-coalesce-vars
 # -ftree-copy-prop -ftree-cselim -ftree-dce -ftree-dominator-opts
 # -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert
 # -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize
 # -ftree-parallelize-loops= -ftree-phiprop -ftree-pta -ftree-reassoc
 # -ftree-scev-cprop -ftree-sink -ftree-slsr -ftree-sra -ftree-ter
 # -funit-at-a-time -funwind-tables -fverbose-asm -fzero-initialized-in-bss
 # -m128bit-long-double -m64 -m80387 -maccumulate-outgoing-args
 # -malign-double -malign-stringops -mcx16 -mfancy-math-387 -mfentry
 # -mfp-ret-in-387 -mfxsr -mieee-fp -mlong-double-80 -mmmx -mms-bitfields
 # -mno-sse4 -mpush-args -mred-zone -msse -msse2 -msse3 -mstack-arg-probe
 # -mstackrealign -mvzeroupper

	.text
	.ident	"GCC: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0"

I replaced all -f* flags with their -fno-* versions (except -fgnu-runtime and -ftree-parallelize-loops=, because they have no -fno-* versions, and -fasynchronous-unwind-tables and -finline-atomics, because replacing them with their -fno-* versions causes the build to crash). Unfortunately, the result is the same issue as with plain -O1.
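The flag-flipping step described above is tedious to do by hand. A minimal Python sketch of it follows, assuming the -fverbose-asm output has the layout shown in this comment (a "# options enabled:" line followed by "#"-prefixed continuation lines and then a blank line). The function names are hypothetical, and the comment character may differ on other targets.

```python
def parse_enabled_flags(asm_text):
    """Collect the tokens listed under '# options enabled:' in the
    output of `gcc -S -fverbose-asm`."""
    flags = []
    in_section = False
    for raw in asm_text.splitlines():
        line = raw.strip()
        if not line.startswith("#"):
            if in_section:
                break  # first non-comment line ends the section
            continue
        body = line.lstrip("#").strip()
        if body.startswith("options enabled:"):
            in_section = True
            body = body[len("options enabled:"):]
        elif not in_section:
            continue  # e.g. the 'options passed:' section
        flags.extend(body.split())
    return flags


def negate_f_flags(flags, keep=()):
    """Turn each -fX into -fno-X, skipping -m* flags, flags that are
    already negated, and anything listed in `keep`."""
    negated = []
    for flag in flags:
        if not flag.startswith("-f") or flag.startswith("-fno-"):
            continue
        if flag in keep:
            continue
        negated.append("-fno-" + flag[2:])
    return negated


# The four flags excluded in the comment above.
KEEP = {
    "-fgnu-runtime",
    "-ftree-parallelize-loops=",
    "-fasynchronous-unwind-tables",
    "-finline-atomics",
}
```

Calling `negate_f_flags(parse_enabled_flags(open("empty.s").read()), KEEP)` would give the list of -fno-* flags to pass alongside -O1. As reported above, doing so did not make the crash go away.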

@guolinke
Collaborator

@StrikerRUS
Is the issue still here?

@StrikerRUS
Collaborator Author

@guolinke Yep! And it seems to me that it can be solved only by removing the -O* flag...
I don't think it's a critical problem that should be our focus, but who knows, maybe sometime in the future we'll find a solution.

@StrikerRUS
Collaborator Author

@guolinke It seems to me that this is more of a bug than a feature request, so I'm reopening it and removing it from #2302.

@guolinke
Collaborator

guolinke commented Mar 1, 2022

@StrikerRUS is this still happening?

@jameslamb
Collaborator

The last time we received a report that might be related was December 2021 (#4192), but that person never responded.

I don't remember seeing it in LightGBM's CI in the last year. I think we could probably close this issue, but will let @StrikerRUS decide.

@StrikerRUS
Collaborator Author

@guolinke

is this still happening?

Yeah, still happens occasionally.

@jameslamb
Collaborator

I don't recall seeing this in the last year, and the project still has pretty good CI coverage for Windows + MinGW across CLI, Python, and R jobs.

I'm going to close this.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues
including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2023