BKM for intra-op vs. inter-op parallelism? #4764
Hi, I'm experimenting with an application involving both the so-called intra-op and inter-op parallelism (having a My question is: what is the best practice for my application? I suppose that by default OpenBLAS is used by calling one function from one thread, and that function then spawns multiple threads to do the computation. In my case, I'm wondering whether calling OpenBLAS (multithreaded) GEMM functions from multiple threads will cause the GEMM calls to interfere with each other and/or compete for resources. Is there anything I should do during the build? Or should I call any Level 3 or Level 2 functions instead of just calling the default Thanks in advance!
Replies: 1 comment 1 reply
Yes, it sounds like you'd get the best performance from using a single-threaded OpenBLAS (build options `USE_THREAD=0 USE_LOCKING=1`) or by calling `openblas_set_num_threads(1)` (or maybe any other small number, if your machine has many cores or your program is not using many threads at that moment) before entering a multithreaded code section that makes BLAS calls.