[dask] flaky test_ranker test #3817

Closed

StrikerRUS opened this issue Jan 22, 2021 · 23 comments

@StrikerRUS (Collaborator) commented Jan 22, 2021

Just seen in #3755.

assert 0.8655947285611406 > 0.9

assert spearmanr(rnkvec_dask, rnkvec_local).correlation > 0.9

Full logs:
2021-01-22T16:37:09.2139516Z ============================= test session starts ==============================
2021-01-22T16:37:09.2141377Z platform linux -- Python 3.8.5, pytest-6.2.1, py-1.10.0, pluggy-0.13.1
2021-01-22T16:37:09.2141965Z rootdir: /__w/1/s
2021-01-22T16:37:09.2142341Z collected 248 items
2021-01-22T16:37:09.2142606Z 
2021-01-22T16:37:09.5775745Z ../tests/python_package_test/test_basic.py .............                 [  5%]
2021-01-22T16:37:18.0619061Z ../tests/python_package_test/test_consistency.py ......                  [  7%]
2021-01-22T16:39:33.9324482Z ../tests/python_package_test/test_dask.py .............................. [ 19%]
2021-01-22T16:40:13.7140067Z ..F.......                                                               [ 23%]
2021-01-22T16:40:13.7150510Z ../tests/python_package_test/test_dual.py s                              [ 24%]
2021-01-22T16:40:16.8375077Z ../tests/python_package_test/test_engine.py ............................ [ 35%]
2021-01-22T16:40:58.6316816Z .......................................                                  [ 51%]
2021-01-22T16:40:59.4355458Z ../tests/python_package_test/test_plotting.py .....                      [ 53%]
2021-01-22T16:41:01.1371760Z ../tests/python_package_test/test_sklearn.py ........................... [ 64%]
2021-01-22T16:41:04.0277169Z ......x.........................................x....................... [ 93%]
2021-01-22T16:41:04.2180367Z .................                                                        [100%]
2021-01-22T16:41:04.2181120Z 
2021-01-22T16:41:04.2186375Z =================================== FAILURES ===================================
2021-01-22T16:41:04.2188206Z __________________________ test_ranker[group1-array] ___________________________
2021-01-22T16:41:04.2188660Z 
2021-01-22T16:41:04.2189171Z output = 'array'
2021-01-22T16:41:04.2190004Z client = <Client: 'tcp://127.0.0.1:35043' processes=2 threads=2, memory=16.70 GB>
2021-01-22T16:41:04.2190645Z listen_port = 13320, group = [5, 5, 5, 10, 10, 10, ...]
2021-01-22T16:41:04.2190924Z 
2021-01-22T16:41:04.2191578Z     @pytest.mark.parametrize('output', ['array', 'dataframe'])
2021-01-22T16:41:04.2193455Z     @pytest.mark.parametrize('group', [None, group_sizes])
2021-01-22T16:41:04.2193979Z     def test_ranker(output, client, listen_port, group):
2021-01-22T16:41:04.2194392Z     
2021-01-22T16:41:04.2194721Z         X, y, w, g, dX, dy, dw, dg = _create_ranking_data(
2021-01-22T16:41:04.2195107Z             output=output,
2021-01-22T16:41:04.2195449Z             group=group
2021-01-22T16:41:04.2195714Z         )
2021-01-22T16:41:04.2195963Z     
2021-01-22T16:41:04.2196801Z         # use many trees + leaves to overfit, help ensure that dask data-parallel strategy matches that of
2021-01-22T16:41:04.2197867Z         # serial learner. See https://github.com/microsoft/LightGBM/issues/3292#issuecomment-671288210.
2021-01-22T16:41:04.2198393Z         dask_ranker = dlgbm.DaskLGBMRanker(
2021-01-22T16:41:04.2198720Z             time_out=5,
2021-01-22T16:41:04.2199015Z             local_listen_port=listen_port,
2021-01-22T16:41:04.2200379Z             tree_learner_type='data_parallel',
2021-01-22T16:41:04.2200884Z             n_estimators=50,
2021-01-22T16:41:04.2201209Z             num_leaves=20,
2021-01-22T16:41:04.2201563Z             seed=42,
2021-01-22T16:41:04.2201873Z             min_child_samples=1
2021-01-22T16:41:04.2202208Z         )
2021-01-22T16:41:04.2203012Z         dask_ranker = dask_ranker.fit(dX, dy, sample_weight=dw, group=dg, client=client)
2021-01-22T16:41:04.2204652Z         rnkvec_dask = dask_ranker.predict(dX)
2021-01-22T16:41:04.2205125Z         rnkvec_dask = rnkvec_dask.compute()
2021-01-22T16:41:04.2205413Z     
2021-01-22T16:41:04.2205693Z         local_ranker = lightgbm.LGBMRanker(
2021-01-22T16:41:04.2206033Z             n_estimators=50,
2021-01-22T16:41:04.2206316Z             num_leaves=20,
2021-01-22T16:41:04.2206609Z             seed=42,
2021-01-22T16:41:04.2206882Z             min_child_samples=1
2021-01-22T16:41:04.2207134Z         )
2021-01-22T16:41:04.2207467Z         local_ranker.fit(X, y, sample_weight=w, group=g)
2021-01-22T16:41:04.2207836Z         rnkvec_local = local_ranker.predict(X)
2021-01-22T16:41:04.2208136Z     
2021-01-22T16:41:04.2208460Z         # distributed ranker should be able to rank decently well and should
2021-01-22T16:41:04.2208913Z         # have high rank correlation with scores from serial ranker.
2021-01-22T16:41:04.2210972Z         dcor = spearmanr(rnkvec_dask, y).correlation
2021-01-22T16:41:04.2211848Z         assert dcor > 0.6
2021-01-22T16:41:04.2212392Z >       assert spearmanr(rnkvec_dask, rnkvec_local).correlation > 0.9
2021-01-22T16:41:04.2212927Z E       assert 0.8655947285611406 > 0.9
2021-01-22T16:41:04.2214238Z E        +  where 0.8655947285611406 = SpearmanrResult(correlation=0.8655947285611406, pvalue=6.789654688151309e-63).correlation
2021-01-22T16:41:04.2217512Z E        +    where SpearmanrResult(correlation=0.8655947285611406, pvalue=6.789654688151309e-63) = spearmanr(array([  1.17847793, -10.61257145,   1.10963807,   1.26783458,\n       -10.87116169, -10.30027335,   2.01830419,  -8.42...77889,  -8.53275908, -11.10340087,\n       -10.69837988,  -9.16839805, -10.60553788,  -8.46843972,\n        -8.79032464]), array([ 2.44868591, -2.81931693,  2.56975   ,  2.87658816, -2.58785124,\n       -2.118276  ,  2.67509046, -1.04428093, ...790406,  3.12545573, -0.32474216, -3.00283108,\n       -2.32490923, -1.1799899 , -2.89568167, -1.4975611 , -1.0839513 ]))
2021-01-22T16:41:04.2219365Z 
2021-01-22T16:41:04.2219825Z ../tests/python_package_test/test_dask.py:515: AssertionError
2021-01-22T16:41:04.2220828Z ---------------------------- Captured stderr setup -----------------------------
2021-01-22T16:41:04.2222218Z distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
2021-01-22T16:41:04.2223351Z distributed.scheduler - INFO - Clear task state
2021-01-22T16:41:04.2224318Z distributed.scheduler - INFO -   Scheduler at:     tcp://127.0.0.1:35043
2021-01-22T16:41:04.2225353Z distributed.scheduler - INFO -   dashboard at:            127.0.0.1:8787
2021-01-22T16:41:04.2226430Z distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:36183
2021-01-22T16:41:04.2227502Z distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:36183
2021-01-22T16:41:04.2228602Z distributed.worker - INFO -          dashboard at:            127.0.0.1:43897
2021-01-22T16:41:04.2229704Z distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:35043
2021-01-22T16:41:04.2230734Z distributed.worker - INFO - -------------------------------------------------
2021-01-22T16:41:04.2231721Z distributed.worker - INFO -               Threads:                          1
2021-01-22T16:41:04.2232783Z distributed.worker - INFO -                Memory:                    8.35 GB
2021-01-22T16:41:04.2234162Z distributed.worker - INFO -       Local Directory: /__w/1/s/python-package/_test_worker-04a292a9-9569-4257-a4a4-e5cde3ce8414/dask-worker-space/worker-pvjdw7nw
2021-01-22T16:41:04.2235391Z distributed.worker - INFO - -------------------------------------------------
2021-01-22T16:41:04.2236826Z distributed.scheduler - INFO - Register worker <Worker 'tcp://127.0.0.1:36183', name: tcp://127.0.0.1:36183, memory: 0, processing: 0>
2021-01-22T16:41:04.2238295Z distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:36183
2021-01-22T16:41:04.2240319Z distributed.core - INFO - Starting established connection
2021-01-22T16:41:04.2241475Z distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:35043
2021-01-22T16:41:04.2242487Z distributed.worker - INFO - -------------------------------------------------
2021-01-22T16:41:04.2243352Z distributed.core - INFO - Starting established connection
2021-01-22T16:41:04.2244336Z distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:38443
2021-01-22T16:41:04.2245410Z distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:38443
2021-01-22T16:41:04.2246511Z distributed.worker - INFO -          dashboard at:            127.0.0.1:45633
2021-01-22T16:41:04.2247596Z distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:35043
2021-01-22T16:41:04.2248638Z distributed.worker - INFO - -------------------------------------------------
2021-01-22T16:41:04.2249660Z distributed.worker - INFO -               Threads:                          1
2021-01-22T16:41:04.2250954Z distributed.worker - INFO -                Memory:                    8.35 GB
2021-01-22T16:41:04.2252311Z distributed.worker - INFO -       Local Directory: /__w/1/s/python-package/_test_worker-18375d17-9abd-4d43-90b7-04608036c4f9/dask-worker-space/worker-4m6542hc
2021-01-22T16:41:04.2253528Z distributed.worker - INFO - -------------------------------------------------
2021-01-22T16:41:04.2254939Z distributed.scheduler - INFO - Register worker <Worker 'tcp://127.0.0.1:38443', name: tcp://127.0.0.1:38443, memory: 0, processing: 0>
2021-01-22T16:41:04.2256391Z distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:38443
2021-01-22T16:41:04.2257344Z distributed.core - INFO - Starting established connection
2021-01-22T16:41:04.2258330Z distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:35043
2021-01-22T16:41:04.2259336Z distributed.worker - INFO - -------------------------------------------------
2021-01-22T16:41:04.2260234Z distributed.core - INFO - Starting established connection
2021-01-22T16:41:04.2261294Z distributed.scheduler - INFO - Receive client connection: Client-6edcfe6a-5cd0-11eb-a0a0-0242ac120002
2021-01-22T16:41:04.2262274Z distributed.core - INFO - Starting established connection
2021-01-22T16:41:04.2263209Z ----------------------------- Captured stdout call -----------------------------
2021-01-22T16:41:04.2263793Z [LightGBM] [Warning] Set TCP_NODELAY failed
2021-01-22T16:41:04.2264303Z [LightGBM] [Info] Trying to bind port 13320...
2021-01-22T16:41:04.2264794Z [LightGBM] [Info] Binding port 13320 succeeded
2021-01-22T16:41:04.2265302Z [LightGBM] [Warning] Set TCP_NODELAY failed
2021-01-22T16:41:04.2265748Z [LightGBM] [Info] Listening...
2021-01-22T16:41:04.2266294Z [LightGBM] [Warning] Connecting to rank 1 failed, waiting for 200 milliseconds
2021-01-22T16:41:04.2266868Z [LightGBM] [Warning] Set TCP_NODELAY failed
2021-01-22T16:41:04.2267359Z [LightGBM] [Info] Trying to bind port 13321...
2021-01-22T16:41:04.2267879Z [LightGBM] [Info] Binding port 13321 succeeded
2021-01-22T16:41:04.2268340Z [LightGBM] [Info] Listening...
2021-01-22T16:41:04.2268804Z [LightGBM] [Warning] Set TCP_NODELAY failed
2021-01-22T16:41:04.2269283Z [LightGBM] [Warning] Set TCP_NODELAY failed
2021-01-22T16:41:04.2269780Z [LightGBM] [Warning] Set TCP_NODELAY failed
2021-01-22T16:41:04.2270241Z [LightGBM] [Info] Connected to rank 1
2021-01-22T16:41:04.2270685Z [LightGBM] [Info] Connected to rank 0
2021-01-22T16:41:04.2271252Z [LightGBM] [Info] Local rank: 0, total number of machines: 2
2021-01-22T16:41:04.2271912Z [LightGBM] [Info] Local rank: 1, total number of machines: 2
2021-01-22T16:41:04.2273133Z [LightGBM] [Warning] num_threads is set=1, n_jobs=-1 will be ignored. Current value: num_threads=1
2021-01-22T16:41:04.2274382Z [LightGBM] [Warning] num_threads is set=1, n_jobs=-1 will be ignored. Current value: num_threads=1
2021-01-22T16:41:04.2275484Z --------------------------- Captured stderr teardown ---------------------------
2021-01-22T16:41:04.2276674Z distributed.scheduler - INFO - Remove client Client-6edcfe6a-5cd0-11eb-a0a0-0242ac120002
2021-01-22T16:41:04.2277752Z distributed.scheduler - INFO - Remove client Client-6edcfe6a-5cd0-11eb-a0a0-0242ac120002
2021-01-22T16:41:04.2278885Z distributed.scheduler - INFO - Close client connection: Client-6edcfe6a-5cd0-11eb-a0a0-0242ac120002
2021-01-22T16:41:04.2280218Z distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:36183
2021-01-22T16:41:04.2281268Z distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:38443
2021-01-22T16:41:04.2282740Z distributed.scheduler - INFO - Remove worker <Worker 'tcp://127.0.0.1:36183', name: tcp://127.0.0.1:36183, memory: 0, processing: 0>
2021-01-22T16:41:04.2284169Z distributed.core - INFO - Removing comms to tcp://127.0.0.1:36183
2021-01-22T16:41:04.2285576Z distributed.scheduler - INFO - Remove worker <Worker 'tcp://127.0.0.1:38443', name: tcp://127.0.0.1:38443, memory: 0, processing: 0>
2021-01-22T16:41:04.2287070Z distributed.core - INFO - Removing comms to tcp://127.0.0.1:38443
2021-01-22T16:41:04.2288137Z distributed.scheduler - INFO - Lost all workers
2021-01-22T16:41:04.2288918Z distributed.scheduler - INFO - Scheduler closing...
2021-01-22T16:41:04.2289730Z distributed.scheduler - INFO - Scheduler closing all comms
2021-01-22T16:41:04.2290375Z =============================== warnings summary ===============================
2021-01-22T16:41:04.2290972Z tests/python_package_test/test_basic.py::test_basic
2021-01-22T16:41:04.2291555Z tests/python_package_test/test_engine.py::test_reference_chain
2021-01-22T16:41:04.2292155Z tests/python_package_test/test_engine.py::test_init_with_subset
2021-01-22T16:41:04.2292793Z tests/python_package_test/test_engine.py::test_fpreproc
2021-01-22T16:41:04.2293418Z tests/python_package_test/test_engine.py::test_dataset_params_with_reference
2021-01-22T16:41:04.2294771Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:1317: UserWarning: Overriding the parameters from Reference Dataset.
2021-01-22T16:41:04.2295935Z     warnings.warn('Overriding the parameters from Reference Dataset.')
2021-01-22T16:41:04.2296366Z 
2021-01-22T16:41:04.2296829Z tests/python_package_test/test_basic.py::test_save_and_load_linear
2021-01-22T16:41:04.2297441Z tests/python_package_test/test_engine.py::test_pandas_categorical
2021-01-22T16:41:04.2298055Z tests/python_package_test/test_engine.py::test_linear_trees
2021-01-22T16:41:04.2299276Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:1589: UserWarning: categorical_feature in Dataset is overridden.
2021-01-22T16:41:04.2300040Z   New categorical_feature is [0]
2021-01-22T16:41:04.2300846Z     warnings.warn('categorical_feature in Dataset is overridden.\n'
2021-01-22T16:41:04.2301253Z 
2021-01-22T16:41:04.2301794Z tests/python_package_test/test_basic.py::test_add_features_equal_data_on_alternating_used_unused
2021-01-22T16:41:04.2302526Z tests/python_package_test/test_basic.py::test_add_features_same_booster_behaviour
2021-01-22T16:41:04.2303189Z tests/python_package_test/test_engine.py::test_sliced_data
2021-01-22T16:41:04.2303789Z tests/python_package_test/test_engine.py::test_monotone_penalty_max
2021-01-22T16:41:04.2304411Z tests/python_package_test/test_engine.py::test_forced_bins
2021-01-22T16:41:04.2305925Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:332: UserWarning: Usage of np.ndarray subset (sliced data) is not recommended due to it will double the peak memory cost in LightGBM.
2021-01-22T16:41:04.2307004Z     warnings.warn("Usage of np.ndarray subset (sliced data) is not recommended "
2021-01-22T16:41:04.2307449Z 
2021-01-22T16:41:04.2307977Z tests/python_package_test/test_basic.py::test_add_features_equal_data_on_alternating_used_unused
2021-01-22T16:41:04.2308698Z tests/python_package_test/test_basic.py::test_add_features_same_booster_behaviour
2021-01-22T16:41:04.2309402Z tests/python_package_test/test_basic.py::test_add_features_from_different_sources
2021-01-22T16:41:04.2310773Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:2014: UserWarning: Cannot add features from NoneType type of raw data to NoneType type of raw data.
2021-01-22T16:41:04.2311461Z   Set free_raw_data=False when construct Dataset to avoid this
2021-01-22T16:41:04.2311852Z     warnings.warn(err_msg)
2021-01-22T16:41:04.2312091Z 
2021-01-22T16:41:04.2312474Z tests/python_package_test/test_basic.py::test_add_features_equal_data_on_alternating_used_unused
2021-01-22T16:41:04.2312996Z tests/python_package_test/test_basic.py::test_add_features_same_booster_behaviour
2021-01-22T16:41:04.2313542Z tests/python_package_test/test_basic.py::test_add_features_from_different_sources
2021-01-22T16:41:04.2314568Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:2016: UserWarning: Reseting categorical features.
2021-01-22T16:41:04.2315215Z   You can set new categorical features via ``set_categorical_feature`` method
2021-01-22T16:41:04.2315642Z     warnings.warn("Reseting categorical features.\n"
2021-01-22T16:41:04.2316088Z 
2021-01-22T16:41:04.2316447Z tests/python_package_test/test_basic.py::test_add_features_from_different_sources
2021-01-22T16:41:04.2317566Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:2014: UserWarning: Cannot add features from list type of raw data to ndarray type of raw data.
2021-01-22T16:41:04.2318197Z   Freeing raw data
2021-01-22T16:41:04.2318474Z     warnings.warn(err_msg)
2021-01-22T16:41:04.2318727Z 
2021-01-22T16:41:04.2319079Z tests/python_package_test/test_basic.py::test_add_features_from_different_sources
2021-01-22T16:41:04.2320389Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:2014: UserWarning: Cannot add features from list type of raw data to csr_matrix type of raw data.
2021-01-22T16:41:04.2321044Z   Freeing raw data
2021-01-22T16:41:04.2321372Z     warnings.warn(err_msg)
2021-01-22T16:41:04.2321603Z 
2021-01-22T16:41:04.2321968Z tests/python_package_test/test_basic.py::test_add_features_from_different_sources
2021-01-22T16:41:04.2323084Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:2014: UserWarning: Cannot add features from list type of raw data to DataFrame type of raw data.
2021-01-22T16:41:04.2323686Z   Freeing raw data
2021-01-22T16:41:04.2324007Z     warnings.warn(err_msg)
2021-01-22T16:41:04.2324248Z 
2021-01-22T16:41:04.2324577Z tests/python_package_test/test_consistency.py: 10 warnings
2021-01-22T16:41:04.2325625Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/engine.py:149: UserWarning: Found `num_trees` in params. Will use it instead of argument
2021-01-22T16:41:04.2326311Z     warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))
2021-01-22T16:41:04.2326688Z 
2021-01-22T16:41:04.2327019Z tests/python_package_test/test_consistency.py: 10 warnings
2021-01-22T16:41:04.2328048Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:1106: UserWarning: data keyword has been found in `params` and will be ignored.
2021-01-22T16:41:04.2328736Z   Please use data argument of the Dataset constructor to pass this parameter.
2021-01-22T16:41:04.2329518Z     warnings.warn('{0} keyword has been found in `params` and will be ignored.\n'
2021-01-22T16:41:04.2329879Z 
2021-01-22T16:41:04.2330279Z tests/python_package_test/test_dask.py::test_errors
2021-01-22T16:41:04.2331198Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/dask.py:229: RuntimeWarning: coroutine '_wait' was never awaited
2021-01-22T16:41:04.2331758Z     wait(parts)
2021-01-22T16:41:04.2331984Z 
2021-01-22T16:41:04.2332301Z tests/python_package_test/test_engine.py::test_binary
2021-01-22T16:41:04.2333320Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/engine.py:149: UserWarning: Found `num_iteration` in params. Will use it instead of argument
2021-01-22T16:41:04.2334031Z     warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))
2021-01-22T16:41:04.2334583Z 
2021-01-22T16:41:04.2334925Z tests/python_package_test/test_engine.py::test_pandas_categorical
2021-01-22T16:41:04.2335988Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:1589: UserWarning: categorical_feature in Dataset is overridden.
2021-01-22T16:41:04.2336789Z   New categorical_feature is ['A']
2021-01-22T16:41:04.2337436Z     warnings.warn('categorical_feature in Dataset is overridden.\n'
2021-01-22T16:41:04.2337789Z 
2021-01-22T16:41:04.2338137Z tests/python_package_test/test_engine.py::test_pandas_categorical
2021-01-22T16:41:04.2339115Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:1589: UserWarning: categorical_feature in Dataset is overridden.
2021-01-22T16:41:04.2339971Z   New categorical_feature is ['A', 'B', 'C', 'D']
2021-01-22T16:41:04.2340652Z     warnings.warn('categorical_feature in Dataset is overridden.\n'
2021-01-22T16:41:04.2340989Z 
2021-01-22T16:41:04.2341353Z tests/python_package_test/test_engine.py::test_pandas_categorical
2021-01-22T16:41:04.2342534Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:1589: UserWarning: categorical_feature in Dataset is overridden.
2021-01-22T16:41:04.2343382Z   New categorical_feature is ['A', 'B', 'C', 'D', 'E']
2021-01-22T16:41:04.2344084Z     warnings.warn('categorical_feature in Dataset is overridden.\n'
2021-01-22T16:41:04.2344412Z 
2021-01-22T16:41:04.2344771Z tests/python_package_test/test_engine.py::test_pandas_categorical
2021-01-22T16:41:04.2345714Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:1589: UserWarning: categorical_feature in Dataset is overridden.
2021-01-22T16:41:04.2346299Z   New categorical_feature is []
2021-01-22T16:41:04.2346928Z     warnings.warn('categorical_feature in Dataset is overridden.\n'
2021-01-22T16:41:04.2347244Z 
2021-01-22T16:41:04.2347595Z tests/python_package_test/test_engine.py::test_pandas_sparse
2021-01-22T16:41:04.2348083Z tests/python_package_test/test_sklearn.py::TestSklearn::test_pandas_sparse
2021-01-22T16:41:04.2349387Z   /home/AzDevOps_azpcontainer/miniconda/envs/test-env/lib/python3.8/site-packages/pandas/core/generic.py:5670: PerformanceWarning: Concatenating sparse arrays with multiple fill values: '[0, nan, False]'. Picking the first and converting the rest.
2021-01-22T16:41:04.2350261Z     return self._mgr.as_array(transpose=self._AXIS_REVERSED)
2021-01-22T16:41:04.2350595Z 
2021-01-22T16:41:04.2350940Z tests/python_package_test/test_engine.py::test_int32_max_sparse_contribs
2021-01-22T16:41:04.2352210Z   /home/AzDevOps_azpcontainer/miniconda/envs/test-env/lib/python3.8/site-packages/scipy/sparse/_index.py:82: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
2021-01-22T16:41:04.2352952Z     self._set_intXint(row, col, x.flat[0])
2021-01-22T16:41:04.2353240Z 
2021-01-22T16:41:04.2353581Z tests/python_package_test/test_engine.py::test_init_with_subset
2021-01-22T16:41:04.2354571Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:1843: UserWarning: Cannot subset str type of raw data.
2021-01-22T16:41:04.2355152Z   Returning original raw data
2021-01-22T16:41:04.2355514Z     warnings.warn("Cannot subset {} type of raw data.\n"
2021-01-22T16:41:04.2355826Z 
2021-01-22T16:41:04.2356167Z tests/python_package_test/test_engine.py::test_monotone_constraints
2021-01-22T16:41:04.2356648Z tests/python_package_test/test_engine.py::test_monotone_penalty
2021-01-22T16:41:04.2357109Z tests/python_package_test/test_engine.py::test_monotone_penalty_max
2021-01-22T16:41:04.2357578Z tests/python_package_test/test_engine.py::test_get_split_value_histogram
2021-01-22T16:41:04.2358102Z tests/python_package_test/test_sklearn.py::TestSklearn::test_pandas_categorical
2021-01-22T16:41:04.2359754Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:1586: UserWarning: Using categorical_feature in Dataset.
2021-01-22T16:41:04.2360927Z     warnings.warn('Using categorical_feature in Dataset.')
2021-01-22T16:41:04.2361252Z 
2021-01-22T16:41:04.2361622Z tests/python_package_test/test_engine.py::test_dataset_update_params
2021-01-22T16:41:04.2362664Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:1106: UserWarning: categorical_feature keyword has been found in `params` and will be ignored.
2021-01-22T16:41:04.2363413Z   Please use categorical_feature argument of the Dataset constructor to pass this parameter.
2021-01-22T16:41:04.2364201Z     warnings.warn('{0} keyword has been found in `params` and will be ignored.\n'
2021-01-22T16:41:04.2364544Z 
2021-01-22T16:41:04.2364933Z tests/python_package_test/test_plotting.py::TestBasic::test_plot_metrics
2021-01-22T16:41:04.2365525Z   /__w/1/s/tests/python_package_test/test_plotting.py:156: UserWarning: More than one metric available, picking one to plot.
2021-01-22T16:41:04.2366072Z     ax0 = lgb.plot_metric(evals_result0)
2021-01-22T16:41:04.2366503Z 
2021-01-22T16:41:04.2366904Z tests/python_package_test/test_plotting.py::TestBasic::test_plot_split_value_histogram
2021-01-22T16:41:04.2368104Z   /home/AzDevOps_azpcontainer/miniconda/envs/test-env/lib/python3.8/site-packages/cycler.py:374: RuntimeWarning: coroutine 'PooledRPCCall.__getattr__.<locals>.send_recv_from_rpc' was never awaited
2021-01-22T16:41:04.2368778Z     for k in keys:
2021-01-22T16:41:04.2369011Z 
2021-01-22T16:41:04.2369420Z tests/python_package_test/test_sklearn.py::TestSklearn::test_binary_classification_with_custom_objective
2021-01-22T16:41:04.2370637Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/sklearn.py:934: UserWarning: Cannot compute class probabilities or labels due to the usage of customized objective function.
2021-01-22T16:41:04.2371302Z   Returning raw scores instead.
2021-01-22T16:41:04.2371704Z     warnings.warn("Cannot compute class probabilities or labels "
2021-01-22T16:41:04.2372011Z 
2021-01-22T16:41:04.2372401Z tests/python_package_test/test_sklearn.py::TestSklearn::test_stacking_classifier
2021-01-22T16:41:04.2373076Z   <frozen importlib._bootstrap>:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
2021-01-22T16:41:04.2373592Z 
2021-01-22T16:41:04.2373930Z tests/python_package_test/test_sklearn.py: 12 warnings
2021-01-22T16:41:04.2374918Z   /home/AzDevOps_azpcontainer/.local/lib/python3.8/site-packages/lightgbm/basic.py:623: UserWarning: Converting data to scipy sparse matrix.
2021-01-22T16:41:04.2375803Z     warnings.warn('Converting data to scipy sparse matrix.')
2021-01-22T16:41:04.2376126Z 
2021-01-22T16:41:04.2376679Z -- Docs: https://docs.pytest.org/en/stable/warnings.html
2021-01-22T16:41:04.2377178Z =========================== short test summary info ============================
2021-01-22T16:41:04.2377915Z FAILED ../tests/python_package_test/test_dask.py::test_ranker[group1-array]
2021-01-22T16:41:04.2378487Z = 1 failed, 244 passed, 1 skipped, 2 xfailed, 74 warnings in 236.41s (0:03:56) =
2021-01-22T16:41:04.4494486Z ##[error]Bash exited with code '255'.
@StrikerRUS (Collaborator, Author)

cc @jameslamb @ffineis

@StrikerRUS (Collaborator, Author)

Just wondering: is there any seed-like parameter in Dask we can set to ensure the same results across all test runs?

@ffineis (Contributor) commented Jan 22, 2021

Oy, OK, this threshold should be decreased. We've set a seed on everything, including in _make_ranking. The issue, I think, is how Dask chooses to partition the data across workers: depending on the test it splits rows (and therefore groups) 30/70, 50/50, 120/30, etc. When the split is really uneven, we get worse models.

This might be kind of outrageous, but I'm wondering if we should support a constructor argument broadcast that sends all data to all workers:

futures = client.scatter([1, 2, 3], broadcast=True)  # send data to all workers

This would ensure the split points are the same on each worker, which would stabilize testing.

@ffineis (Contributor) commented Jan 22, 2021

I'm submitting a PR to lower the threshold.

@StrikerRUS (Collaborator, Author)

the issue I think is how dask chooses to partition the data across workers. Sometimes it splits rows (and therefore, groups) depending on the test: 30/70, 50/50, 120/30, etc.

Can we somehow remove randomness from Dask data partitioning?

@ffineis (Contributor) commented Jan 22, 2021

Line 226 in dask.py:

    # Start computation in the background
    parts = list(map(delayed, parts))
    if 'broadcast' in kwargs:
        parts = client.scatter(parts, broadcast=True)
    parts = client.compute(parts)
    wait(parts)

might do the trick. There are probably other solutions too. I'll try this out over the weekend; in the meantime, a PR is out to lower the threshold.

@StrikerRUS (Collaborator, Author)

Oh thanks a lot for the info! Glad that there are possible workarounds.

@jameslamb changed the title from "Flaky Dask test_ranker test" to "[dask] flaky test_ranker test" on Jan 22, 2021
@jameslamb (Collaborator) commented Jan 22, 2021

This might be kind of outrageous, but wondering if maybe we should support a constructor argument broadcast that sends all data to all workers?
This would ensure split points are the same in each worker, would stabilize testing.

One of the benefits of data-parallel learning in LightGBM is that you don't have to send all data to all workers, so you can train on a dataset that's larger than any one machine involved can hold by itself. I'm not comfortable with only testing the scenario where every worker has all of the training data.

I can see that this failure is from the test where the chunk sizes are not all the same (group = [5, 5, 5, 10, 10, 10, ...]). That's what opens up the possibility of nondeterministic behavior. For other tests, we have equal chunk sizes, so Dask usually ends up with a roughly even split.

I think we might be able to solve this for testing with two changes (see the sketch after this list):

  1. explicitly persist() + wait() on the input data so that it gets materialized on the workers before training
  2. use client.rebalance() to try to get a more even spread

I think those two changes could make this specific test more reliable, and that they only need to be applied where the chunk sizes differ; they do introduce some time cost and could make the tests different from how users would normally work with lightgbm. For now, that only means the ranker tests.
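A minimal sketch of those two steps, assuming dX, dy, dw, dg are the Dask collections returned by _create_ranking_data and client is the test's distributed.Client fixture:

from distributed import wait

# persist() starts computation in the background; wait() blocks until
# every chunk is materialized in worker memory
dX, dy, dw, dg = dX.persist(), dy.persist(), dw.persist(), dg.persist()
_ = wait([dX, dy, dw, dg])

# then ask the scheduler to move chunks so workers hold roughly equal shares
client.rebalance()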

What do you think, @ffineis?

@ffineis (Contributor) commented Jan 22, 2021

Yeah for sure, I hadn't even heard of client.rebalance; that sounds great. So basically we ask the workers to compute their input data chunks before DaskLGBMRanker's call to client.submit; it's kind of like calling MPI's barrier prior to model training. You have an example similar to this here. Yeah, I'll try it out.

@jameslamb (Collaborator)

I want to note that I just did 300 test runs while investigating another issue (#3829 (comment)). This ranker test did not fail in any of them.

@ffineis (Contributor) commented Jan 28, 2021

Nice! That's good. I do have updates on this that I've been meaning to push; will get to them tonight, tomorrow night at the latest. I ran some experiments using the client.rebalance method you mentioned. It seems to work for arrays, but Dask DataFrames are already well balanced when the index is set. Will post more details.

@ffineis (Contributor) commented Jan 31, 2021

Hey, sorry for the delay re: client.rebalancing the test_ranker data. I ran some experiments:

Running 100 tests with rebalancing and 100 without, on each test I saved each worker's data.shape along with the corresponding Spearman rank correlation (the statistic behind the flaky assert on the quality of the Dask ranker). Since the client fixture we're using sets up 2 workers for each test, we can take the maximum ratio workers[i].shape[0] / (workers[0].shape[0] + workers[1].shape[0]) for i = 0, 1 to illustrate how unevenly the data is distributed among the workers. For example, if one worker gets 90 rows and the other gets 115, the ratio is 115 / 205 = 0.56; a perfect 50/50 allocation produces a ratio of 0.5. When using client.rebalance, the ratios across the tests should both:

  • converge to a few distinct ratios, as there is less randomness in how data is distributed to workers
  • converge to values close to 0.5.
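For concreteness, here is a tiny sketch of that ratio statistic (the helper name is mine, not from the test suite):

def max_worker_ratio(rows_worker0, rows_worker1):
    # fraction of all rows held by the more heavily loaded worker
    total = rows_worker0 + rows_worker1
    return max(rows_worker0, rows_worker1) / total

max_worker_ratio(90, 115)  # 115 / 205 = 0.56; a 50/50 split gives 0.5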

Here are the worker ratio distributions for the fixed group-size tests and the pre-specified group-size tests, with and without rebalancing:

[Figure: worker_data_ratios — distributions of the per-test worker data ratio for each input type and group setting, rebalanced vs. non-rebalanced]

So the rebalancing really does help in the case of the dask.array input. In the fixed group size = 10 tests, the ratios are reset to a perfectly even 50/50 split (top-right plot), and in the variable group-size tests ("group = [...]") the allocation most often becomes 90 / 115 rows, which is pretty even (bottom-right). Surprisingly, when we use client.rebalance on the dask.DataFrame input, it actually makes the worker ratios much worse. In the fixed group size = 10 tests, we go from perfect 50/50 splits to rebalancing 100% of the data onto one worker (which actually results in a Spearman correlation of 1, but is undesirable; top-left). In the variable group-size tests, the worker ratio is consistently 90 / 115 without rebalancing, but asking Dask to rebalance consistently shifts the allocation to 50 / 155 (a ratio of 0.756; bottom-left).

I think that when we call dd.from_pandas(X_df, chunksize=50) in _create_ranking_data, because X_df already has its index set, Dask does a good job of distributing data evenly to workers with that chunksize in mind. When we then ask it to rebalance, it reshuffles an already good allocation and leaves one worker with just 50 rows.

The same is not true for the array input, where groups are randomly allocated to workers (the data has not been distributed with a set chunksize in mind). In the case of the dask.array input, client.rebalance really helps make the distribution of data across workers more even.
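As a small illustration of that point (toy frame, not the actual test data), dd.from_pandas cuts partitions of roughly chunksize rows along the frame's existing sorted index, so the initial allocation is already close to even:

import pandas as pd
import dask.dataframe as dd

X_df = pd.DataFrame({'feature': range(200)})  # stand-in for the ranking frame
ddf = dd.from_pandas(X_df, chunksize=50)
print(ddf.npartitions)  # 4 partitions of ~50 rows each
print(ddf.divisions)    # partition boundaries taken from the sorted index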

Here are the resulting distributions of spearman rank correlation coefficients:

[Figure: spearmanr_dists — distributions of the Spearman rank correlation for each input type and group setting, rebalanced vs. non-rebalanced]

As we can see, the poor redistribution of data for the DataFrame input leads to lower Spearman correlations, while redistribution shifts the correlations to the right for the array input, because the worker allocations are much more even.

Anyway! This is all to say that we should leave the data distribution alone for DataFrame input (Dask already uses the set index to allocate data evenly) but use client.rebalance for array input to help make the tests more stable. Note that the rebalanced allocation is still random, just much less so than randomly sending groups to workers.

I'd like to submit a PR adding the following to test_ranker, if that's alright:

# distributing data to workers more evenly helps stabilize the tests
if output == 'array':
    dX = dX.persist()
    dy = dy.persist()
    dw = dw.persist()
    dg = dg.persist()
    _ = wait([dX, dy, dw, dg])
    client.rebalance()

@jameslamb (Collaborator)

Thanks for the thorough research and explanation! It makes sense to me that the problem isn't as severe in the dataframe case, because we put group information into the index and Dask can use that to cheaply form a plan for distributing the data.

I'm ok with your proposal to use .rebalance() only in the array case.

I expect these differences in the produced model wouldn't be as severe at real-world data sizes; the outputs just have higher variance because we use very small amounts of data and very few training iterations to keep the tests fast. So I'm not worried that this is something LightGBM's users will necessarily have to think a lot about.

@jameslamb (Collaborator)

Fixed by #3892; will re-open if I see that error again.

@StrikerRUS (Collaborator, Author)

            # distributed ranker should be able to rank decently well and should
            # have high rank correlation with scores from serial ranker.
            dcor = spearmanr(rnkvec_dask, y).correlation
            assert dcor > 0.6
>           assert spearmanr(rnkvec_dask, rnkvec_local).correlation > 0.8
E           assert 0.6920503168844578 > 0.8
E            +  where 0.6920503168844578 = SpearmanrResult(correlation=0.6920503168844578, pvalue=1.9556072463418407e-143).correlation
E            +    where SpearmanrResult(correlation=0.6920503168844578, pvalue=1.9556072463418407e-143) = spearmanr(array([ 3.00160839e+00, -3.80369297e+00,  3.00371045e+00,  3.00553804e+00,\n       -3.80299662e+00, -3.77897794e+00,  3...5779e+00, -8.57866491e-01, -2.81553536e-01,\n       -4.74791390e-01,  3.04781198e+00,  3.04424878e+00, -3.84800463e+00]), array([ 2.79518432, -2.92510827,  2.79945042,  2.79739195, -2.93465458,\n       -2.94558246,  2.79945042, -0.76099679, ...540771, -0.90650662,  2.79634894, -1.03025436,\n       -0.83118181, -1.19990266,  2.79697959,  2.79706979, -2.9324251 ]))

../tests/python_package_test/test_dask.py:695: AssertionError


FAILED ../tests/python_package_test/test_dask.py::test_ranker[data-dart-None-array]

https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=9750&view=logs&j=275189f9-c769-596a-7ef9-49fb48a9ab70&t=3a9e7a4a-04e6-52a0-67ea-6e8f6cfda74f

@StrikerRUS (Collaborator, Author)

One more case with almost the same value but for another job and Python version: https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=9759&view=logs&j=02a2c3ba-81f8-54e3-0767-5d5adbb0daa9&t=720ee3fa-96d4-5b47-dbf4-01607b74ade2.


            # distributed ranker should be able to rank decently well and should
            # have high rank correlation with scores from serial ranker.
            dcor = spearmanr(rnkvec_dask, y).correlation
            assert dcor > 0.6
>           assert spearmanr(rnkvec_dask, rnkvec_local).correlation > 0.8
E           assert 0.6920577389482582 > 0.8
E            +  where 0.6920577389482582 = SpearmanrResult(correlation=0.6920577389482582, pvalue=1.9364416442404153e-143).correlation
E            +    where SpearmanrResult(correlation=0.6920577389482582, pvalue=1.9364416442404153e-143) = spearmanr(array([ 3.00160838e+00, -3.80369368e+00,  3.00371043e+00,  3.00553805e+00,\n       -3.80299742e+00, -3.77897874e+00,  3...5779e+00, -8.57867239e-01, -2.81553615e-01,\n       -4.74791153e-01,  3.04781197e+00,  3.04424796e+00, -3.84800553e+00]), array([ 2.79518432, -2.92510827,  2.79945042,  2.79739195, -2.93465458,\n       -2.94558246,  2.79945042, -0.76099679, ...540771, -0.90650662,  2.79634894, -1.03025436,\n       -0.83118181, -1.19990266,  2.79697959,  2.79706979, -2.9324251 ]))

../tests/python_package_test/test_dask.py:695: AssertionError


FAILED ../tests/python_package_test/test_dask.py::test_ranker[data-dart-None-array]

@ffineis (Contributor) commented Apr 11, 2021

One more case with almost the same value but for another job and Python version: https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=9759&view=logs&j=02a2c3ba-81f8-54e3-0767-5d5adbb0daa9&t=720ee3fa-96d4-5b47-dbf4-01607b74ade2.

Alright, I'll look into this and try to patch it. Curious myself what seems to have brought this back to life after two months of peace! We might also consider changing the ranking-quality metric from Spearman correlation to NDCG@k; perhaps that statistic is less variable.

@jameslamb (Collaborator)

@ffineis it isn't EXACTLY the same issue. Now it's a different type of boosting (dart), which we test as of #4119. I expect that @jmoralez's work in #4149 is going to resolve this, so you might want to look at that PR to avoid spending time on things @jmoralez has already tried.

@ffineis (Contributor) commented Apr 11, 2021

@jameslamb ah ok, awesome, thanks for the heads up. Yeah, I was going to suggest this is probably related to the new testing parameters. It makes sense that a regularization technique could start causing performance drops at our small test-set sizes, when these tests are basically just checking whether models can overfit the training data. I still think moving to NDCG as a rank-quality measure might be more appropriate, so I'll test it out locally and see whether its variance really is lower than Spearman correlation's (rough sketch below).
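For reference, here is a rough sketch (my own, not an agreed design) of scoring the Dask ranker with sklearn's ndcg_score per query group against the true relevance labels, in place of spearmanr(rnkvec_dask, y):

import numpy as np
from sklearn.metrics import ndcg_score

def mean_ndcg_at_k(y_true, y_score, group_sizes, k=5):
    # average NDCG@k over query groups; ndcg_score expects 2D inputs of
    # shape (n_queries, n_docs), so score one group at a time
    scores, start = [], 0
    for size in group_sizes:
        sl = slice(start, start + size)
        scores.append(ndcg_score([y_true[sl]], [y_score[sl]], k=k))
        start += size
    return float(np.mean(scores))

# e.g. assert mean_ndcg_at_k(y, rnkvec_dask, g) > some_threshold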

@jameslamb (Collaborator)

Just noting, since we're keeping track of it, that I just saw this again on #4188

>           assert spearmanr(rnkvec_dask, rnkvec_local).correlation > 0.8
E           assert 0.6920478448632085 > 0.8
E            +  where 0.6920478448632085 = SpearmanrResult(correlation=0.6920478448632085, pvalue=1.962032503537805e-143).correlation
E            +    where SpearmanrResult(correlation=0.6920478448632085, pvalue=1.962032503537805e-143) = spearmanr(array([ 3.00160672e+00, -3.80369100e+00,  3.00370874e+00,  3.00554481e+00,\n       -3.80299463e+00, -3.77897599e+00,  3...5609e+00, -8.57866442e-01, -2.81555707e-01,\n       -4.74793220e-01,  3.04781027e+00,  3.04426396e+00, -3.84800265e+00]), array([ 2.79518432, -2.92510827,  2.79945042,  2.79739195, -2.93465458,\n       -2.94558246,  2.79945042, -0.76099679, ...540771, -0.90650662,  2.79634894, -1.03025436,\n       -0.83118181, -1.19990266,  2.79697959,  2.79706979, -2.9324251 ]))

../tests/python_package_test/test_dask.py:695: AssertionError

...

=========================== short test summary info ============================
FAILED ../tests/python_package_test/test_dask.py::test_ranker[data-dart-None-array]
= 1 failed, 510 passed, 6 skipped, 2 xfailed, 339 warnings in 248.60s (0:04:08) =

@StrikerRUS (Collaborator, Author)

No reports since April 17. I guess we can close this. WDYT, guys?

@jameslamb (Collaborator)

Agree! I haven't seen this since then. Let's close it.

@github-actions (bot)

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023