PyTorch test/tutorials are (likely) using the same model files. #16719

Open
pcanal opened this issue Oct 19, 2024 · 6 comments


pcanal commented Oct 19, 2024

Doing:

ctest -R tmva -j 32

gives a nondeterministic result (sometimes pass, sometimes fail) for

gtest-tmva-pymva-TestRModelParserKeras
gtest-tmva-pymva-TestRModelParserPyTorch 

Re-running just those tests (whether they succeeded or not) leads to both of them failing.
The failure report indicates that they 'now' need the BLAS library (which is not available on the system).

As a possible clue (or not), the following 3 tests fail systematically on this system due to the missing BLAS library:

        996 - tutorial-tmva-TMVA_SOFIE_GNN_Application (Failed)
        1000 - tutorial-tmva-TMVA_SOFIE_RDataFrame (Failed)
        1002 - tutorial-tmva-TMVA_SOFIE_RSofieReader (Failed)

pcanal commented Oct 19, 2024

It is confirmed that one of these files:

-rw-r--r--. 1 pcanal us_cms   8962 Oct 19 17:56 ./tmva/pymva/test/PyTorchModelSequential.pt
-rw-r--r--. 1 pcanal us_cms  11913 Oct 19 17:56 ./runtutorials/modelClassification.pt
-rw-r--r--. 1 pcanal us_cms  10564 Oct 19 17:56 ./runtutorials/PyTorchModel.pt
-rw-r--r--. 1 pcanal us_cms  10941 Oct 19 17:56 ./runtutorials/modelMultiClass.pt
-rw-r--r--. 1 pcanal us_cms  11330 Oct 19 17:56 ./runtutorials/trainedModelMultiClass.pt
-rw-r--r--. 1 pcanal us_cms  12110 Oct 19 17:56 ./runtutorials/trainedModelClassification.pt
-rw-r--r--. 1 pcanal us_cms   7853 Oct 19 17:57 ./runtutorials/modelRegression.pt
-rw-r--r--. 1 pcanal us_cms   7972 Oct 19 17:57 ./runtutorials/trainedModelRegression.pt
-rw-r--r--. 1 pcanal us_cms  11044 Oct 19 18:02 ./tmva/pymva/test/PyTorchModelModule.pt
-rw-r--r--. 1 pcanal us_cms   8337 Oct 19 18:02 ./tmva/pymva/test/PyTorchModelConvolution.pt
-rw-r--r--. 1 pcanal us_cms 684930 Oct 19 18:02 ./runtutorials/PyTorchTrainedModelCNN.pt
-rw-r--r--. 1 pcanal us_cms 684658 Oct 19 18:02 ./runtutorials/PyTorchModelCNN.pt

is making gtest-tmva-pymva-TestRModelParserPyTorch fail.
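One way to narrow down which of the listed files a given run writes is to snapshot the `.pt` files before and after the run and compare. The following is only a sketch: `snapshot` and `changed_files` are hypothetical helper names, not part of ROOT or CTest, and the `*.pt` pattern is taken from the listing above.

```python
from pathlib import Path


def snapshot(root, pattern="*.pt"):
    """Map each matching file under root (relative path) to (size, mtime_ns)."""
    root = Path(root)
    result = {}
    for path in root.rglob(pattern):
        if path.is_file():
            stat = path.stat()
            result[path.relative_to(root).as_posix()] = (stat.st_size, stat.st_mtime_ns)
    return result


def changed_files(before, after):
    """Return paths that are new, or whose size or mtime differ, between two snapshots."""
    return sorted(p for p, meta in after.items() if before.get(p) != meta)
```

Taking one snapshot of the build directory before a single `ctest` invocation and one after it should show which model files that test creates or overwrites.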


pcanal commented Oct 19, 2024

However, gtest-tmva-pymva-TestRModelParserKeras fails with or without those files.


pcanal commented Oct 19, 2024

Apparently it is the test itself that is not runnable a second time :(:(

jupyter-pcanal-rootdevel:quick-devel pcanal$ ctest -R gtest-tmva-pymva-TestRModelParserPyTorch
Test project /home/pcanal/root_working/build/quick-devel
    Start 349: gtest-tmva-pymva-TestRModelParserPyTorch
1/1 Test #349: gtest-tmva-pymva-TestRModelParserPyTorch ...   Passed   15.87 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) =  16.13 sec
jupyter-pcanal-rootdevel:quick-devel pcanal$ ctest -R gtest-tmva-pymva-TestRModelParserPyTorch
Test project /home/pcanal/root_working/build/quick-devel
    Start 349: gtest-tmva-pymva-TestRModelParserPyTorch
1/1 Test #349: gtest-tmva-pymva-TestRModelParserPyTorch ...***Failed    9.29 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =   9.55 sec

The following tests FAILED:
        349 - gtest-tmva-pymva-TestRModelParserPyTorch (Failed)
Errors while running CTest
Output from these tests are in: /home/pcanal/root_working/build/quick-devel/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
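The pass-then-fail pattern above can be checked automatically by running the same command several times in a row and comparing exit codes. This is a minimal sketch, not part of the ROOT test suite; `run_repeatedly` and `is_rerun_sensitive` are hypothetical names, and it assumes the command reports failure through a non-zero exit code, as `ctest` does.

```python
import subprocess


def run_repeatedly(cmd, times=2):
    """Run cmd `times` times in a row and return the list of exit codes."""
    return [subprocess.run(cmd, capture_output=True).returncode for _ in range(times)]


def is_rerun_sensitive(exit_codes):
    """True if the first run passed but at least one later run failed."""
    return exit_codes[0] == 0 and any(code != 0 for code in exit_codes[1:])
```

Called from the build directory as `run_repeatedly(["ctest", "-R", "gtest-tmva-pymva-TestRModelParserPyTorch"])`, the session shown above would correspond to a zero exit code followed by a non-zero one, assuming the first call starts from a clean directory.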

dpiparo assigned lmoneta and unassigned dpiparo on Oct 19, 2024

dpiparo commented Oct 19, 2024

Re-assigning to @lmoneta


dpiparo commented Oct 21, 2024

@pcanal, could you please re-summarise the status, also given the better understanding we now have of #16720?


pcanal commented Oct 21, 2024

The summary is simple (and still the same after applying 38b0d88 (#16722)):

On first run in a clean directory with BLAS missing, we get:

ctest -R gtest-tmva-pymva-TestRModelParserPyTorch
Test project /home/pcanal/root_working/build/quick-devel
    Start 349: gtest-tmva-pymva-TestRModelParserPyTorch
1/1 Test #349: gtest-tmva-pymva-TestRModelParserPyTorch ...   Passed   16.11 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) =  16.37 sec

and if we immediately re-run we get:

ctest -R gtest-tmva-pymva-TestRModelParserPyTorch
Test project /home/pcanal/root_working/build/quick-devel
    Start 349: gtest-tmva-pymva-TestRModelParserPyTorch
1/1 Test #349: gtest-tmva-pymva-TestRModelParserPyTorch ...***Failed    9.10 sec

and the error is:

[ RUN      ] RModelParser_PyTorch.SEQUENTIAL_MODEL
IncrementalExecutor::executeFunction: symbol 'sgemm_' unresolved while linking [cling interface function]!

which indicates that on the second run, the test wants symbols from the BLAS library.
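To see at a glance which symbols a failing run is missing, the test log can be scanned for this message. The sketch below assumes the message format is exactly the one quoted above; `unresolved_symbols` is a hypothetical helper, not a ROOT or cling API.

```python
import re

# Matches lines such as:
#   IncrementalExecutor::executeFunction: symbol 'sgemm_' unresolved while linking ...
UNRESOLVED = re.compile(r"symbol '([^']+)' unresolved while linking")


def unresolved_symbols(log_text):
    """Return the distinct unresolved symbol names, in order of first appearance."""
    seen = []
    for name in UNRESOLVED.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen
```

Applied to the contents of `Testing/Temporary/LastTest.log`, this should list `sgemm_` (the single-precision BLAS matrix-multiply routine) for the failing second run.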
