Pre-processing virtual site notebook failing #62

Open
mattwthompson opened this issue Sep 9, 2024 · 2 comments

Comments

@mattwthompson
Member

https://github.com/openforcefield/openff-docs/actions/runs/10755472185

A snippet of the log is below. I'm not seeing these failures in my own nightly CI, so perhaps something is configured differently here? (#53?)

--------------------------------------------------------------------------------
openforcefield/openff-toolkit/virtual_sites/vsite_showcase.ipynb failed. Traceback:

Traceback (most recent call last):
  File "/home/runner/work/openff-docs/openff-docs/source/_ext/proc_examples.py", line 218, in execute_notebook
    executor.preprocess(nb, {"metadata": {"path": src.parent}})
  File "/home/runner/micromamba/envs/openff-docs-examples/lib/python3.10/site-packages/nbconvert/preprocessors/execute.py", line 103, in preprocess
    self.preprocess_cell(cell, resources, index)
  File "/home/runner/micromamba/envs/openff-docs-examples/lib/python3.10/site-packages/nbconvert/preprocessors/execute.py", line 124, in preprocess_cell
    cell = self.execute_cell(cell, index, store_history=True)
  File "/home/runner/micromamba/envs/openff-docs-examples/lib/python3.10/site-packages/jupyter_core/utils/__init__.py", line 165, in wrapped
    return loop.run_until_complete(inner)
  File "/home/runner/micromamba/envs/openff-docs-examples/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/runner/micromamba/envs/openff-docs-examples/lib/python3.10/site-packages/nbclient/client.py", line 1005, in async_execute_cell
    exec_reply = await self.task_poll_for_reply
  File "/home/runner/micromamba/envs/openff-docs-examples/lib/python3.10/site-packages/nbclient/client.py", line 806, in _async_poll_for_reply
    error_on_timeout_execute_reply = await self._async_handle_timeout(timeout, cell)
  File "/home/runner/micromamba/envs/openff-docs-examples/lib/python3.10/site-packages/nbclient/client.py", line 856, in _async_handle_timeout
    raise CellTimeoutError.error_from_timeout_and_cell(
nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 1200 seconds.
The message was: Cell execution timed out.
Here is a preview of the cell contents:
-------------------
interchange = force_field.create_interchange(topology=molecule.to_topology())

assert "VirtualSites" in interchange.collections.keys()

n_virtual_sites = len(interchange.collections["VirtualSites"].key_map)

print(f"There are {n_virtual_sites} virtual particles in this topology.")
-------------------


The following 1/29 notebooks failed:
     openforcefield/openff-toolkit/virtual_sites/vsite_showcase.ipynb
For tracebacks, see above.
Writing log to /home/runner/work/openff-docs/openff-docs/notebooks_log.json
Error: Process completed with exit code 1.

P.S. I'm still getting daily emails about these runs. Could someone look into updating the notification flow? I'm not the best person to figure out whether these failures are genuine or caused by a change in the configuration, automation, etc.

@Yoshanuikabundi
Contributor

This seems to be an intermittent error, but I don't know where it's coming from. It probably has something to do with the fact that all the notebooks are run in parallel here, which speeds things up a lot even though the runners don't technically have enough cores. I've experimented with extending the timeout, but it's already much longer than it should need to be. What if I set it up so that cell timeout errors don't cause the CI to fail?
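A minimal sketch of that idea, assuming the runner collects a mapping from each failed notebook's path to the exception it raised. The `summarize` function and that mapping are hypothetical and not necessarily how `proc_examples.py` is structured. Timeouts (`nbclient.exceptions.CellTimeoutError`, as in the log above) are reported as GitHub Actions warnings, while any other exception still fails the job.

```python
try:
    from nbclient.exceptions import CellTimeoutError
except ImportError:
    # Hypothetical stand-in so this sketch runs without nbclient installed
    class CellTimeoutError(TimeoutError):
        pass


def summarize(failures):
    """Return a CI exit code, treating cell timeouts as non-fatal.

    ``failures`` maps a notebook path to the exception raised while executing
    it (an assumed structure for illustration only).
    """
    timeouts = {nb: exc for nb, exc in failures.items()
                if isinstance(exc, CellTimeoutError)}
    errors = {nb: exc for nb, exc in failures.items() if nb not in timeouts}

    for nb in sorted(timeouts):
        # GitHub Actions workflow command: rendered as a warning annotation
        print(f"::warning::{nb} timed out; not failing the build")
    for nb in sorted(errors):
        # Rendered as an error annotation; these still fail the job
        print(f"::error::{nb} failed: {errors[nb]!r}")

    # Only non-timeout failures produce a non-zero exit code
    return 1 if errors else 0
```

The job would then end with something like `sys.exit(summarize(failures))`. Because timeouts still show up as annotations on the run, a genuine slowdown would remain visible in the run summary, but it would no longer fail the build.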

@mattwthompson
Member Author

That seems like an okay band-aid; if the cause is something as silly as getting worse hardware, it would reduce this noise. But if a new release introduced a regression that legitimately makes some common process goofily slow, it would go undetected.

If a few extra CPU cores would help, what about running on better hardware? Either runners provided by GitHub (reliable but expensive) or runners from our new tools that hook into AWS?
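To illustrate how the core count interacts with running notebooks in parallel, here is a sketch that caps concurrency at the number of logical CPUs instead of starting every notebook at once. `run_all` and the `run_notebook` callable are hypothetical; the sketch uses threads on the assumption that the heavy work happens in each notebook's separate kernel process.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_all(notebooks, run_notebook, max_workers=None):
    """Run ``run_notebook`` on every path, at most ``max_workers`` at a time.

    ``run_notebook`` is a hypothetical callable that executes one notebook and
    returns the exception it raised, or None on success.
    """
    notebooks = list(notebooks)
    if max_workers is None:
        # Default to the logical CPUs reported by the OS, not one worker per notebook
        max_workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = list(pool.map(run_notebook, notebooks))
    # Keep only the notebooks that actually failed, in their original order
    return {nb: exc for nb, exc in zip(notebooks, outcomes) if exc is not None}
```

On a runner with more cores, `os.cpu_count()` grows and more notebooks can run at once without oversubscribing the machine. Note that `os.cpu_count()` reports the machine's logical CPUs; on Linux, `len(os.sched_getaffinity(0))` gives the CPUs this process may actually use.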
