Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

setup-python computes wrong cache hash key, restores wrong cache #919

Open
3 of 5 tasks
ryan-williams opened this issue Aug 7, 2024 · 5 comments
Open
3 of 5 tasks
Assignees
Labels
bug Something isn't working

Comments

@ryan-williams
Copy link

ryan-williams commented Aug 7, 2024

Description:

setup-python computed incorrect cache hashes in each of the two jobs in this GHA workflow run (and many previous runs, over at least several days).

In one job, it uses the hash of an earlier version of the requested cache-dependency-path. In the other, it hashes a setup.py file, ignoring the cache-dependency-path input altogether.

In addition to each case being incorrect, they should match one another.

Here is the step yml, from 6ba2932:

- name: Set up Python
  uses: actions/setup-python@v5
  with:
    python-version: ${{ inputs.python_version }}
    cache: pip
    cache-dependency-path: ./.github/workflows/python-ci-single.yml

1. "ubuntu-22.04, 3.11" job

In this job, setup-python hashes an old version of .github/workflows/python-ci-single.yml:

Expected hash:

Expected "Set up Python" hash: a5f953d16191cb533c539c47e385687bf3b5195adf99ce0a6028f563c7781ea3

Actual hash ❌:

Cache restored from key: setup-python-Linux-22.04-Ubuntu-python-3.11.9-pip-5fb22f5c3470d08829ebbe539c68f1f395a8fcf1120bafe12c99af3c1508c9ea

That's the hashFiles of .github/workflows/python-ci-single.yml before the PR being tested (274c6f4; diff), verified by this GHA line:

"Set up Python" should also NOT use hash of old python-ci-single.yml: 5fb22f5c3470d08829ebbe539c68f1f395a8fcf1120bafe12c99af3c1508c9ea

2. "macos-12, 3.11" job

In this job, apis/python/setup.py is hashed, instead of the provided cache-dependency-path (.github/workflows/python-ci-single.yml):

Expected hash:

(same as above):

Expected "Set up Python" hash: a5f953d16191cb533c539c47e385687bf3b5195adf99ce0a6028f563c7781ea3

Actual hash ❌:

Cache restored from key: setup-python-macOS-python-3.11.9-pip-0bc0ed4dc8ce9f9b48ebc42fa10052206992eb7a7705c7a75ec8fcbe15e3e65f

That's hashFiles('apis/python/setup.py') verified by this GHA line:

"Set up Python" should NOT use setup.py hash: 0bc0ed4dc8ce9f9b48ebc42fa10052206992eb7a7705c7a75ec8fcbe15e3e65f

Action version:
v5 (39cd149)

Platform:

  • Ubuntu
  • macOS
  • Windows

Runner type:

  • Hosted
  • Self-hosted
@aparnajyothi-y
Copy link
Contributor

Hello @ryan-williams, Thank you for creating this issue and we will look into it :)

ryan-williams added a commit to single-cell-data/TileDB-SOMA that referenced this issue Aug 8, 2024
See actions/setup-python#919; setup-python cache likely doesn't help much here, as we only `pip install pre-commit`, and cache `~/.cache/pre-commit` separately using `actions/cache`.
ryan-williams added a commit to single-cell-data/TileDB-SOMA that referenced this issue Aug 8, 2024
See actions/setup-python#919; setup-python cache likely doesn't help much here, as we only `pip install pre-commit`, and cache `~/.cache/pre-commit` separately using `actions/cache`.
ryan-williams added a commit to single-cell-data/TileDB-SOMA that referenced this issue Aug 8, 2024
* CI: use `pip` directly (instead of `python -m pip`)

* CI: checkout@v4, setup-python@v5 cache@v4

* disable setup-python cache for lint/pre-commit

See actions/setup-python#919; setup-python cache likely doesn't help much here, as we only `pip install pre-commit`, and cache `~/.cache/pre-commit` separately using `actions/cache`.

* bump to codecov-action@v4

rm coverage/uploader pin added in #2827, per codecov/uploader#1673 (comment)

* `s/MacOS/macOS/

* fix typo from #2854
mojaveazure added a commit to single-cell-data/TileDB-SOMA that referenced this issue Aug 9, 2024
* [python] pin `pandas-stubs>=2` during pre-commit `mypy` hook (#2854)

* pin `pandas-stubs>=2` during pre-commit `mypy` hook

also simplify some `if TYPE_CHECKING` blocks

fixes #2839

* add r-ci.yml exclusions

* `actions{cache,checkout,setup-python}` upgrades (#2856)

* CI: use `pip` directly (instead of `python -m pip`)

* CI: checkout@v4, setup-python@v5 cache@v4

* disable setup-python cache for lint/pre-commit

See actions/setup-python#919; setup-python cache likely doesn't help much here, as we only `pip install pre-commit`, and cache `~/.cache/pre-commit` separately using `actions/cache`.

* bump to codecov-action@v4

rm coverage/uploader pin added in #2827, per codecov/uploader#1673 (comment)

* `s/MacOS/macOS/

* fix typo from #2854

* [python] Offer better guidance on attribute names with `.` (#2864)

* [python] Move `_update_column` into pybind11 (#2862)

* [r] Expose `tiledb_timestamp` parameter to `$reopen()`
Allow `$reopen()` to reopen at a particular timestamp; by default, the
timestamp is set to `NULL` to reopen at the curren time. This is needed
for compatibility between libtiledbsoma's timestamp handling and
resume-mode

[SC-52694](https://app.shortcut.com/tiledb-inc/story/52694/allow-reopen-to-take-a-timestamp-for-reopening)

---------

Co-authored-by: Ryan Williams <[email protected]>
Co-authored-by: John Kerl <[email protected]>
Co-authored-by: nguyenv <[email protected]>
eddelbuettel pushed a commit to single-cell-data/TileDB-SOMA that referenced this issue Aug 9, 2024
* [python] pin `pandas-stubs>=2` during pre-commit `mypy` hook (#2854)

* pin `pandas-stubs>=2` during pre-commit `mypy` hook

also simplify some `if TYPE_CHECKING` blocks

fixes #2839

* add r-ci.yml exclusions

* `actions{cache,checkout,setup-python}` upgrades (#2856)

* CI: use `pip` directly (instead of `python -m pip`)

* CI: checkout@v4, setup-python@v5 cache@v4

* disable setup-python cache for lint/pre-commit

See actions/setup-python#919; setup-python cache likely doesn't help much here, as we only `pip install pre-commit`, and cache `~/.cache/pre-commit` separately using `actions/cache`.

* bump to codecov-action@v4

rm coverage/uploader pin added in #2827, per codecov/uploader#1673 (comment)

* `s/MacOS/macOS/

* fix typo from #2854

* [python] Offer better guidance on attribute names with `.` (#2864)

* [python] Move `_update_column` into pybind11 (#2862)

* [r] Expose `tiledb_timestamp` parameter to `$reopen()`
Allow `$reopen()` to reopen at a particular timestamp; by default, the
timestamp is set to `NULL` to reopen at the curren time. This is needed
for compatibility between libtiledbsoma's timestamp handling and
resume-mode

[SC-52694](https://app.shortcut.com/tiledb-inc/story/52694/allow-reopen-to-take-a-timestamp-for-reopening)

---------

Co-authored-by: Ryan Williams <[email protected]>
Co-authored-by: John Kerl <[email protected]>
Co-authored-by: nguyenv <[email protected]>
@priya-kinthali priya-kinthali self-assigned this Aug 12, 2024
eddelbuettel pushed a commit to single-cell-data/TileDB-SOMA that referenced this issue Aug 19, 2024
* [python] pin `pandas-stubs>=2` during pre-commit `mypy` hook (#2854)

* pin `pandas-stubs>=2` during pre-commit `mypy` hook

also simplify some `if TYPE_CHECKING` blocks

fixes #2839

* add r-ci.yml exclusions

* `actions{cache,checkout,setup-python}` upgrades (#2856)

* CI: use `pip` directly (instead of `python -m pip`)

* CI: checkout@v4, setup-python@v5 cache@v4

* disable setup-python cache for lint/pre-commit

See actions/setup-python#919; setup-python cache likely doesn't help much here, as we only `pip install pre-commit`, and cache `~/.cache/pre-commit` separately using `actions/cache`.

* bump to codecov-action@v4

rm coverage/uploader pin added in #2827, per codecov/uploader#1673 (comment)

* `s/MacOS/macOS/

* fix typo from #2854

* [python] Offer better guidance on attribute names with `.` (#2864)

* [python] Move `_update_column` into pybind11 (#2862)

* [r] Expose `tiledb_timestamp` parameter to `$reopen()`
Allow `$reopen()` to reopen at a particular timestamp; by default, the
timestamp is set to `NULL` to reopen at the curren time. This is needed
for compatibility between libtiledbsoma's timestamp handling and
resume-mode

[SC-52694](https://app.shortcut.com/tiledb-inc/story/52694/allow-reopen-to-take-a-timestamp-for-reopening)

---------

Co-authored-by: Ryan Williams <[email protected]>
Co-authored-by: John Kerl <[email protected]>
Co-authored-by: nguyenv <[email protected]>
@priya-kinthali
Copy link
Contributor

Hello @ryan-williams 👋,
The setup actions use actions/cache under the hood for caching dependencies and the behaviour you're observing is due to how actions/cache caching works.
Please note the following:

  • If all job steps complete successfully and there are changes in the specified dependency file, a new cache key will be created at the post setup job. This new cache will be associated with the key, which includes the hash of the dependency file, and will be used in subsequent runs. In setup step it first tries to find a cache with the exact hash key. If it doesn't find one, it looks for a cache with a key that is a prefix of the specified one and picks the most recent one. For similar discussion please refer to this issue #1433 in the actions/cache repository.

  • If there are no changes in the dependency file, the action will not create a new cache and the existing cache with the same key was used.

  • If any step fails, the job is terminated, and the post setup job step is not executed, meaning a new cache is not created. The next run will again try to restore the most recent cache.

I hope this clarifies the situation. Please let me know if you have any further questions.

Thank you for your patience and understanding!

eddelbuettel pushed a commit to single-cell-data/TileDB-SOMA that referenced this issue Aug 22, 2024
* [python] pin `pandas-stubs>=2` during pre-commit `mypy` hook (#2854)

* pin `pandas-stubs>=2` during pre-commit `mypy` hook

also simplify some `if TYPE_CHECKING` blocks

fixes #2839

* add r-ci.yml exclusions

* `actions{cache,checkout,setup-python}` upgrades (#2856)

* CI: use `pip` directly (instead of `python -m pip`)

* CI: checkout@v4, setup-python@v5 cache@v4

* disable setup-python cache for lint/pre-commit

See actions/setup-python#919; setup-python cache likely doesn't help much here, as we only `pip install pre-commit`, and cache `~/.cache/pre-commit` separately using `actions/cache`.

* bump to codecov-action@v4

rm coverage/uploader pin added in #2827, per codecov/uploader#1673 (comment)

* `s/MacOS/macOS/

* fix typo from #2854

* [python] Offer better guidance on attribute names with `.` (#2864)

* [python] Move `_update_column` into pybind11 (#2862)

* [r] Expose `tiledb_timestamp` parameter to `$reopen()`
Allow `$reopen()` to reopen at a particular timestamp; by default, the
timestamp is set to `NULL` to reopen at the curren time. This is needed
for compatibility between libtiledbsoma's timestamp handling and
resume-mode

[SC-52694](https://app.shortcut.com/tiledb-inc/story/52694/allow-reopen-to-take-a-timestamp-for-reopening)

---------

Co-authored-by: Ryan Williams <[email protected]>
Co-authored-by: John Kerl <[email protected]>
Co-authored-by: nguyenv <[email protected]>
mojaveazure added a commit to single-cell-data/TileDB-SOMA that referenced this issue Aug 26, 2024
* Timestamp range support for dataframe

* Extended support

* (First half) new timestamp test for SOMADataFrame

* Second half of new timestamp passing all but three reopen tests

* [r] Expose `tiledb_timestamp` parameter to `$reopen()` (#2866)

* [python] pin `pandas-stubs>=2` during pre-commit `mypy` hook (#2854)

* pin `pandas-stubs>=2` during pre-commit `mypy` hook

also simplify some `if TYPE_CHECKING` blocks

fixes #2839

* add r-ci.yml exclusions

* `actions{cache,checkout,setup-python}` upgrades (#2856)

* CI: use `pip` directly (instead of `python -m pip`)

* CI: checkout@v4, setup-python@v5 cache@v4

* disable setup-python cache for lint/pre-commit

See actions/setup-python#919; setup-python cache likely doesn't help much here, as we only `pip install pre-commit`, and cache `~/.cache/pre-commit` separately using `actions/cache`.

* bump to codecov-action@v4

rm coverage/uploader pin added in #2827, per codecov/uploader#1673 (comment)

* `s/MacOS/macOS/

* fix typo from #2854

* [python] Offer better guidance on attribute names with `.` (#2864)

* [python] Move `_update_column` into pybind11 (#2862)

* [r] Expose `tiledb_timestamp` parameter to `$reopen()`
Allow `$reopen()` to reopen at a particular timestamp; by default, the
timestamp is set to `NULL` to reopen at the curren time. This is needed
for compatibility between libtiledbsoma's timestamp handling and
resume-mode

[SC-52694](https://app.shortcut.com/tiledb-inc/story/52694/allow-reopen-to-take-a-timestamp-for-reopening)

---------

Co-authored-by: Ryan Williams <[email protected]>
Co-authored-by: John Kerl <[email protected]>
Co-authored-by: nguyenv <[email protected]>

* [r] [NO BACKPORT] Expose timestamps publicly and plumb through for resume-mode (#2871)

Expose timestamps publicly through a new active binding; replace calls to `private$tiledb_timestamp` with `self$tiledb_timestamp`

Also plumb timestamps through for `write_soma()` in resume-mode

* Timestamp range support for dataframe

* (First half) new timestamp test for SOMADataFrame

* Second half of new timestamp passing all but three reopen tests

* Test can now use factor

* Read and write DataFrame and {Dense,Sparse}Array under timestamps

* Quieter warnings

* Adapt one timestamped test

* Pause one test predicate for collections

* clang-format

as obsessing over one whitespace char before a comment is added value

* Micro-fix following code review

More changes to follow in wider changeset

* Update changelog
Bump develop version

---------

Co-authored-by: Paul Hoffman <[email protected]>
Co-authored-by: Ryan Williams <[email protected]>
Co-authored-by: John Kerl <[email protected]>
Co-authored-by: nguyenv <[email protected]>
Co-authored-by: Paul Hoffman <[email protected]>
mojaveazure added a commit to single-cell-data/TileDB-SOMA that referenced this issue Aug 26, 2024
* [python] pin `pandas-stubs>=2` during pre-commit `mypy` hook (#2854)

* pin `pandas-stubs>=2` during pre-commit `mypy` hook

also simplify some `if TYPE_CHECKING` blocks

fixes #2839

* add r-ci.yml exclusions

* `actions{cache,checkout,setup-python}` upgrades (#2856)

* CI: use `pip` directly (instead of `python -m pip`)

* CI: checkout@v4, setup-python@v5 cache@v4

* disable setup-python cache for lint/pre-commit

See actions/setup-python#919; setup-python cache likely doesn't help much here, as we only `pip install pre-commit`, and cache `~/.cache/pre-commit` separately using `actions/cache`.

* bump to codecov-action@v4

rm coverage/uploader pin added in #2827, per codecov/uploader#1673 (comment)

* `s/MacOS/macOS/

* fix typo from #2854

* [python] Offer better guidance on attribute names with `.` (#2864)

* [python] Move `_update_column` into pybind11 (#2862)

* [r] Expose `tiledb_timestamp` parameter to `$reopen()`
Allow `$reopen()` to reopen at a particular timestamp; by default, the
timestamp is set to `NULL` to reopen at the curren time. This is needed
for compatibility between libtiledbsoma's timestamp handling and
resume-mode

[SC-52694](https://app.shortcut.com/tiledb-inc/story/52694/allow-reopen-to-take-a-timestamp-for-reopening)

---------

Co-authored-by: Ryan Williams <[email protected]>
Co-authored-by: John Kerl <[email protected]>
Co-authored-by: nguyenv <[email protected]>
@priya-kinthali
Copy link
Contributor

Hello @ryan-williams 👋,
Just following up on the previous comment. Hope the information provided was helpful.
Thank you!

@ryan-williams
Copy link
Author

I have read your response a few times, but I still don't understand how it can be that I see output like this:

# "Setup Python" step:
Cache restored from key: setup-python-Linux-22.04-Ubuntu-python-3.11.9-pip-5fb22f5c3470d08829ebbe539c68f1f395a8fcf1120bafe12c99af3c1508c9ea
…
# "Check cache hash keys" step:
Expected "Set up Python" hash: a5f953d16191cb533c539c47e385687bf3b5195adf99ce0a6028f563c7781ea3
"Set up Python" should NOT use setup.py hash: 0bc0ed4dc8ce9f9b48ebc42fa10052206992eb7a7705c7a75ec8fcbe15e3e65f
"Set up Python" should also NOT use hash of old python-ci-single.yml: 5fb22f5c3470d08829ebbe539c68f1f395a8fcf1120bafe12c99af3c1508c9ea

The cache key being used is a hash of an old version of the cache-dependency-path input.

In my second example, here, a hash was used of a different file:

# "Setup Python" step:
Cache restored from key: setup-python-macOS-python-3.11.9-pip-0bc0ed4dc8ce9f9b48ebc42fa10052206992eb7a7705c7a75ec8fcbe15e3e65f
…
# "Check cache hash keys" step:
Expected "Set up Python" hash: a5f953d16191cb533c539c47e385687bf3b5195adf99ce0a6028f563c7781ea3
"Set up Python" should NOT use setup.py hash: 0bc0ed4dc8ce9f9b48ebc42fa10052206992eb7a7705c7a75ec8fcbe[15](https://github.com/single-cell-data/TileDB-SOMA/actions/runs/10291856865/job/28485041271?pr=2853#step:4:16)e3e65f
"Set up Python" should also NOT use hash of old python-ci-single.yml: 5fb22f5c3470d08829ebbe539c68f1f395a8fcf1120bafe12c99af3c1508c9ea

Note in both cases that setup-python is being invoked inside a workflow, python-ci-single.yml that was workflow_call'd from the outer python-ci-minimal.yml. Perhaps that is related to why the wrong file or file contents are being hashed.

I have also read actions/cache#1433; the prefix-matching behavior is good to know, but I also don't see how it would explain what I've shown above.

@priya-kinthali
Copy link
Contributor

Hello @ryan-williams 👋,
I have gone through the provided link and here's what might be happening:

  • If a job step fails, the post-setup job step (which updates the cache) is not executed. This means the cache remains
    unchanged and the subsequent runs will use the most recent available cache.
  • If there are no changes in the dependency file, the same cache key will be reused, leading to the same hash being used
    again.


The output you are seeing might be due to a failed step as explained. I attempted to reproduce the issue using the provided link but was unable to do so. Could you please provide a minimal reproducible example outside of your repository? This would help us better understand and investigate the issue further.
Also please note that in general, the cache-dependency-path is designed to be used with dependency files like Pipfile.lock, **/requirements-dev.txt, /setup.cfg, */requirements.txt, or setup.py. For details please refer to the document
Thank you for your patience and understanding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants