Skip to content
This repository has been archived by the owner on Nov 7, 2024. It is now read-only.

test flake error: opening repo: opendir(/proc/self/fd/3): No such file or directory #664

Closed
cgwalters opened this issue Sep 19, 2024 · 5 comments · Fixed by coreos/cap-std-ext#59

Comments

@cgwalters
Copy link
Member

Caused by:
    0: Parsing layer blob 6d52d9af072841028a446e6a01f4a89484ae4ed93f235cfbccd22a877231736e
    1: error: opening repo: opendir(/proc/self/fd/3): No such file or directory
    2: Processing tar
    3: Failed to commit tar: ExitStatus(unix_wait_status(256))

We see this pretty often, it's a bit concerning as I have no idea how it could just be flaky.

cgwalters added a commit to cgwalters/ostree-rs-ext that referenced this issue Oct 31, 2024
Might help with ostreedev#664
But I didn't manage to reliably reproduce this.
@cgwalters
Copy link
Member Author

Put up #676

@jeckersb
Copy link
Contributor

I was able to get this to reproduce under perf in test_tar_write_tar_layer. Since we always use fd=3 it was straightforward to just grab everywhere we call close(3) where the call stack is under test_tar_write_tar_layer. Here's the places (duplicates omitted) where we end up dropping fd 3:

test_tar_write_  192137 [005]  6965.825170: probe:file_close_fd: (ffffffffb24d59c4) fd=0x3
        ffffffffb24d59c5 file_close_fd+0x5 ([kernel.kallsyms])
        ffffffffb24a4c93 __x64_sys_close+0x13 ([kernel.kallsyms])
        ffffffffb31b2952 do_syscall_64+0x82 ([kernel.kallsyms])
        ffffffffb320012f entry_SYSCALL_64+0xaf ([kernel.kallsyms])
            7f1838b17f9c __libc_close+0x3c (inlined)
            7f1839128bc3 _gpgme_io_spawn+0x533 (/usr/lib64/libgpgme.so.11.32.1)
            7f1839130434 _gpgme_get_program_version+0x104 (/usr/lib64/libgpgme.so.11.32.1)
            7f1839116cfa engine_get_version+0x9a (inlined)
            7f1839116cfa _gpgme_set_engine_info+0x9a (inlined)
            7f183912df69 _gpgme_set_engine_info+0x139 (inlined)
            7f183912df69 gpgme_ctx_set_engine_info+0x139 (/usr/lib64/libgpgme.so.11.32.1)
            7f1839730735 ot_gpgme_new_ctx+0x55 (/usr/lib64/libostree-1.so.1.0.0)
            7f183973cde0 sign_data.constprop.0.isra.0+0xc0 (/usr/lib64/libostree-1.so.1.0.0)
            7f18396c9bd8 ostree_repo_sign_commit+0x128 (/usr/lib64/libostree-1.so.1.0.0)
            564efa01f65a ostree::auto::repo::Repo::sign_commit+0x11a (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564efa0396a0 ostree_ext::fixture::Fixture::commit_filedefs+0xd80 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564efa03dd7d ostree_ext::fixture::Fixture::new_v1+0x13d (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564ef9f7f5ed it::test_tar_write_tar_layer::_$u7b$$u7b$closure$u7d$$u7d$::hba964f64edee259b+0x5d (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
test_tar_write_  192090 [003]  6965.876523: probe:file_close_fd: (ffffffffb24d59c4) fd=0x3
        ffffffffb24d59c5 file_close_fd+0x5 ([kernel.kallsyms])
        ffffffffb24a4c93 __x64_sys_close+0x13 ([kernel.kallsyms])
        ffffffffb31b2952 do_syscall_64+0x82 ([kernel.kallsyms])
        ffffffffb320012f entry_SYSCALL_64+0xaf ([kernel.kallsyms])
            7f1838b17f9c __libc_close+0x3c (inlined)
            7f18397364f2 glnx_tmpfile_clear+0x72 (/usr/lib64/libostree-1.so.1.0.0)
            7f18397364f2 glnx_tmpfile_clear+0x72 (/usr/lib64/libostree-1.so.1.0.0)
            7f18397364f2 glnx_tmpfile_clear+0x72 (/usr/lib64/libostree-1.so.1.0.0)
            7f183973bbe9 glnx_file_replace_contents_with_perms_at.constprop.0+0x289 (/usr/lib64/libostree-1.so.1.0.0)
            7f183973bbe9 glnx_file_replace_contents_with_perms_at.constprop.0+0x289 (/usr/lib64/libostree-1.so.1.0.0)
            7f18396d55dd glnx_file_replace_contents_at+0x11d (inlined)
            7f18396d55dd ostree_repo_write_commit_detached_metadata+0x11d (/usr/lib64/libostree-1.so.1.0.0)
            7f18396c9d16 ostree_repo_sign_commit+0x266 (/usr/lib64/libostree-1.so.1.0.0)
            564efa01f65a ostree::auto::repo::Repo::sign_commit+0x11a (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564efa0396a0 ostree_ext::fixture::Fixture::commit_filedefs+0xd80 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564efa03dd7d ostree_ext::fixture::Fixture::new_v1+0x13d (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564ef9f7f5ed it::test_tar_write_tar_layer::_$u7b$$u7b$closure$u7d$$u7d$::hba964f64edee259b+0x5d (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
test_tar_write_  192090 [003]  6965.888085: probe:file_close_fd: (ffffffffb24d59c4) fd=0x3
        ffffffffb24d59c5 file_close_fd+0x5 ([kernel.kallsyms])
        ffffffffb24a4c93 __x64_sys_close+0x13 ([kernel.kallsyms])
        ffffffffb31b2952 do_syscall_64+0x82 ([kernel.kallsyms])
        ffffffffb320012f entry_SYSCALL_64+0xaf ([kernel.kallsyms])
            7f1838b17f9c __libc_close+0x3c (inlined)
            564efa2afe02 alloc::sync::Arc<T,A>::drop_slow+0x32 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564ef9f80d77 <alloc::sync::Arc<T,A> as core::ops::drop::Drop>::drop+0x17e7 (inlined)
            564ef9f80d77 core::ptr::drop_in_place<alloc::sync::Arc<std::os::fd::owned::OwnedFd>>+0x17e7 (inlined)
            564ef9f80d77 ostree_ext::tar::write::write_tar::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h87b7633258c8530c+0x17e7 (inlined)
            564ef9f80d77 ostree_ext::tar::write::write_tar::_$u7b$$u7b$closure$u7d$$u7d$::hea484db2a63bc177+0x17e7 (inlined)
            564ef9f80d77 it::test_tar_write_tar_layer::_$u7b$$u7b$closure$u7d$$u7d$::hba964f64edee259b+0x17e7 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564ef9f27da8 <core::pin::Pin<P> as core::future::future::Future>::poll+0xd8 (inlined)
            564ef9f27da8 <core::pin::Pin<P> as core::future::future::Future>::poll+0xd8 (inlined)
            564ef9f27da8 tokio::runtime::scheduler::current_thread::CoreGuard::block_on::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h2c28b5e84a4875da+0xd8 (inlined)
            564ef9f27da8 tokio::runtime::coop::with_budget+0xd8 (inlined)
            564ef9f27da8 tokio::runtime::coop::budget+0xd8 (inlined)
            564ef9f27da8 tokio::runtime::scheduler::current_thread::CoreGuard::block_on::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h4f9e8e85a91b6d3d+0xd8 (inlined)
            564ef9f27da8 tokio::runtime::scheduler::current_thread::Context::enter+0xd8 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564ef9f31b20 tokio::runtime::scheduler::current_thread::CoreGuard::block_on::_$u7b$$u7b$closure$u7d$$u7d$::hfaa72d97876157bc+0xb0 (inlined)
            564ef9f31b20 tokio::runtime::scheduler::current_thread::CoreGuard::enter::_$u7b$$u7b$closure$u7d$$u7d$::h966a44e5bbb3a502+0xb0 (inlined)
            564ef9f31b20 tokio::runtime::context::scoped::Scoped<T>::set+0xb0 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564ef9f95b56 tokio::runtime::context::set_scheduler::_$u7b$$u7b$closure$u7d$$u7d$::h878f8f33b223bf48+0x46 (inlined)
            564ef9f95b56 std::thread::local::LocalKey<T>::try_with+0x46 (inlined)
            564ef9f95b56 std::thread::local::LocalKey<T>::with+0x46 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564ef9f28624 tokio::runtime::context::set_scheduler+0x64 (inlined)
            564ef9f28624 tokio::runtime::scheduler::current_thread::CoreGuard::enter+0x64 (inlined)
            564ef9f28624 tokio::runtime::scheduler::current_thread::CoreGuard::block_on+0x64 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564ef9f25fd4 tokio::runtime::scheduler::current_thread::CurrentThread::block_on::_$u7b$$u7b$closure$u7d$$u7d$::hddb5990a9daed590+0x244 (inlined)
            564ef9f25fd4 tokio::runtime::context::runtime::enter_runtime+0x244 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564ef9edcfea tokio::runtime::scheduler::current_thread::CurrentThread::block_on+0x5a (inlined)
            564ef9edcfea tokio::runtime::runtime::Runtime::block_on_inner+0x5a (inlined)
            564ef9edcfea tokio::runtime::runtime::Runtime::block_on+0x5a (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564ef9f61bdf it::test_tar_write_tar_layer+0x8f (inlined)
            564ef9f61bdf it::test_tar_write_tar_layer::_$u7b$$u7b$closure$u7d$$u7d$::hdba0b609c1c0fcff+0x8f (inlined)
            564ef9f61bdf core::ops::function::FnOnce::call_once+0x8f (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
test_tar_write_  192090 [003]  6965.900578: probe:file_close_fd: (ffffffffb24d59c4) fd=0x3
        ffffffffb24d59c5 file_close_fd+0x5 ([kernel.kallsyms])
        ffffffffb24a4c93 __x64_sys_close+0x13 ([kernel.kallsyms])
        ffffffffb31b2952 do_syscall_64+0x82 ([kernel.kallsyms])
        ffffffffb320012f entry_SYSCALL_64+0xaf ([kernel.kallsyms])
            7f1838b1da1b __GI___close_nocancel+0xb (inlined)
            564efa2fad2d <std::sys::pal::unix::fs::Dir as core::ops::drop::Drop>::drop+0x1d (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564efa2da603 alloc::sync::Arc<T,A>::drop_slow+0x13 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564efa2fc663 std::sys::pal::unix::fs::remove_dir_impl::remove_dir_all_recursive+0x2a3 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564efa2fc88a std::sys::pal::unix::fs::remove_dir_impl::remove_dir_all+0xfa (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564efa0eabf6 std::fs::remove_dir_all+0x16 (inlined)
            564efa0eabf6 <tempfile::dir::TempDir as core::ops::drop::Drop>::drop+0x16 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564efa08dbcc core::ptr::drop_in_place<tempfile::dir::TempDir>+0xc (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564efa08e3a1 core::ptr::drop_in_place<ostree_ext::fixture::Fixture>+0x11 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564ef9f81260 it::test_tar_write_tar_layer::_$u7b$$u7b$closure$u7d$$u7d$::hba964f64edee259b+0x1cd0 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564ef9f27da8 <core::pin::Pin<P> as core::future::future::Future>::poll+0xd8 (inlined)
            564ef9f27da8 <core::pin::Pin<P> as core::future::future::Future>::poll+0xd8 (inlined)
            564ef9f27da8 tokio::runtime::scheduler::current_thread::CoreGuard::block_on::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h2c28b5e84a4875da+0xd8 (inlined)
            564ef9f27da8 tokio::runtime::coop::with_budget+0xd8 (inlined)
            564ef9f27da8 tokio::runtime::coop::budget+0xd8 (inlined)
            564ef9f27da8 tokio::runtime::scheduler::current_thread::CoreGuard::block_on::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h4f9e8e85a91b6d3d+0xd8 (inlined)
            564ef9f27da8 tokio::runtime::scheduler::current_thread::Context::enter+0xd8 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564ef9f31b20 tokio::runtime::scheduler::current_thread::CoreGuard::block_on::_$u7b$$u7b$closure$u7d$$u7d$::hfaa72d97876157bc+0xb0 (inlined)
            564ef9f31b20 tokio::runtime::scheduler::current_thread::CoreGuard::enter::_$u7b$$u7b$closure$u7d$$u7d$::h966a44e5bbb3a502+0xb0 (inlined)
            564ef9f31b20 tokio::runtime::context::scoped::Scoped<T>::set+0xb0 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564ef9f95b56 tokio::runtime::context::set_scheduler::_$u7b$$u7b$closure$u7d$$u7d$::h878f8f33b223bf48+0x46 (inlined)
            564ef9f95b56 std::thread::local::LocalKey<T>::try_with+0x46 (inlined)
            564ef9f95b56 std::thread::local::LocalKey<T>::with+0x46 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564ef9f28624 tokio::runtime::context::set_scheduler+0x64 (inlined)
            564ef9f28624 tokio::runtime::scheduler::current_thread::CoreGuard::enter+0x64 (inlined)
            564ef9f28624 tokio::runtime::scheduler::current_thread::CoreGuard::block_on+0x64 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)
            564ef9f25fd4 tokio::runtime::scheduler::current_thread::CurrentThread::block_on::_$u7b$$u7b$closure$u7d$$u7d$::hddb5990a9daed590+0x244 (inlined)
            564ef9f25fd4 tokio::runtime::context::runtime::enter_runtime+0x244 (/tmp/ostree-rs-ext/target/debug/deps/it-21cd5226ca3289c1)

At a glance I think the third one would be the most likely problem, since we're under write_tar and we're dropping Arc<std::os::fd::owned::OwnedFd>>

@cgwalters
Copy link
Member Author

cgwalters commented Oct 31, 2024

Ohhhh man I think I figured it out, look at this smoking gun in strace:

dup2(3, 3)                              = 3

Most of the time, the fd we allocate here won't be 3. But sometimes, it will be. That code though is relying on dup2 to strip off O_CLOEXEC but I think that won't happen in this case.

So this would be a bug in https://github.com/coreos/cap-std-ext/blob/bfbb355a81ea47236c412d6fdc73bab99bcb078c/src/cmdext.rs#L33

Testing out a change...

cgwalters added a commit to cgwalters/cap-std-ext that referenced this issue Oct 31, 2024
See ostreedev/ostree-rs-ext#664

Basically in the case when the source fd number is the same
as the target, `dup2` is a no-op. This is an astoundingly evil
bug because it means we just don't pass the expected
fd to the child process.

Fix this by detecting this situation and just stripping off
`O_CLOEXEC`.

Signed-off-by: Colin Walters <[email protected]>
@cgwalters
Copy link
Member Author

Yep this works for me coreos/cap-std-ext#59

cgwalters added a commit to cgwalters/cap-std-ext that referenced this issue Oct 31, 2024
See ostreedev/ostree-rs-ext#664

Basically in the case when the source fd number is the same
as the target, `dup2` is a no-op. This is an astoundingly evil
bug because it means we just don't pass the expected
fd to the child process.

Fix this by detecting this situation and just stripping off
`O_CLOEXEC`.

Signed-off-by: Colin Walters <[email protected]>
cgwalters added a commit to cgwalters/cap-std-ext that referenced this issue Oct 31, 2024
See ostreedev/ostree-rs-ext#664

Basically in the case when the source fd number is the same
as the target, `dup2` is a no-op. This is an astoundingly evil
bug because it means we just don't pass the expected
fd to the child process.

Fix this by detecting this situation and just stripping off
`O_CLOEXEC`.

Signed-off-by: Colin Walters <[email protected]>
@jeckersb jeckersb linked a pull request Nov 4, 2024 that will close this issue
@jeckersb
Copy link
Contributor

jeckersb commented Nov 4, 2024

Closing this as fixed by coreos/cap-std-ext#59

@jeckersb jeckersb closed this as completed Nov 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants