Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: limit the number of concurrent load_internal calls to avoid exce… #17377

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

Braymatter
Copy link
Contributor

@Braymatter Braymatter commented Jan 15, 2025

fix: limit the number of concurrent load_internal calls to avoid exceeding the OS file descriptor limit

Objective

Fixes #14542

Solution

Added a Semaphore to the AssetServer that load_internal blocks on, preventing the app from requesting too many file descriptors from the OS.

Testing

Tested using the repro in #14542 with 1500 images, and 1500 glb's with asset preprocessing enabled. On IOS at a permit-limit of 127 experienced zero failures repeatedly on an iPhone 16 physical device.

Migration Guide

I don't think there are any breaking changes (unsure if adding a field to a struct counts)

Comment on lines 121 to 124
let file_limit = 127; // The normal limit is 256, cut in half for .meta files and sub 1 because 128 still throws the occasional error (3 failed files out of 1500)

#[cfg(not(target_os = "ios"))]
let file_limit = 16000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much:
https://stackoverflow.com/questions/57007894/permanently-increase-file-descriptor-ulimit-in-android-without-being-root

Seemed like a reasonable limit, considering it wasn't limited at all before - it's going to vary with platform OS and I didn't find exhaustive documentation on specific limits for every OS and distro.

I'm not very concerned as I've never run into this problem on a desktop machine and my projects are bigger than most.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of correctness, these values seem very magic and arbitrary. They seem like they could change at any time or even by configurable by the gamer. Is there a way to instead detect whether the limit has been reached when performing the operation instead of assuming what the limit is at compile-time?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values should have a primary source linked in a comment. If there is an authoritative resource, maaaayybe we can rely on them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I seriously doubt that anyone is changing the file descriptor limit on ios. The ios limit of 256 is somewhat well documented online, but can always change. I ran ulimit -n on my macos machine and got 2560, which is where the other value came from.

16000 as a limit was a wild-guess.

I was unable to find a cross platform, exhaustive way to detect file system limits. As you mentioned they can change arbitrarily. If anything I would set it to 127 for every platform as most apps aren't even going to have that many assets, and the ones that do will likely have a negligible/user undetectable increase in load times.

@BenjaminBrienen BenjaminBrienen added C-Bug An unexpected or incorrect behavior A-Assets Load files from disk to use for things like images, models, and sounds O-iOS Specific to the iOS mobile operating system D-Straightforward Simple bug fixes and API improvements, docs, test and examples S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Jan 15, 2025
@JMS55
Copy link
Contributor

JMS55 commented Jan 15, 2025

Do we not retry failed asset loads? I would assume they would fail, back off, and then retry.

@Braymatter
Copy link
Contributor Author

@JMS55 - My projects use bevy_asset_loader so I'm not entirely sure. I didn't see any retry logs etc. in my main project, it would just fail and never transition to the next state. I don't recall seeing any retry logging in the repro etc. today either.

squashme: Remove fat-fingered value for ios
let file_limit = 255; // The normal limit is 256, cut in half for .meta files and sub 1 because 128 still throws the occasional error (3 failed files out of 1500)

#[cfg(target_os = "macos")]
let file_limit = 2559;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link / explanation of where these numbers came from please!

Copy link
Contributor Author

@Braymatter Braymatter Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, and added what I was able to find. I searched around a bit in an effort to come up with something better but docs are very scarce because the limit is (very) changeable per platform.

We could potentially invoke ulimit -n on startup, but that didn't seem like a very good solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ulimit could be changed during the middle of asset loading

@alice-i-cecile alice-i-cecile added this to the 0.16 milestone Jan 15, 2025
@JMS55
Copy link
Contributor

JMS55 commented Jan 15, 2025

Looks like we do this in a test

fn asset_load_error_event_handler(
, but don't actually have it hooked up...

@@ -61,6 +62,9 @@ pub(crate) struct AssetServerData {
sources: AssetSources,
mode: AssetServerMode,
meta_check: AssetMetaCheck,

///Used to ensure the `asset_server` does not try to acquire more loaders (and thus `file_handles`) than the OS allows
asset_counter: Semaphore,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach looks good to me, but I'm thinking we should scope this to FileAssetReader instead, as this is a "file descriptor" limitation protection? We wouldn't want this limit on an in-memory asset source

Copy link
Contributor Author

@Braymatter Braymatter Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial instinct was to keep in in file_asset as well, but I couldn't think of a way to detect when the file was actually closed/dropped, as the AssetReader returns the File back to the caller.

If I were to add the Semaphore to the AssetReader impl in file_asset.rs it would release the permit as soon as it loaded the file and returned, and not after it was finished being read/processed etc. My understanding is that the host OS will keep the file descriptor allocated until the process closes (or in Rust's case drops) the file.

I could definitely be misreading/misunderstanding something so open to suggestions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some previous discussion here:
https://discord.com/channels/691052431525675048/749332104487108618/1328572829930881066

We could read the limit from a package.metadata parameter and leave it to the client application developers to set, while defaulting to u32::Max or something sensible.

It would then be opt-in, and wouldn't affect network/memory etc. AssetReaders by default. The downside being client app developers would have to use the lowest limit of any of their target platforms.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory we could implement the Reader trait for a wrapper containing both a semaphore handle and the File, then return that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idk if the async_lock semaphore impl will meet the send / sync requirements there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a stab at it here: Braymatter@a415213

It... compiles but the Semaphores are no longer preventing too many file handles from being opened. Interestingly it seems like the app is provisioning two FileAssetReaders based on this screenshot / log showing up twice in the xcode debugger:

image

I think this would actually be a problem anyway because you may want to load assets from the normal assets/imported_assets AND load downloaded assets from (in my case) the App sandbox fs.

There's also the issue of the AssetWriter also potentially reserving file descriptors.

We could do some kind of global/shared semaphore, but I could also be misunderstanding how some of this fits together.

Open to more suggestions - I'm unsure why this isn't working; I would expect that halving the amount of Semaphore permits would resolve the issue with two asset loaders but it doesn't seem to be the case.

@Braymatter
Copy link
Contributor Author

@JMS55 - That would be very nice to have hooked up! I think in any case we probably don't want to thrash the file system more than we have to so some limit is probably good to have in place. This would be a great fallback for platforms / systems where the file descriptor limit is non-standard / unexpectedly low

@alice-i-cecile
Copy link
Member

Do we not retry failed asset loads? I would assume they would fail, back off, and then retry.

We currently do not.

std::fs calls by the client application
*/
#[cfg(target_os = "macos")]
let file_limit = 1279;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these not just CONST?

@Braymatter
Copy link
Contributor Author

Braymatter commented Jan 16, 2025

On further reflection - I think we should expose the limit for concurrent asset loads as a package.metadata field if possible, and keep the initial implementation of this. Since the file descriptor limit is per-process (and not per-bevy-asset) we should let the user set a limit themselves in case they're doing odd things with the filesystem directly via std::fs

If the package.metadata value isn't set we can just set the default limit for the Semaphore to usize::max_value(), so this wouldn't impact most projects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Assets Load files from disk to use for things like images, models, and sounds C-Bug An unexpected or incorrect behavior D-Straightforward Simple bug fixes and API improvements, docs, test and examples O-iOS Specific to the iOS mobile operating system S-Needs-Review Needs reviewer attention (from anyone!) to move forward
Projects
None yet
Development

Successfully merging this pull request may close these issues.

On iOS, loading many assets in AssetMode::Processed causes some assets to fail to load
6 participants