Enable memory mapping of 3d tiff files where possible #116
base: main
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##             main     #116      +/-   ##
==========================================
+ Coverage   92.73%   92.78%   +0.05%
==========================================
  Files          37       37
  Lines        1857     1871      +14
==========================================
+ Hits         1722     1736      +14
  Misses        135      135
```

Flags with carried forward coverage won't be shown.
It was pointed out in napari/napari#7602 that you can use […]. The main issue is that previously […]. Also, the order of loading is to try […].
This works great for me. I tried changing some of our test data (~100GB per channel) into 3D TIFFs and there was a significant speed up when using […]. As for the loading order, to avoid running into issues perhaps we avoid the […].
My understanding/guess of the original intention is that […].
Could we mock […]? Generally, I think this is great - it also fits with our medium-term goal of moving to zarr overall (so I don't mind depending on zarr). My understanding is that the loading functions are meant for BrainGlobe libraries only (i.e. they are not meant to be a general purpose image reader like […]). I guess a "light" version of these changes could just be for […].
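The mocking suggested above can be sketched with `unittest.mock` to force the fallback path without a real compressed TIFF. `load_stack` here is a hypothetical stand-in for the PR's loader, with the two readers injected as parameters (an assumption for the sketch, not the actual cellfinder API):

```python
from unittest import mock

import numpy as np


def load_stack(path, memmap_fn, imread_fn):
    """Hypothetical loader: try to memory map, fall back to an eager read."""
    try:
        return memmap_fn(path)
    except ValueError:
        # e.g. compressed data cannot be memory-mapped
        return imread_fn(path)


# Simulate a compressed file: memmap raises, so imread must be used instead.
memmap_fn = mock.Mock(side_effect=ValueError("data are not memory-mappable"))
imread_fn = mock.Mock(return_value=np.zeros((2, 3, 4)))

result = load_stack("brain.tif", memmap_fn, imread_fn)
print(memmap_fn.call_count, imread_fn.call_count, result.shape)  # 1 1 (2, 3, 4)
```

Injecting the readers keeps the test independent of `tifffile`; a real test could instead monkeypatch `tifffile.memmap` to raise.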
I can keep the changes just to […].
Yea, I think that's the best way forward for this. Thanks @matham
Description
What is this PR
Why is this PR needed?
Previously, if the input data were a single 3D TIFF stack file, cellfinder would load the full data into memory, which is obviously an issue if the file is big and you run out of memory. The best way around this is to use a folder of TIFFs, because then the data is lazily loaded by dask.
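The lazy folder-of-TIFFs behaviour can be illustrated without dask: each plane is only read from disk when it is indexed. A minimal numpy-only sketch, where `LazyStack` and its injected `read_plane` loader are hypothetical stand-ins (in practice dask wraps something like `tifffile.imread` per file):

```python
import numpy as np


class LazyStack:
    """Read planes on demand instead of loading the whole stack up front."""

    def __init__(self, paths, read_plane):
        self.paths = list(paths)      # one file per z-plane, in z order
        self.read_plane = read_plane  # e.g. tifffile.imread in practice

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, z):
        # Only this plane touches the disk; the rest stays out of RAM.
        return self.read_plane(self.paths[z])


# Demo with in-memory "files" so the sketch is self-contained.
fake_disk = {f"plane_{z}.tif": np.full((4, 4), z) for z in range(3)}
stack = LazyStack(sorted(fake_disk), fake_disk.__getitem__)
print(len(stack), stack[2][0, 0])  # 3 2
```

dask adds chunking, caching, and parallel reads on top of the same idea.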
What does this PR do?
This PR uses `tifffile.memmap`, which uses `np.memmap`, to memory map the file so it doesn't use much RAM, exactly like dask does for a folder of TIFFs. It first tries to memmap the file; if that fails, e.g. because the file is compressed, it falls back to loading it into RAM like before.

References
I also added a similar PR to Napari for the same reason: napari/napari#7602.
How has this PR been tested?
I added pytests but also tested it manually on a whole brain.
Here's the timing info using a folder of tiffs, as one baseline:
Here it is using a single 3D TIFF file that loads the whole thing into memory, using `main`:

Here it is using the changes, with a single 3D TIFF file:
This shows there's no performance penalty from the changes. If anything, loading the whole brain into memory beforehand adds some slowdown.
Looking at the RAM usage during the run: with `main` there was the obvious RAM usage bump for our process. With my changes, our process used minimal RAM, like with a folder of TIFFs. However, I did notice the system process indicated progressively more RAM use, until by the end it had used roughly the size of the file. I read this to mean the OS cached the memory-mapped data under the hood as the planes were read during processing, so RAM usage rose at the system level. But I assume that if RAM runs low, the OS would purge that cache, so it's not actually committed memory.
Is this a breaking change?
I don't think so.
Does this PR require an update to the documentation?
No.
Checklist: