
Enable memory mapping of 3d tiff files where possible #116

Open
wants to merge 2 commits into main

Conversation

@matham (Contributor) commented Feb 13, 2025

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?

Previously, if the input data was a single 3D tiff stack file, cellfinder would load the full data into memory, which is obviously a problem if the file is large and you run out of RAM. The best way around that was to use a folder of 2D tiffs, because dask then loads the data lazily.

What does this PR do?

This PR uses tifffile.memmap, which relies on np.memmap to memory map the file so that very little RAM is used, exactly like dask does for a folder of tiffs. It tries memmap first, and if that fails, e.g. because the file is compressed, it falls back to loading the data into RAM as before.
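For illustration, the fallback described above might look roughly like this (a minimal sketch using only tifffile; not the exact code in the PR):

```python
import tifffile


def load_3d_tiff(path):
    """Memory map a 3D tiff if possible, otherwise load it fully into RAM.

    Rough sketch of the fallback behaviour described above; not the PR's
    actual implementation.
    """
    try:
        # Memory mapping only works for uncompressed, contiguously stored data.
        return tifffile.memmap(path, mode="r")
    except ValueError:
        # e.g. compressed or tiled tiffs cannot be memory mapped.
        return tifffile.imread(path)
```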

References

I also added a similar PR to Napari for the same reason: napari/napari#7602.

How has this PR been tested?

I added pytest tests, but also tested it manually on a whole brain.

Here's the timing info using a folder of tiffs, as one baseline:

brainmapper -s "L:\imaging\staging\20250123\MF1_132F_W_tiffs\MF1_132F_W_BS_640" -b "L:\imaging\staging\20250123\MF1_132F_W_tiffs\MF1_132F_W_BS_488" -o "L:\imaging\staging\20250123\MF1_132F_W_tiffs\output" --no-register --no-classification --no-figures --max-cluster-size 100000 --soma-diameter 8 --ball-xy-size 6 --ball-z-size 8 --ball-overlap-fraction 0.5 --log-sigma-size 0.5 --threshold 5 --soma-spread-factor 1.4 -v 4 2.03 2.03 --orientation ial --n-free-cpus 12
2025-02-12 18:12:46 PM INFO     2025-02-12 18:12:46 PM - INFO - MainProcess fancylog.py:314 - Starting logging                                                                                                                                       fancylog.py:314
                       INFO     2025-02-12 18:12:46 PM - INFO - MainProcess fancylog.py:315 - Multiprocessing-logging module found. Logging from all processes                                                                                       fancylog.py:315
                       INFO     2025-02-12 18:12:46 PM - INFO - MainProcess main.py:70 - Skipping registration                                                                                                                                            main.py:70
2025-02-12 18:12:58 PM INFO     2025-02-12 18:12:58 PM - INFO - MainProcess main.py:116 - Detecting cell candidates                                                                                                                                      main.py:116
Processing planes: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2070/2070 [45:41<00:00,  1.32s/it]
2025-02-12 18:58:41 PM INFO     2025-02-12 18:58:41 PM - INFO - MainProcess volume_filter.py:451 - Splitting cell clusters and writing results                                                                                                  volume_filter.py:451
Splitting cell clusters: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39643/39643 [01:48<00:00, 366.07it/s]
Detection complete. Found 3516361 cells in 0:48:17.364115
2025-02-12 19:04:21 PM INFO     2025-02-12 19:04:21 PM - INFO - MainProcess main.py:195 - Skipping cell classification                                                                                                                                   main.py:195
                       INFO     2025-02-12 19:04:21 PM - INFO - MainProcess main.py:212 - Skipping cell position analysis                                                                                                                                main.py:212
                       INFO     2025-02-12 19:04:21 PM - INFO - MainProcess main.py:241 - Skipping figure generation                                                                                                                                     main.py:241
                       INFO     2025-02-12 19:04:21 PM - INFO - MainProcess main.py:95 - Finished. Total time taken: 0:51:35.354918

Here it is on main, using a single 3D tiff file, which loads the whole thing into memory:

brainmapper -s "J:\imaging\fused\20250123\MF1_132F_W\MF1_132F_W_BS_640.tif" -b "J:\imaging\fused\20250123\MF1_132F_W\MF1_132F_W_BS_488.tif" -o "L:\imaging\staging\20250123\MF1_132F_W_tiffs\output" --no-register --no-classification --no-figures --max-cluster-size 100000 --soma-diameter 8 --ball-xy-size 6 --ball-z-size 8 --ball-overlap-fraction 0.5 --log-sigma-size 0.5 --threshold 5 --soma-spread-factor 1.4 -v 4 2.03 2.03 --orientation ial --n-free-cpus 12
2025-02-12 20:20:46 PM INFO     2025-02-12 20:20:46 PM - INFO - MainProcess fancylog.py:314 - Starting logging                                                                                                                                       fancylog.py:314
                       INFO     2025-02-12 20:20:46 PM - INFO - MainProcess fancylog.py:315 - Multiprocessing-logging module found. Logging from all processes                                                                                       fancylog.py:315
2025-02-12 20:20:47 PM INFO     2025-02-12 20:20:47 PM - INFO - MainProcess main.py:70 - Skipping registration                                                                                                                                            main.py:70
2025-02-12 20:20:54 PM INFO     2025-02-12 20:20:54 PM - INFO - MainProcess main.py:116 - Detecting cell candidates                                                                                                                                      main.py:116
Processing planes: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2070/2070 [1:01:31<00:00,  1.78s/it]
2025-02-12 21:29:09 PM INFO     2025-02-12 21:29:09 PM - INFO - MainProcess volume_filter.py:451 - Splitting cell clusters and writing results                                                                                                  volume_filter.py:451
Splitting cell clusters: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39643/39643 [01:49<00:00, 363.68it/s]
Detection complete. Found 3516361 cells in 1:04:22.595883
2025-02-12 21:35:30 PM INFO     2025-02-12 21:35:30 PM - INFO - MainProcess main.py:195 - Skipping cell classification                                                                                                                                   main.py:195
                       INFO     2025-02-12 21:35:30 PM - INFO - MainProcess main.py:212 - Skipping cell position analysis                                                                                                                                main.py:212
                       INFO     2025-02-12 21:35:30 PM - INFO - MainProcess main.py:241 - Skipping figure generation                                                                                                                                     main.py:241
2025-02-12 21:36:04 PM INFO     2025-02-12 21:36:04 PM - INFO - MainProcess main.py:95 - Finished. Total time taken: 1:15:18.007133

Here it is with the changes from this PR, using a single 3D tiff file:

brainmapper -s "J:\imaging\fused\20250123\MF1_132F_W\MF1_132F_W_BS_640.tif" -b "J:\imaging\fused\20250123\MF1_132F_W\MF1_132F_W_BS_488.tif" -o "L:\imaging\staging\20250123\MF1_132F_W_tiffs\output" --no-register --no-classification --no-figures --max-cluster-size 100000 --soma-diameter 8 --ball-xy-size 6 --ball-z-size 8 --ball-overlap-fraction 0.5 --log-sigma-size 0.5 --threshold 5 --soma-spread-factor 1.4 -v 4 2.03 2.03 --orientation ial --n-free-cpus 12
2025-02-12 19:23:30 PM INFO     2025-02-12 19:23:30 PM - INFO - MainProcess fancylog.py:314 - Starting logging                                                                                                                                       fancylog.py:314
                       INFO     2025-02-12 19:23:30 PM - INFO - MainProcess fancylog.py:315 - Multiprocessing-logging module found. Logging from all processes                                                                                       fancylog.py:315
                       INFO     2025-02-12 19:23:30 PM - INFO - MainProcess main.py:70 - Skipping registration                                                                                                                                            main.py:70
2025-02-12 19:23:37 PM INFO     2025-02-12 19:23:37 PM - INFO - MainProcess main.py:116 - Detecting cell candidates                                                                                                                                      main.py:116
Processing planes: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2070/2070 [46:58<00:00,  1.36s/it]
2025-02-12 20:10:38 PM INFO     2025-02-12 20:10:38 PM - INFO - MainProcess volume_filter.py:451 - Splitting cell clusters and writing results                                                                                                  volume_filter.py:451
Splitting cell clusters: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39643/39643 [01:47<00:00, 368.13it/s]
Detection complete. Found 3516361 cells in 0:49:47.688208
2025-02-12 20:16:39 PM INFO     2025-02-12 20:16:39 PM - INFO - MainProcess main.py:195 - Skipping cell classification                                                                                                                                   main.py:195
                       INFO     2025-02-12 20:16:39 PM - INFO - MainProcess main.py:212 - Skipping cell position analysis                                                                                                                                main.py:212
                       INFO     2025-02-12 20:16:39 PM - INFO - MainProcess main.py:241 - Skipping figure generation                                                                                                                                     main.py:241
2025-02-12 20:16:57 PM INFO     2025-02-12 20:16:57 PM - INFO - MainProcess main.py:95 - Finished. Total time taken: 0:53:27.393816

This shows there's no performance penalty from the changes. If anything, loading the whole brain into memory beforehand adds some slowdown.

Looking at the RAM usage during the run: with main there was the expected RAM usage bump for our process, whereas with my changes our process used minimal RAM, just like with a folder of tiffs. However, I did notice that system-level RAM usage grew progressively, until by the end it matched the size of the file. I read this to mean that the OS was caching the memory-mapped data as the planes were read during processing, so the memory shows up at the system level rather than in our process. I assume that if RAM runs low the OS will simply drop those cached pages, so that memory isn't really in use.

Is this a breaking change?

I don't think so.

Does this PR require an update to the documentation?

No.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality (unit & integration)
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit


codecov bot commented Feb 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.78%. Comparing base (5d79c85) to head (7e898d8).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #116      +/-   ##
==========================================
+ Coverage   92.73%   92.78%   +0.05%     
==========================================
  Files          37       37              
  Lines        1857     1871      +14     
==========================================
+ Hits         1722     1736      +14     
  Misses        135      135              
Flag Coverage Δ
numba 92.73% <100.00%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown.

@matham (Contributor, Author) commented Feb 13, 2025

It was pointed out in napari/napari#7602 that you can use zarr/dask via tifffile to lazily load tiff files. memmap only works for simple tiffs, e.g. where the data is not compressed, so I added zarr as a fallback. However, zarr has to be installed, and it also doesn't work with all tiffs, depending on the version, so if both memmap and zarr fail we fall back to fully loading the file.

The main issue is that previously load_img_stack, and by extension load_any, would not return dask arrays as far as I can tell, but now they can. I'm not sure if this will be an issue for downstream usage. read_z_stack could already return dask arrays, so there's no change there. Although I'm not sure why there are both load_img_stack and read_z_stack.

Also, the loading order is: try memmap -> zarr -> ndarray. I can make memmap fail by compressing the tiff, but I can't make zarr fail reliably since that depends on the zarr version, so I have no way to add coverage tests for the ndarray fallback.
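For context, the zarr step in the middle of that chain could look roughly like the sketch below (assuming tifffile's aszarr interface and dask.array; try_load_as_zarr is a placeholder name, not the PR's actual function):

```python
import dask.array as da
import tifffile


def try_load_as_zarr(path):
    """Lazily wrap a tiff's zarr store in a dask array, or return None.

    Rough sketch of the middle step in the memmap -> zarr -> ndarray chain.
    """
    try:
        import zarr  # optional dependency, may not be installed
    except ModuleNotFoundError:
        return None
    try:
        # tifffile exposes the tiff pages as a zarr store without reading them.
        store = tifffile.imread(path, aszarr=True)
        return da.from_zarr(zarr.open(store, mode="r"))
    except Exception:
        # Some tiff/zarr version combinations still fail; the caller then
        # falls back to loading the whole file into memory.
        return None
```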

@IgorTatarnikov (Member) commented:

This works great for me, I tried changing some of our test data (~100GB per channel) into 3D tiffs and there was a significant speed up when using the memmap strategy.

As for the loading order, to avoid running into issues, perhaps we should skip the zarr -> dask strategy for now, as it might have unintended consequences downstream where users are not expecting a dask array. I'll let @alessandrofelder weigh in as well!

@alessandrofelder alessandrofelder self-requested a review February 17, 2025 11:00
@alessandrofelder (Member) commented Feb 19, 2025

Although I'm not sure why there are both load_img_stack and read_z_stack.

My understanding/guess of the original intention is that

  • all the load_any functions are supposed/expected to return a numpy array
  • read_z_stack is specific and separate for the "folder with 2D tiffs -> dask array" case (while also allowing a 3D tiff that fits in memory to be passed). This is only used by brainmapper to pass to cellfinder.main, and not by any of the other (mature) BrainGlobe tools.

I can make memmap fail by compressing the tiff, but I can't make zarr fail reliably since that depends on the zarr version, so I have no way to add coverage tests for the ndarray fallback.

Could we mock da.from_zarr and give it a ModuleNotFoundError side effect?
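For example, something along these lines might work (a hedged sketch: mypackage.load is a stand-in for wherever read_z_stack actually lives, and it assumes the loader calls dask.array.from_zarr via import dask.array as da):

```python
from unittest import mock

import numpy as np
import tifffile

# Placeholder import path; point this at the real module containing read_z_stack.
from mypackage.load import read_z_stack


def test_falls_back_to_ndarray(tmp_path):
    # Compressed data, so tifffile.memmap fails and the zarr branch is tried.
    data = np.arange(4 * 8 * 8, dtype=np.uint16).reshape(4, 8, 8)
    path = tmp_path / "stack.tif"
    tifffile.imwrite(path, data, compression="zlib")

    # Force the zarr branch to fail as if zarr/dask were unavailable.
    with mock.patch("dask.array.from_zarr", side_effect=ModuleNotFoundError):
        result = read_z_stack(str(path))

    # The loader should have fallen back to a plain in-memory numpy array.
    assert isinstance(result, np.ndarray)
    np.testing.assert_array_equal(result, data)
```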


Generally, I think this is great - it also fits with our medium-term goal of moving to zarr overall (so I don't mind depending on zarr).

My understanding is that the loading functions are meant for BrainGlobe libraries only (i.e. they are not meant to be a general-purpose image reader like bioio), so we would probably be open to refactoring the downstream use cases if we decide the performance improvement is worth the developer time? To discuss at tomorrow's developer meeting.

I guess a “light” version of these changes could just apply to read_z_stack, limiting the improvement to cellfinder where it's most needed?

@matham (Contributor, Author) commented Feb 20, 2025

I can keep the changes just to read_z_stack, if we want to keep the other loaders returning numpy until zarr is fully supported.

@alessandrofelder (Member) commented:

I can keep the changes just to read_z_stack, if we want to keep the other loaders returning numpy until zarr is fully supported.

Yea, I think that's the best way forward for this. Thanks @matham
