-
-
Notifications
You must be signed in to change notification settings - Fork 231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce bind mounting (and thus --bind args) in containers. #1387
Conversation
This pull request introduces 2 alerts when merging 13d604c into ac60dc1 - view on LGTM.com new alerts:
|
I think a better approach would be to leverage The one thing to be aware of is that currently stage_files can either copy or symlink files. As I said in another thread, the brute-force solution would be to copy all the files, the more clever solution is to use either symlinks (requires bind-mounting the job store into the container so that the symlinks have something to point to) or hard links (but these have to be on the same file system as the job store). |
@tetron Thanks for taking a look! Leveraging |
Yes, it should be trivial to add hardlink support to It's an inherit limitation of hard links on unix that they can't cross file systems, the code just has to be aware of that. In other words, if the job store (where the files actually live) and the temp directory where you are going to create the directory tree are on different mounts, the hard link will fail. It is something you could check for ahead of time ("warning: job store and temp directory are on different file systems, files will be copied instead of linked") |
Codecov Report
@@ Coverage Diff @@
## main #1387 +/- ##
===========================================
- Coverage 65.86% 55.84% -10.02%
===========================================
Files 91 45 -46
Lines 16061 7875 -8186
Branches 4138 1981 -2157
===========================================
- Hits 10578 4398 -6180
+ Misses 4351 2955 -1396
+ Partials 1132 522 -610
Continue to review full report at Codecov.
|
Just an update on this. This is a fairly naive attempt, and changes the paths and bindmounts a new staging temp dir, but it still includes the targets from the original pathmapper dir ( My current thinking is that I need to set the stagedir earlier, when the pathmapper is originally instantiated, so that the correct paths propogate to the docker/singularity command itself (and elsewhere). I'm trying to determine where to best do this.
|
I've changed this to be a more conservative change, and only bind mount the base directories of read-only files, which should fix the originally posted issue and I assume a large number of others in the near term. Trying to do something like |
I'll have to find time to look this over. @mr-c also reminded me that we need to migrate the pull request tests off of travis to something that isn't so severely quota constrained, so that's also a blocker to be able to merge this PR. |
@tetron Three conformance tests are failing in v1.2. There are file path conflicts and needs to be handled more intelligently. Will update this to fix the issue. |
So this fails with conformance tests: It looks like the first two are easier, and it's a matter of making sure the dir destinations don't collide rather than the entire bind mount. Test 338 is weirder. It's a permissions issue, where a directory is kind of created as |
This is still on my to-do list to look at. In the mean time, we moved over to github actions for CI and a bunch of tests are not reporting. Maybe you need to merge master? |
The following tests fail:
|
cwltool/pathmapper.py
Outdated
@@ -171,7 +171,9 @@ def setup(self, referenced_files: List[CWLObjectType], basedir: str) -> None: | |||
stagedir = self.stagedir | |||
for fob in referenced_files: | |||
if self.separateDirs: | |||
stagedir = os.path.join(self.stagedir, "stg%s" % uuid.uuid4()) | |||
stagedir = os.path.join(self.stagedir, | |||
"s5g%s" % uuid.uuid5(uuid.UUID('6a56ca02-b6f0-4c1a-a4b0-fb0068ce80ad'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uuid5 is deterministic. This groups together files that were previously grouped together by providing the same uuid for each file with the same os.path.dirname
, so it violates completely separating files. This should avoid conflicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok! Why s5g
instead of stg
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you be passing something to uuid.uuid5
related to the fob.location
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed s5g
just in case this needs to distinguish behavior in the future based on versions. Though it's not necessary.
And fob['location']
is line 176, the comment blurb doesn't make it visible here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So "s5g%s" % uuid.uuid5(uuid.UUID('6a56ca02-b6f0-4c1a-a4b0-fb0068ce80ad'
generates a static string that will never change, to which you make a subdir based upon the directory name of the location
field. So why not skip to the directory name itself? Or use it to see the uuid5
generator?
What if there are location URIs that end in the same directory name but are otherwise unrelated? This could lead to files overwriting each other. Or files being adjacent to one another when they should not be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go through my thinking on this, and please point out any flaws.
Uuid5 generates different uuids for different seeds, the seed in this PR being both the file's directory and a static uuid. So uuid.uuid5(arg1, arg2) = uuid_a
and uuid.uuid5(arg1, arg3) = uuid_b
. The difference is that if uuid.uuid5(arg1, arg2) = uuid_a
occurs again, it will generate the same uuid_a
output.
>>> from uuid import uuid5, UUID
>>> uuid5(UUID('6a56ca02-b6f0-4c1a-a4b0-fb0068ce80ad'), '/tmp/1')
UUID('e3e49c59-338c-5369-a215-3d68bda76d69')
>>> uuid5(UUID('6a56ca02-b6f0-4c1a-a4b0-fb0068ce80ad'), '/tmp/2')
UUID('48220b67-6c36-500b-ae08-ecfa48951864')
>>> uuid5(UUID('6a56ca02-b6f0-4c1a-a4b0-fb0068ce80ad'), '/tmp')
UUID('98da1ac5-114d-539b-b727-948e7754ec95')
>>> uuid5(UUID('6a56ca02-b6f0-4c1a-a4b0-fb0068ce80ad'), '/tmp/1')
UUID('e3e49c59-338c-5369-a215-3d68bda76d69')
The idea being that all pre-existing paths cannot conflict since they co-existed together peacefully in the first place.
So files:
/tmp/a/b.txt
/tmp/b/a.txt
/tmp/b/b.txt
/tmp/c/b.txt
would map to these inside of the container:
uuid_a/b.txt
uuid_b/a.txt
uuid_b/b.txt
uuid_c/b.txt
So why not skip to the directory name itself?
If I understand your question correctly, we could use the same directory names as the files originally had, but they might conflict with folders that already exist within the container.
Or use it to see the uuid5 generator?
If I understand this to be to "seed" the uuid5 generator with the directory, I'm using os.path.dirname(fob['location']))
to seed the uuid5. Though I'm not sure if that's the question here.
What if there are location URIs that end in the same directory name but are otherwise unrelated? This could lead to files overwriting each other. Or files being adjacent to one another when they should not be.
I was wondering about this, and was trying to reason if this could happen if the input files were writable in a an input dir outside of the workdir. I haven't tested this.
Some of this too is that I tried to switch to a simpler strategy and it may just not work.
cwltool/singularity.py
Outdated
src = os.path.dirname(src) | ||
dst = os.path.dirname(dst) | ||
bind_arg = f"--bind={src}:{dst}:{writable}" | ||
if bind_arg not in runtime: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried running this PR after getting the argument list too long
error when using large number of input directories, and it seems that the source
directory here is actually different for each file, and therefore it did not reduce the number of --bind
arguments:
...
--bind=/juno/work/access/testing/users/johnsoni/access_qc/5500_HF/jobStore/files/for-job/kind-CWLJob/instance-aop8iz_w/file-34cfcc9ed208473bb67710a5108f1a32:/var/lib/cwl/s5g5ff703f1-41b2-56d3-93b3-7e36c1a37224/all_qc_files/duplex_bam_sequence_qc_dir/C-R0CLHU-N001-d:ro \
--bind=/juno/work/access/testing/users/johnsoni/access_qc/5500_HF/jobStore/files/for-job/kind-CWLJob/instance-aop8iz_w/file-fee328e946c9448c92ee501c6daa4b5a:/var/lib/cwl/s5g5ff703f1-41b2-56d3-93b3-7e36c1a37224/all_qc_files/duplex_bam_sequence_qc_dir/C-R0CLHU-N001-d:ro \
...
It seems toil actually produces different parent directories for files that might have originally came from the same Directory object. So if files that came from the same directory can be stored as such in the jobstore, I would expect this change to fix the issue
@mr-c @tetron I've only implemented this for docker and not singularity. It's a hack that builds a new docker container with the files copied inside if the CLI character limit of 2mb is reached. It uses a staging directory because docker needs all files to be within the runtime context and a staging directory guarantees that. It currently just copies files, but that could be modified to be linking. This is not as good as the link tree Peter originally proposed, but I'm really not sure how to do that without injecting something into the command to link things after the staging directory is bind mounted. If this makes sense to you for docker, I can do the same for singularity. Conformance tests pass on my system when I remove the len(runtime) conditional check and force new docker containers built for all tests with valid inputs. |
This pull request introduces 1 alert when merging abf614c into 2b17b14 - view on LGTM.com new alerts:
|
This pull request introduces 2 alerts when merging 664df85 into 59f1675 - view on LGTM.com new alerts:
|
Closing in favor of #1568. |
Eases an issue where too many
--bind
args can break the (2mb
on my system) limit on the command line length. This is a much bigger Toil problem as I think cwltool handles directories of files a bit more elegantly.Discussed in: DataBiosphere/toil#3358
And in: https://cwl.discourse.group/t/too-many-arguments-on-the-command-line/248
This is currently a draft and won't run/work because I don't run the hard-linking script currently (see below; open to suggestions).
This PR takes any individual files that are being added as
--bind
volumes to singularity and instead creates a hardlink of the file inside of one universal folder which is bind-mounted with only one--bind
arg. I don't do anything about folders, since they're a two-way street and need to be able to have outputs dropped in them, so I'm not sure I can do much about those and they can still, in theory, break the limit if you add enough of them.A script is then created that, when run inside of the container, hard links all of the files to where they should be.
I haven't actually figured out a good way to prepend the hard linking script command before the real command runs.
Multiple commands , such as
singularity exec docker://ubuntu bash -c "echo hello; echo goodbye"
work, so I'd like to sandwich the command into a place like this if possible: https://github.com/common-workflow-language/cwltool/blob/main/cwltool/command_line_tool.py#L1022 .I'm having trouble finding a way of doing that in a way that isn't jarring to me though. I was hoping for something like the
docker run --entrypoint
, but I don't think that something like that exists here. I'll think about it a bit more and post here when things actually run and pass. Very open to suggestions in the meantime (especially if I'm doing something wrong, against the design, or there's a more obvious way to do this, please let me know).