Output destination #1219

Draft
wants to merge 11 commits into
base: main
Conversation

tetron (Member) commented Nov 22, 2019

Experimental feature to set the destination of output files and directories. It simplifies renaming files and organizing output into a desired directory structure.

Work in progress.

Example:

outputs:
  bar:
    type: File
    outputSource: foo
hints:
  cwltool:OutputDestination:
    destinations:
      bar: bar/

This causes the "foo" file to be placed in the subdirectory "bar" (relative to the base output directory) in the final output.
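One way to read this rule, as a minimal sketch and not the PR's actual implementation: a destination ending in "/" names a directory and keeps the file's own basename, while any other destination is taken as the full relative path (a rename). The trailing-slash convention and the helper name `resolve_destination` are assumptions made for illustration; neither is a cwltool API.

```python
import posixpath


def resolve_destination(basename: str, destination: str) -> str:
    """Return the output path relative to the base output directory.

    Hypothetical helper: assumes a trailing "/" means "place inside this
    directory" and anything else means "use this as the new relative path".
    """
    if destination.endswith("/"):
        rel = posixpath.join(destination, basename)
    else:
        rel = destination
    rel = posixpath.normpath(rel)
    # Reject empty results and paths that leave the base output directory.
    if rel in (".", "..") or rel.startswith("../") or posixpath.isabs(rel):
        raise ValueError(f"invalid destination: {destination!r}")
    return rel


print(resolve_destination("foo", "bar/"))         # bar/foo
print(resolve_destination("foo", "results/baz"))  # results/baz
```

Rejecting absolute paths and ".." components is a design choice of this sketch, made so that a destination can never write outside the base output directory.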

The idea is for a future CWL spec (maybe 1.2?) to incorporate this directly into the output parameter:

outputs:
  bar:
    type: File
    outputSource: foo
    destination: bar/
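The relationship between the hint form and the inline form can be sketched as a mechanical rewrite of the parsed document. This is illustrative only: `inline_destinations` is a hypothetical helper, the document is assumed to be already loaded as plain dicts with `outputs` and `hints` in map form, and the `destination` field is the proposed one, not part of any released CWL spec.

```python
import copy


def inline_destinations(doc: dict) -> dict:
    """Move cwltool:OutputDestination entries onto the matching outputs.

    Hypothetical helper; returns a modified copy and leaves `doc` untouched.
    """
    out = copy.deepcopy(doc)
    hint = out.get("hints", {}).pop("cwltool:OutputDestination", None)
    if hint is None:
        return out  # nothing to rewrite
    for name, dest in hint.get("destinations", {}).items():
        if name not in out.get("outputs", {}):
            raise KeyError(f"no output parameter named {name!r}")
        out["outputs"][name]["destination"] = dest
    # Drop the hints section if the destination hint was its only entry.
    if not out["hints"]:
        del out["hints"]
    return out


doc = {
    "outputs": {"bar": {"type": "File", "outputSource": "foo"}},
    "hints": {"cwltool:OutputDestination": {"destinations": {"bar": "bar/"}}},
}
print(inline_destinations(doc))
```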

mr-c (Member) commented Nov 23, 2019

Is this destination for downstream consumers or only final outputs?

If the latter, then it would be better scoped as only being allowed in Workflow outputs (for 1.2) or as a requirement in Workflow processes with no parent (as an extension).

I see the user value, but this feels like it could get messy/confusing as per your example. How to deal with conflicts, etc.

tetron (Member, Author) commented Nov 24, 2019

> Is this destination for downstream consumers or only final outputs?

In the current implementation, it only applies to final outputs. However, the behavior allows for renaming files and directories as well as the more obvious use of placing them into subdirectories, which seems like it could be useful mid-workflow.

> I see the user value, but this feels like it could get messy/confusing as per your example. How to deal with conflicts, etc.

The primary motivation is that if you want to produce output with specific file names and/or directory structure, currently you need to use a JavaScript hack like this:

https://github.com/common-workflow-language/cwl-website/blob/master/site/mergesecondary.cwl

With this feature, you can attach destination directories directly to output parameters. You might still need to propagate a list of destination directories, but it is comparatively much easier for the end user.

A secondary motivation is that because "destination" gives you a way to specify output filenames that are static (or trivially derived from input) it seems like it could support reasoning about the behavior of a workflow -- thinking of rule-based systems like make where you backtrack from a desired target. (I'm afraid to even mention this, I don't want to sidetrack discussion, the first use case is much more concrete and addresses a very clear need).

codecov bot commented Nov 25, 2019

Codecov Report

Merging #1219 into master will decrease coverage by 1.38%.
The diff coverage is 82.45%.

@@            Coverage Diff             @@
##           master    #1219      +/-   ##
==========================================
- Coverage   77.07%   75.69%   -1.39%     
==========================================
  Files          35       35              
  Lines        7216     7257      +41     
  Branches     1853     1843      -10     
==========================================
- Hits         5562     5493      -69     
- Misses       1182     1273      +91     
- Partials      472      491      +19

| Impacted Files | Coverage Δ |
|---|---|
| cwltool/command_line_tool.py | 77.4% <0%> (-0.49%) ⬇️ |
| cwltool/pathmapper.py | 83.18% <100%> (+0.07%) ⬆️ |
| cwltool/executors.py | 77.93% <100%> (-1.1%) ⬇️ |
| cwltool/process.py | 85.56% <80%> (-0.78%) ⬇️ |
| cwltool/sandboxjs.py | 51.28% <0%> (-21.8%) ⬇️ |
| cwltool/utils.py | 61.7% <0%> (-10.64%) ⬇️ |
| cwltool/singularity.py | 60.86% <0%> (-1.45%) ⬇️ |
| cwltool/docker.py | 54.58% <0%> (-1.32%) ⬇️ |
| cwltool/provenance.py | 77.64% <0%> (-1.17%) ⬇️ |

... and 6 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4642316...c22b943.

tetron (Member, Author) commented Nov 26, 2019

Discussion points:

  • What does destination do when specified within a workflow? Possibilities:
    • ignore
    • disallow
    • apply renames, but not directories
    • associate destination with file object or URI, propagate destination to end
  • Have a single destination that is parsed into separate dirname + basename, or two values, destDir and destName?
  • File name conflicts: it probably needs to be an error if there are conflicting files with the same explicit destination; conflicting output files that don't have an explicit destination can be renamed
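
The conflict rule in the last bullet could look like the following sketch. It is one possible policy, not what the PR implements: two files with the same explicit destination raise an error, while a file without an explicit destination that collides with another path is renamed with a numeric suffix. The function name `assign_paths` and the suffix scheme are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def assign_paths(files: List[Tuple[str, Optional[str]]]) -> Dict[int, str]:
    """Map each file index to a final relative path.

    Each entry is (default_path, explicit_destination_or_None).
    """
    taken: Dict[str, int] = {}
    result: Dict[int, str] = {}
    # Explicit destinations are placed first; clashes among them are errors.
    for i, (_, dest) in enumerate(files):
        if dest is None:
            continue
        if dest in taken:
            raise ValueError(f"conflicting explicit destination: {dest!r}")
        taken[dest] = i
        result[i] = dest
    # Files without a destination keep their path, or get a suffix on clash.
    for i, (default, dest) in enumerate(files):
        if dest is not None:
            continue
        candidate, n = default, 1
        while candidate in taken:
            n += 1
            candidate = f"{default}_{n}"
        taken[candidate] = i
        result[i] = candidate
    return result
```

Placing explicit destinations before implicit ones means a user-requested name is never displaced by a file that merely happens to share it.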

@mr-c mr-c changed the base branch from master to main July 2, 2020 11:23
@mr-c mr-c marked this pull request as draft December 3, 2020 17:49