I'm running into the error below when running a simple stage-in CWL on AKS. I've modified the default storageClass to use AzureFile (https://pascalnaber.wordpress.com/2018/01/26/persistent-storage-and-volumes-using-kubernetes-on-azure-with-aks-or-azure-container-service/) to support ReadWriteMany.
My CWL is:
    #!/usr/bin/env cwl-runner
    cwlVersion: v1.0
    class: CommandLineTool
    hints:
      DockerRequirement:
        dockerPull: curlimages/curl
    baseCommand: [curl]
    inputs:
      input_url:
        type: string
        inputBinding:
          prefix: -O
      input_file:
        type: string
    outputs:
      localized_file:
        type: File
        outputBinding:
          glob: $(inputs.input_file)
      stdout_file:
        type: stdout
      stderr_file:
        type: stderr
    stdout: stdout_stage-in.txt
    stderr: stderr_stage-in.txt
and my K8s job YAML:
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: calrissian-job
    spec:
      template:
        spec:
          containers:
          - name: calrissian-job
            image: pymonger/calrissian:latest
            envFrom:
            - secretRef:
                name: aws-creds
            command: ["calrissian"]
            args:
            - "--debug"
            - "--stdout"
            - "/calrissian/output-data/docker-output.json"
            - "--stderr"
            - "/calrissian/output-data/docker-stderr.log"
            - "--max-ram"
            - "16G"
            - "--max-cores"
            - "8"
            - "--tmp-outdir-prefix"
            - "/calrissian/tmpout/"
            - "--outdir"
            - "/calrissian/output-data/"
            - "--usage-report"
            - "/calrissian/output-data/docker-usage.json"
            - "https://raw.githubusercontent.com/pymonger/soamc-cwl-demo/develop/baseline-pge/stage-in.cwl"
            - "--input_url"
            - "https://s3-us-west-2.amazonaws.com/landsat-pds/L8/010/117/LC80101172015002LGN00/LC80101172015002LGN00_BQA.TIF"
            - "--input_file"
            - "LC80101172015002LGN00_BQA.TIF"
            volumeMounts:
            - mountPath: /calrissian/input-data
              name: calrissian-input-data
              readOnly: true
            - mountPath: /calrissian/tmpout
              name: calrissian-tmpout
            - mountPath: /calrissian/output-data
              name: calrissian-output-data
            env:
            - name: CALRISSIAN_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          restartPolicy: Never
          volumes:
          - name: calrissian-input-data
            persistentVolumeClaim:
              claimName: calrissian-input-data
              readOnly: true
          - name: calrissian-tmpout
            persistentVolumeClaim:
              claimName: calrissian-tmpout
          - name: calrissian-output-data
            persistentVolumeClaim:
              claimName: calrissian-output-data
The pod's log shows:
    INFO calrissian 0.10.0 (cwltool 3.1.20211004060744)
    DEBUG Parsed job order from command line: {
        "id": "https://raw.githubusercontent.com/pymonger/soamc-cwl-demo/develop/baseline-pge/stage-in.cwl",
        "input_file": "LC80101172015002LGN00_BQA.TIF",
        "input_url": "https://s3-us-west-2.amazonaws.com/landsat-pds/L8/010/117/LC80101172015002LGN00/LC80101172015002LGN00_BQA.TIF"
    }
    DEBUG Starting ThreadPoolJobExecutor.run_jobs: total_resources=[ram: 16000.0, cores: 8.0], max_workers=None
    DEBUG [job stage-in.cwl] initializing from https://raw.githubusercontent.com/pymonger/soamc-cwl-demo/develop/baseline-pge/stage-in.cwl
    DEBUG [job stage-in.cwl] {
        "input_file": "LC80101172015002LGN00_BQA.TIF",
        "input_url": "https://s3-us-west-2.amazonaws.com/landsat-pds/L8/010/117/LC80101172015002LGN00/LC80101172015002LGN00_BQA.TIF"
    }
    DEBUG [job stage-in.cwl] path mappings is {}
    DEBUG [job stage-in.cwl] command line bindings is [
        {
            "position": [
                -1000000,
                0
            ],
            "datum": "curl"
        },
        {
            "prefix": "-O",
            "position": [
                0,
                "input_url"
            ],
            "datum": "https://s3-us-west-2.amazonaws.com/landsat-pds/L8/010/117/LC80101172015002LGN00/LC80101172015002LGN00_BQA.TIF"
        }
    ]
    DEBUG wait_for_completion with 0 futures
    DEBUG wait_for_completion with 0 futures
    DEBUG allocate [ram: 1024, cores: 1] from available [ram: 16000.0, cores: 8.0]
    DEBUG wait_for_completion with 1 futures
    DEBUG [job stage-in.cwl] initial work dir {}
    Building resources spec from {'cores': 1, 'ram': 1024}
    --------------------------------------------------------------------------------
    apiVersion: v1
    kind: Pod
    metadata:
      labels: {}
      name: stage-in-cwl-pod-ydxduxah
    spec:
      containers:
      - args:
        - curl -O https://s3-us-west-2.amazonaws.com/landsat-pds/L8/010/117/LC80101172015002LGN00/LC80101172015002LGN00_BQA.TIF > stdout_stage-in.txt 2> stderr_stage-in.txt
        command:
        - /bin/sh
        - -c
        env:
        - name: HOME
          value: /XiTfjy
        - name: TMPDIR
          value: /tmp
        image: curlimages/curl
        name: stage-in-cwl-container
        resources:
          requests:
            cpu: '1'
            memory: 1024Mi
        volumeMounts:
        - mountPath: /XiTfjy
          name: calrissian-tmpout
          readOnly: false
          subPath: sqh3fknm
        - mountPath: /tmp
          name: tmpdir
        workingDir: /XiTfjy
      initContainers: []
      restartPolicy: Never
      securityContext:
        runAsGroup: 0
        runAsUser: 1001
      volumes:
      - name: calrissian-input-data
        persistentVolumeClaim:
          claimName: calrissian-input-data
          readOnly: true
      - name: calrissian-tmpout
        persistentVolumeClaim:
          claimName: calrissian-tmpout
          readOnly: false
      - name: calrissian-output-data
        persistentVolumeClaim:
          claimName: calrissian-output-data
          readOnly: false
      - emptyDir: {}
        name: tmpdir
    --------------------------------------------------------------------------------
    Created k8s pod name stage-in-cwl-pod-ydxduxah with id f17fc3f2-b49b-4182-bfc1-379eaac5a691
    PodMonitor adding stage-in-cwl-pod-ydxduxah
    k8s pod 'stage-in-cwl-pod-ydxduxah' started
    [stage-in-cwl-pod-ydxduxah] follow_logs start
    [stage-in-cwl-pod-ydxduxah] follow_logs end
    Handling terminated pod name stage-in-cwl-pod-ydxduxah with id f17fc3f2-b49b-4182-bfc1-379eaac5a691
    handling completion with 0
    PodMonitor removing stage-in-cwl-pod-ydxduxah
    shutil.rmtree(/tmp/tjb__2wk, True)
    shutil.rmtree(/tmp/4oavux2h, True)
    DEBUG restore [ram: 1024, cores: 1] to available [ram: 14976.0, cores: 7.0]
    DEBUG Finishing ThreadPoolExecutor.run_jobs: total_resources=[ram: 16000.0, cores: 8.0], available_resources=[ram: 16000.0, cores: 8.0]
    DEBUG Moving /calrissian/tmpout/sqh3fknm/LC80101172015002LGN00_BQA.TIF to /calrissian/output-data/LC80101172015002LGN00_BQA.TIF
    ERROR Unhandled error: [Errno 1] Operation not permitted
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/shutil.py", line 566, in move
        os.rename(src, real_dst)
    OSError: [Errno 18] Invalid cross-device link: '/calrissian/tmpout/sqh3fknm/LC80101172015002LGN00_BQA.TIF' -> '/calrissian/output-data/LC80101172015002LGN00_BQA.TIF'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/cwltool/main.py", line 1248, in main
        tool, initialized_job_order_object, runtimeContext, logger=_logger
      File "/usr/local/lib/python3.7/site-packages/cwltool/executors.py", line 60, in __call__
        return self.execute(process, job_order_object, runtime_context, logger)
      File "/usr/local/lib/python3.7/site-packages/cwltool/executors.py", line 157, in execute
        path_mapper=runtime_context.path_mapper,
      File "/usr/local/lib/python3.7/site-packages/cwltool/process.py", line 401, in relocateOutputs
        stage_files(pm, stage_func=_relocate, symlink=False, fix_conflicts=True)
      File "/usr/local/lib/python3.7/site-packages/cwltool/process.py", line 297, in stage_files
        stage_func(entry.resolved, entry.target)
      File "/usr/local/lib/python3.7/site-packages/cwltool/process.py", line 374, in _relocate
        shutil.move(src, dst)
      File "/usr/local/lib/python3.7/shutil.py", line 580, in move
        copy_function(src, real_dst)
      File "/usr/local/lib/python3.7/shutil.py", line 267, in copy2
        copystat(src, dst, follow_symlinks=follow_symlinks)
      File "/usr/local/lib/python3.7/shutil.py", line 206, in copystat
        follow_symlinks=follow)
    PermissionError: [Errno 1] Operation not permitted
    Starting Cleanup
    Finishing Cleanup
Any ideas on what I can try to get this working?
Thanks in advance.
Answering my own question: it looks like the cause is that the CIFS filesystem mounted for the volumes doesn't allow file modes to be modified: https://docs.microsoft.com/en-us/answers/questions/89827/how-can-i-change-folder-or-file-permissions-when-m.html I'll look into using NFS with AKS instead.
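For anyone hitting the same traceback, here is a minimal Python sketch of what seems to be going on. Because tmpout and output-data are separate volumes, `os.rename` fails with a cross-device error, so `shutil.move` falls back to its default `copy_function`, `shutil.copy2`, which also copies permission bits and timestamps via `copystat`. That metadata step is the one that raised `PermissionError`. Passing `shutil.copyfile` as `copy_function` copies file contents only. The helper name `move_without_metadata` is hypothetical, and this is not what cwltool does (per the traceback, its `_relocate` calls `shutil.move(src, dst)` with the default), so using it would mean patching cwltool. I have not tested it against an actual AzureFile mount.

```python
import shutil


def move_without_metadata(src: str, dst: str) -> str:
    """Move a single file without copying its mode bits or timestamps.

    shutil.move tries os.rename first. If that fails (for example because
    src and dst are on different filesystems), it copies the file with
    copy_function and then deletes the source. shutil.copyfile copies only
    the file contents, so copystat is never called.

    Assumption: src is a regular file. For directories shutil.move uses
    copytree, which still calls copystat on the directories themselves.
    """
    return shutil.move(src, dst, copy_function=shutil.copyfile)
```

The trade-off is that the moved file ends up with whatever mode and timestamps the destination filesystem assigns, which may or may not matter for downstream steps.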