Invalid cross-device link running on Azure Kubernetes Service #123

Closed
pymonger opened this issue Oct 13, 2021 · 1 comment · May be fixed by common-workflow-language/cwltool#1544

@pymonger

I'm hitting this error when running a simple stage-in CWL on AKS. I changed the default storageClass to use AzureFile (https://pascalnaber.wordpress.com/2018/01/26/persistent-storage-and-volumes-using-kubernetes-on-azure-with-aks-or-azure-container-service/) so that the volumes support ReadWriteMany.

My CWL is:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
hints:
  DockerRequirement:
    dockerPull: curlimages/curl
baseCommand: [curl]
inputs:
  input_url:
    type: string
    inputBinding:
      prefix: -O
  input_file:
    type: string
outputs:
  localized_file:
    type: File
    outputBinding:
      glob: $(inputs.input_file)
  stdout_file:
    type: stdout
  stderr_file:
    type: stderr
stdout: stdout_stage-in.txt
stderr: stderr_stage-in.txt

and my K8s job YAML:

---
apiVersion: batch/v1
kind: Job
metadata:
  name: calrissian-job
spec:
  template:
    spec:
      containers:
      - name: calrissian-job
        image: pymonger/calrissian:latest
        envFrom:
          - secretRef:
              name: aws-creds
        command: ["calrissian"]
        args:
          - "--debug"
          - "--stdout"
          - "/calrissian/output-data/docker-output.json"
          - "--stderr"
          - "/calrissian/output-data/docker-stderr.log"
          - "--max-ram"
          - "16G"
          - "--max-cores"
          - "8"
          - "--tmp-outdir-prefix"
          - "/calrissian/tmpout/"
          - "--outdir"
          - "/calrissian/output-data/"
          - "--usage-report"
          - "/calrissian/output-data/docker-usage.json"
          - "https://raw.githubusercontent.com/pymonger/soamc-cwl-demo/develop/baseline-pge/stage-in.cwl"
          - "--input_url"
          - "https://s3-us-west-2.amazonaws.com/landsat-pds/L8/010/117/LC80101172015002LGN00/LC80101172015002LGN00_BQA.TIF"
          - "--input_file"
          - "LC80101172015002LGN00_BQA.TIF"
        volumeMounts:
        - mountPath: /calrissian/input-data
          name: calrissian-input-data
          readOnly: true
        - mountPath: /calrissian/tmpout
          name: calrissian-tmpout
        - mountPath: /calrissian/output-data
          name: calrissian-output-data
        env:
        - name: CALRISSIAN_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      restartPolicy: Never
      volumes:
      - name: calrissian-input-data
        persistentVolumeClaim:
          claimName: calrissian-input-data
          readOnly: true
      - name: calrissian-tmpout
        persistentVolumeClaim:
          claimName: calrissian-tmpout
      - name: calrissian-output-data
        persistentVolumeClaim:
          claimName: calrissian-output-data

The pod's log shows:

INFO calrissian 0.10.0 (cwltool 3.1.20211004060744)
DEBUG Parsed job order from command line: {
    "id": "https://raw.githubusercontent.com/pymonger/soamc-cwl-demo/develop/baseline-pge/stage-in.cwl",
    "input_file": "LC80101172015002LGN00_BQA.TIF",
    "input_url": "https://s3-us-west-2.amazonaws.com/landsat-pds/L8/010/117/LC80101172015002LGN00/LC80101172015002LGN00_BQA.TIF"
}
DEBUG Starting ThreadPoolJobExecutor.run_jobs: total_resources=[ram: 16000.0, cores: 8.0], max_workers=None
DEBUG [job stage-in.cwl] initializing from https://raw.githubusercontent.com/pymonger/soamc-cwl-demo/develop/baseline-pge/stage-in.cwl
DEBUG [job stage-in.cwl] {
    "input_file": "LC80101172015002LGN00_BQA.TIF",
    "input_url": "https://s3-us-west-2.amazonaws.com/landsat-pds/L8/010/117/LC80101172015002LGN00/LC80101172015002LGN00_BQA.TIF"
}
DEBUG [job stage-in.cwl] path mappings is {}
DEBUG [job stage-in.cwl] command line bindings is [
    {
        "position": [
            -1000000,
            0
        ],
        "datum": "curl"
    },
    {
        "prefix": "-O",
        "position": [
            0,
            "input_url"
        ],
        "datum": "https://s3-us-west-2.amazonaws.com/landsat-pds/L8/010/117/LC80101172015002LGN00/LC80101172015002LGN00_BQA.TIF"
    }
]
DEBUG wait_for_completion with 0 futures
DEBUG wait_for_completion with 0 futures
DEBUG allocate [ram: 1024, cores: 1] from available [ram: 16000.0, cores: 8.0]
DEBUG wait_for_completion with 1 futures
DEBUG [job stage-in.cwl] initial work dir {}
Building resources spec from {'cores': 1, 'ram': 1024}
--------------------------------------------------------------------------------
apiVersion: v1
kind: Pod
metadata:
  labels: {}
  name: stage-in-cwl-pod-ydxduxah
spec:
  containers:
  - args:
    - curl -O https://s3-us-west-2.amazonaws.com/landsat-pds/L8/010/117/LC80101172015002LGN00/LC80101172015002LGN00_BQA.TIF
      > stdout_stage-in.txt 2> stderr_stage-in.txt
    command:
    - /bin/sh
    - -c
    env:
    - name: HOME
      value: /XiTfjy
    - name: TMPDIR
      value: /tmp
    image: curlimages/curl
    name: stage-in-cwl-container
    resources:
      requests:
        cpu: '1'
        memory: 1024Mi
    volumeMounts:
    - mountPath: /XiTfjy
      name: calrissian-tmpout
      readOnly: false
      subPath: sqh3fknm
    - mountPath: /tmp
      name: tmpdir
    workingDir: /XiTfjy
  initContainers: []
  restartPolicy: Never
  securityContext:
    runAsGroup: 0
    runAsUser: 1001
  volumes:
  - name: calrissian-input-data
    persistentVolumeClaim:
      claimName: calrissian-input-data
      readOnly: true
  - name: calrissian-tmpout
    persistentVolumeClaim:
      claimName: calrissian-tmpout
      readOnly: false
  - name: calrissian-output-data
    persistentVolumeClaim:
      claimName: calrissian-output-data
      readOnly: false
  - emptyDir: {}
    name: tmpdir
--------------------------------------------------------------------------------

Created k8s pod name stage-in-cwl-pod-ydxduxah with id f17fc3f2-b49b-4182-bfc1-379eaac5a691
PodMonitor adding stage-in-cwl-pod-ydxduxah
k8s pod 'stage-in-cwl-pod-ydxduxah' started
[stage-in-cwl-pod-ydxduxah] follow_logs start
[stage-in-cwl-pod-ydxduxah] follow_logs end
Handling terminated pod name stage-in-cwl-pod-ydxduxah with id f17fc3f2-b49b-4182-bfc1-379eaac5a691
handling completion with 0
PodMonitor removing stage-in-cwl-pod-ydxduxah
shutil.rmtree(/tmp/tjb__2wk, True)
shutil.rmtree(/tmp/4oavux2h, True)
DEBUG restore [ram: 1024, cores: 1] to available [ram: 14976.0, cores: 7.0]
DEBUG Finishing ThreadPoolExecutor.run_jobs: total_resources=[ram: 16000.0, cores: 8.0], available_resources=[ram: 16000.0, cores: 8.0]
DEBUG Moving /calrissian/tmpout/sqh3fknm/LC80101172015002LGN00_BQA.TIF to /calrissian/output-data/LC80101172015002LGN00_BQA.TIF
ERROR Unhandled error:
  [Errno 1] Operation not permitted
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/shutil.py", line 566, in move
    os.rename(src, real_dst)
OSError: [Errno 18] Invalid cross-device link: '/calrissian/tmpout/sqh3fknm/LC80101172015002LGN00_BQA.TIF' -> '/calrissian/output-data/LC80101172015002LGN00_BQA.TIF'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/cwltool/main.py", line 1248, in main
    tool, initialized_job_order_object, runtimeContext, logger=_logger
  File "/usr/local/lib/python3.7/site-packages/cwltool/executors.py", line 60, in __call__
    return self.execute(process, job_order_object, runtime_context, logger)
  File "/usr/local/lib/python3.7/site-packages/cwltool/executors.py", line 157, in execute
    path_mapper=runtime_context.path_mapper,
  File "/usr/local/lib/python3.7/site-packages/cwltool/process.py", line 401, in relocateOutputs
    stage_files(pm, stage_func=_relocate, symlink=False, fix_conflicts=True)
  File "/usr/local/lib/python3.7/site-packages/cwltool/process.py", line 297, in stage_files
    stage_func(entry.resolved, entry.target)
  File "/usr/local/lib/python3.7/site-packages/cwltool/process.py", line 374, in _relocate
    shutil.move(src, dst)
  File "/usr/local/lib/python3.7/shutil.py", line 580, in move
    copy_function(src, real_dst)
  File "/usr/local/lib/python3.7/shutil.py", line 267, in copy2
    copystat(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/local/lib/python3.7/shutil.py", line 206, in copystat
    follow_symlinks=follow)
PermissionError: [Errno 1] Operation not permitted
Starting Cleanup
Finishing Cleanup

Any ideas on what I can try to get this working?

Thanks in advance.

@pymonger
Author

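Answering my own question: it looks like this happens because the volumes are mounted over CIFS, which doesn't allow file modes to be modified: https://docs.microsoft.com/en-us/answers/questions/89827/how-can-i-change-folder-or-file-permissions-when-m.html
Since tmpout and output-data are separate volumes, os.rename fails with the cross-device error, shutil.move falls back to copy2, and the copystat call in copy2 is what raises the PermissionError.
I'll look into using NFS with AKS instead.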
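
For anyone who has to stay on Azure Files, here is a minimal sketch of a move that skips the metadata copy. It assumes only regular files are moved and that losing mode bits and timestamps on the outputs is acceptable. `move_without_metadata` is a hypothetical helper, not something cwltool or calrissian provides, and it is untested on AKS.

```python
import errno
import os
import shutil


def move_without_metadata(src: str, dst: str) -> str:
    """Move a regular file without copying its mode bits or timestamps.

    Hypothetical helper for illustration; not part of cwltool or calrissian.
    """
    try:
        # Fast path: a plain rename works when src and dst share a filesystem.
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # Cross-device move (Errno 18 in the log above): copy the bytes only.
        # shutil.copyfile never calls copystat, so no chmod or utime is
        # attempted on the destination volume.
        shutil.copyfile(src, dst)
        os.unlink(src)
    return dst
```

For calrissian to pick this up, something like it would have to replace the `shutil.move(src, dst)` call in cwltool's `_relocate` (process.py in the traceback above). Directories would need separate handling, since this sketch only covers single files.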
