Podman invocations using different --roots cannot share an --imagestore #2257

Open
seanlafferty-ibm opened this issue Feb 13, 2025 · 10 comments · May be fixed by #2258

seanlafferty-ibm commented Feb 13, 2025

Issue Description

I suspect this was never an intended feature, but our use case is CI runners. We have hundreds of replicas, each of which has podman installed. Jobs always start with container pulls (which is the slowest part of the job). If a job pulls an image, we'd like for it to be written to a shared imagestore for subsequent jobs to reuse.

I created an imgs volume on the host and mounted it into all of the runners. Then all the runners were given a storage.conf with the following contents:

[storage]
  driver = "overlay"
  imagestore = "/home/runner/imgs"
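For completeness, the shared volume reaches each runner roughly like this (a sketch only; the runner image name and flags are illustrative, not our exact invocation):

# create the shared volume once on the host, then mount it into every runner replica
podman volume create imgs
podman run -d -v imgs:/home/runner/imgs ci-runner-image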

My expectation was that any podman pull inside a runner would read from and write to this shared imagestore. However, pulls from different roots step on each other, corrupting the imagestore/root.

Steps to reproduce the issue

# pull with root1, shared imgstore (works)
podman --root /tmp/root1 --imagestore /tmp/imgs pull mirror.gcr.io/library/bash

# pull with root2, shared imgstore (doesn't work)
podman --root /tmp/root2 --imagestore /tmp/imgs pull mirror.gcr.io/library/bash

ERRO[0000] Image mirror.gcr.io/library/bash exists in local storage but may be corrupted (remove the image to resolve the issue): layer not known

Describe the results you received

Corrupted imagestore

Describe the results you expected

Non-corrupted imagestore

podman info output

host:
  arch: amd64
  buildahVersion: 1.37.6
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: c0564282e9befb7804c3642230f8e94f1b2ba9f8'
  cpuUtilization:
    idlePercent: 95.6
    systemPercent: 1.94
    userPercent: 2.47
  cpus: 2
  databaseBackend: sqlite
  distribution:
    distribution: rhel
    version: "9.4"
  eventLogger: journald
  freeLocks: 2033
  hostname: slaffy-github1.fyre.ibm.com
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.0-427.42.1.el9_4.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 357359616
  memTotal: 3837394944
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.1-1.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.1
    package: netavark-1.12.2-1.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: crun-1.16.1-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.16.1
      commit: afa829ca0122bd5e1d67f1f38e6cc348027e3c32
      rundir: /run/user/0/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240806.gee36266-2.el9.x86_64
    version: |
      pasta 0^20240806.gee36266-2.el9.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.3.1-1.el9.x86_64
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 17118425088
  swapTotal: 17175670784
  uptime: 453h 46m 52.00s (Approximately 18.88 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 7
    paused: 0
    running: 7
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 250059685888
  graphRootUsed: 124696444928
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1970
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.2.2
  Built: 1737721907
  BuiltTime: Fri Jan 24 04:31:47 2025
  GitCommit: ""
  GoVersion: go1.22.9 (Red Hat 1.22.9-2.el9_5)
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.2

Podman in a container

Yes

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

No response

Additional information

No response

Luap99 commented Feb 13, 2025

I am not sure that imagestores are supposed to be shared between different podman instances; my understanding was that one uses additionalimagestores= for that. That, of course, is read-only and would not pull new images into the store.

cc @giuseppe

baude commented Feb 13, 2025

That is also my understanding... I think https://www.redhat.com/en/blog/image-stores-podman might be what you are after?

seanlafferty-ibm commented Feb 13, 2025

I was taking a bit of a logical leap based on how additionalimagestores behaves:

  1. two podmans that do not share storage = set different roots
  2. two podmans that share a pre-populated image cache = set different roots, but the same additionalimagestores
  3. two podmans that share a read/write image cache = set different roots, but the same imagestore?

I am doubtful imagestore was designed to serve that use case, but I couldn't find explicit documentation about it, so I figured I'd ask. It would be handy, since the read-only nature of additionalimagestores makes it hard to populate in environments where the images are not known ahead of time. 🙂
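To make the leap concrete, the three setups map to flags roughly like this (paths are placeholders; case 3 is what this issue is about, and it is the one that corrupts the store today):

# 1. isolated storage: nothing shared
podman --root /srv/storage-a pull mirror.gcr.io/library/bash
podman --root /srv/storage-b pull mirror.gcr.io/library/bash

# 2. shared, pre-populated, read-only cache
podman --root /srv/storage-a --storage-opt additionalimagestore=/srv/shared pull mirror.gcr.io/library/bash

# 3. shared read/write cache (this issue)
podman --root /srv/storage-a --imagestore /srv/shared pull mirror.gcr.io/library/bash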

seanlafferty-ibm commented Feb 14, 2025

Another issue with additionalimagestores is that podman pull does not seem to consult it when determining if an image already exists. So even if we populate additionalimagestores via some back channel, the RO store doesn't provide any benefit to jobs that explicitly invoke podman pull.

Ex:

podman --root /tmp/root1 pull mirror.gcr.io/library/bash
# pull again to the same root; it's cached as expected
podman --root /tmp/root1 pull mirror.gcr.io/library/bash

# use a different root, with an additionalimagestore pointing at the first root
# we download the image even though it's present in the RO additionalimagestore :(
podman --root /tmp/root2 --storage-opt additionalimagestore=/tmp/root1 pull mirror.gcr.io/library/bash

Not sure if that's expected behavior. It seems quirky, but on the other hand, if an image's existence in an additionalimagestore short-circuited the podman pull, you wouldn't be able to download the image explicitly to your root even if you wanted to. So maybe that's an intentional design decision?
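A possible mitigation on older versions would be to guard the pull explicitly (a sketch; this assumes podman image exists consults additional image stores, which it should since they are part of local storage, but I'd treat that as an assumption to verify):

img=mirror.gcr.io/library/bash
if ! podman --root /tmp/root2 --storage-opt additionalimagestore=/tmp/root1 image exists "$img"; then
  podman --root /tmp/root2 --storage-opt additionalimagestore=/tmp/root1 pull "$img"
fi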

@giuseppe

I've tried with Podman 5.3.2 and it works well for me, as the issue you've seen was fixed with defaae6

Can you please try with a newer version?
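If it helps when checking what the runners actually have installed, this should print just the client version (a small sketch, assuming the Go-template format flag available on recent podman):

podman version --format '{{.Client.Version}}'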

Luap99 commented Feb 14, 2025

I used the original reproducer on main yesterday and it failed.

$ bin/podman version
Client:       Podman Engine
Version:      5.5.0-dev
API Version:  5.5.0-dev
Go Version:   go1.22.10
Git Commit:   53c9100c72db4d612e05c230df8dc2e702c5f4da
Built:        Fri Feb 14 11:26:59 2025
OS/Arch:      linux/amd64
$ bin/podman --root /tmp/root1 --imagestore /tmp/imgs pull mirror.gcr.io/library/bash
Trying to pull mirror.gcr.io/library/bash:latest...
Getting image source signatures
Copying blob 1e14325cdcb9 done   | 
Copying blob 1f3e46996e29 done   | 
Copying blob dc5efae3f74e done   | 
Copying config 2a658e2e2b done   | 
Writing manifest to image destination
2a658e2e2bab7c6a5fb70fada1e493be70faf89a941fbb5fd52b69b439781c64
$ bin/podman --root /tmp/root2 --imagestore /tmp/imgs pull mirror.gcr.io/library/bash
ERRO[0000] Image mirror.gcr.io/library/bash exists in local storage but may be corrupted (remove the image to resolve the issue): layer not known 
Trying to pull mirror.gcr.io/library/bash:latest...
Getting image source signatures
Copying blob 1f3e46996e29 [================================>-----] 3.0MiB / 3.5MiB | 3.9 MiB/s
Copying blob dc5efae3f74e [==================>-------------------] 1.3MiB / 2.6MiB | 1.6 MiB/s
Copying blob 1f3e46996e29 done   | 
Copying blob dc5efae3f74e done   | 
Copying blob 1f3e46996e29 done   | 
Copying blob dc5efae3f74e done   | 
Copying blob 1f3e46996e29 done   | 
Copying blob dc5efae3f74e done   | 
Copying blob 1e14325cdcb9 done   | 
Copying config 2a658e2e2b done   | 
Writing manifest to image destination
2a658e2e2bab7c6a5fb70fada1e493be70faf89a941fbb5fd52b69b439781c64

@giuseppe

Yeah, sorry, I've only tested the last sequence of commands in #2257.

The issue is that we write layers.json to the root store; I'm not sure how easy it is to fix considering the messy locking we already have.
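To see the split concretely: with the overlay driver each store keeps its own layer metadata, and today the layer entries for an image pulled through --root /tmp/root1 land in the graphroot's layers.json instead of the imagestore's, so a second root sharing /tmp/imgs finds the image but not its layers ("layer not known"). The paths below assume the standard overlay layout from the reproducer and are only illustrative:

# layer metadata recorded under the per-invocation root
cat /tmp/root1/overlay-layers/layers.json
# layer metadata under the shared image store, which is where the R/O layers should be recorded
cat /tmp/imgs/overlay-layers/layers.json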

giuseppe transferred this issue from containers/podman Feb 14, 2025
@giuseppe

I think we need something like the following diff:

diff --git a/layers.go b/layers.go
index 69a4887bf..ed10340fa 100644
--- a/layers.go
+++ b/layers.go
@@ -51,6 +51,7 @@ type layerLocations uint8
 // unclean shutdown
 const (
 	stableLayerLocation layerLocations = 1 << iota
+	imageStoreLayerLocation
 	volatileLayerLocation
 
 	numLayerLocationIndex = iota
@@ -167,6 +168,9 @@ type Layer struct {
 	// volatileStore is true if the container is from the volatile json file
 	volatileStore bool `json:"-"`
 
+	// imageStore is true if the layer is from the image store json file
+	imageStore bool `json:"-"`
+
 	// BigDataNames is a list of names of data items that we keep for the
 	// convenience of the caller.  They can be large, and are only in
 	// memory when being read from or written to disk.
@@ -435,6 +439,9 @@ func layerLocation(l *Layer) layerLocations {
 	if l.volatileStore {
 		return volatileLayerLocation
 	}
+	if l.imageStore {
+		return imageStoreLayerLocation
+	}
 	return stableLayerLocation
 }
 
@@ -456,6 +463,7 @@ func copyLayer(l *Layer) *Layer {
 		CompressionType:    l.CompressionType,
 		ReadOnly:           l.ReadOnly,
 		volatileStore:      l.volatileStore,
+		imageStore:         l.imageStore,
 		BigDataNames:       copySlicePreferringNil(l.BigDataNames),
 		Flags:              copyMapPreferringNil(l.Flags),
 		UIDMap:             copySlicePreferringNil(l.UIDMap),
@@ -823,6 +831,9 @@ func (r *layerStore) load(lockedForWriting bool) (bool, error) {
 			if location == volatileLayerLocation {
 				layer.volatileStore = true
 			}
+			if location == imageStoreLayerLocation {
+				layer.imageStore = true
+			}
 			layers = append(layers, layer)
 			ids[layer.ID] = layer
 		}
@@ -1144,6 +1155,7 @@ func (s *store) newLayerStore(rundir, layerdir, imagedir string, driver drivers.
 		rundir:         rundir,
 		jsonPath: [numLayerLocationIndex]string{
 			filepath.Join(layerdir, "layers.json"),
+			filepath.Join(imagedir, "layers.json"),
 			filepath.Join(volatileDir, "volatile-layers.json"),
 		},
 		layerdir: layerdir,
@@ -1180,6 +1192,7 @@ func newROLayerStore(rundir string, layerdir string, driver drivers.Driver) (roL
 		mountsLockfile: nil,
 		rundir:         rundir,
 		jsonPath: [numLayerLocationIndex]string{
+			filepath.Join(layerdir, "layers.json"),
 			filepath.Join(layerdir, "layers.json"),
 			filepath.Join(layerdir, "volatile-layers.json"),
 		},
@@ -1422,6 +1435,7 @@ func (r *layerStore) create(id string, parentLayer *Layer, names []string, mount
 		GIDMap:             copySlicePreferringNil(moreOptions.GIDMap),
 		BigDataNames:       []string{},
 		volatileStore:      moreOptions.Volatile,
+		imageStore:         moreOptions.ImageStore,
 	}
 	layer.Flags[incompleteFlag] = true
 
diff --git a/store.go b/store.go
index 053c31c42..88e7e088e 100644
--- a/store.go
+++ b/store.go
@@ -667,6 +667,8 @@ type LayerOptions struct {
 	UncompressedDigest digest.Digest
 	// True is the layer info can be treated as volatile
 	Volatile bool
+	// True is the layer info must be written to the image store
+	ImageStore bool
 	// BigData is a set of items which should be stored with the layer.
 	BigData []LayerBigDataOption
 	// Flags is a set of named flags and their values to store with the layer.
@@ -1545,6 +1547,9 @@ func (s *store) putLayer(rlstore rwLayerStore, rlstores []roLayerStore, id, pare
 			GIDMap:         copySlicePreferringNil(gidMap),
 		}
 	}
+	if !writeable && s.imageStoreDir != "" {
+		options.ImageStore = true
+	}
 	return rlstore.create(id, parentLayer, names, mountLabel, nil, &options, writeable, diff, slo)
 }
 

giuseppe added a commit to giuseppe/storage that referenced this issue Feb 14, 2025
when an imagestore is used, a R/O layer must be written to the
layers.json under the imagestore, not graphroot.

The lock on the imagestore is already taken through the
multipleLockFile{} mechanism in place.

Closes: containers#2257

Signed-off-by: Giuseppe Scrivano <[email protected]>
giuseppe linked a pull request Feb 14, 2025 that will close this issue
@giuseppe

opened a PR: #2258

seanlafferty-ibm commented Feb 14, 2025

Thanks for looking into it! I do have a bit of a hacky workaround in the meantime that anyone else in the same boat may be interested in. I populate additionalimagestores on the fly via podman save | podman load at the end of my jobs. Obviously it's a bit resource-heavy, since it's compressing and decompressing images redundantly. For the morbidly curious:

Workaround

Step 1. Give all the runners an additionalimagestores entry that points at a shared volume (to begin with, nothing is in it):

[storage]
  driver = "overlay"
[storage.options]
  additionalimagestores=[ "/home/runner/.local/share/podman-pull-storage" ]

Step 2. On job completion, find all images that exist and copy them into the shared storage by finagling --root:

# collect every tagged image as a compact JSON object per line: {Id, Names: [all tags for that Id]}
images=$(podman images --format json | jq -c 'map(select((.Names | length > 0))) | group_by(.Id) | map({Id: .[0].Id, Names: [.[] | .Names] | flatten | unique }) | .[]')
pull_cache="/home/runner/.local/share/podman-pull-storage"
for image in $images
do
  id=$(jq -r '.Id' <<< "$image")
  names=$(jq -r '.Names | join(" ")' <<< "$image")
  echo "attempting to cache image with id: $id"
  if podman --root "$pull_cache" image exists "$id"; then
    echo "  image already exists in the cache"
    # the blobs are already cached; just make sure every tag we know about is present too
    for name in $names
    do
      if podman --root "$pull_cache" image inspect "$id" | jq --exit-status --arg imageName "$name" 'map(select(.RepoTags[] | contains($imageName))) | length == 0' > /dev/null; then
        echo "    adding missing tag to cache: $name"
        podman --root "$pull_cache" tag "$id" "$name"
      fi
    done
  else
    echo "  image not found in cache, writing it with tags: $names"
    # save/load recompresses the image, which is the resource-heavy part of this workaround
    podman save $id $names | podman --root "$pull_cache" load
  fi
done

Step 3. On subsequent runs, images will be resolved from the additionalimagestores if they exist (assuming Podman v5.3.2 or newer, where pull consults additional stores).
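A quick way to check that a later run actually sees the cache (this assumes the ReadOnly format field is available in your podman version; images that come from an additional store are reported as read-only):

podman images --format '{{.Repository}}:{{.Tag}} read-only={{.ReadOnly}}'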
