idea: "upper layer" (erofs) inside of initramfs #332

allisonkarlitskaya · 2024-09-05T15:29:34Z

This is a really vague idea that I discussed with @cgwalters and @travier today. They both said it belongs here as an issue. At this point this is little more than a raw braindump. There's a lot to think through and discuss.

The erofs produced by mkcomposefs for a reasonably complete /usr is on the order of tens of MB: I've generally seen ~50MB, and it compresses well (down to around 10MB). The kernel+initramfs on my Silverblue system is in the low hundreds of MB (~150MB, most of which is the initramfs).

It wouldn't be completely unreasonable, then, to have a complete static copy of the composefs "upper layer" erofs image inside of the UKI. This would completely side-step quite a lot of thorny issues around binding the UKI to the correct deployment: all you'd need is the kernel image and the digest store.
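A quick back-of-the-envelope check of that size cost, using the rough figures from the previous paragraph (all approximate). Whether the image would sit in the initramfs compressed or uncompressed is an open assumption, so both cases are shown:

```python
# Rough figures quoted above; all values in MB and approximate.
EROFS_RAW_MB = 50
EROFS_COMPRESSED_MB = 10
UKI_TODAY_MB = 150


def uki_growth_percent(added_mb: float, base_mb: float) -> float:
    """Relative growth of the boot image when `added_mb` is embedded in it."""
    return 100.0 * added_mb / base_mb


if __name__ == "__main__":
    compressed = uki_growth_percent(EROFS_COMPRESSED_MB, UKI_TODAY_MB)
    raw = uki_growth_percent(EROFS_RAW_MB, UKI_TODAY_MB)
    # prints "compressed: +6.7%  uncompressed: +33.3%"
    print(f"compressed: +{compressed:.1f}%  uncompressed: +{raw:.1f}%")
```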

How we get a UKI with this erofs inside of it could go two ways:

1. Generate it on the end-user system via some (deterministic magic) that produces a UKI bit-for-bit identical to the one we were expecting. We'd keep an out-of-band signature somewhere (in metadata that doesn't become part of the image) that we could then use to sign it.
2. Push everything into container image creation: the kernel image would be created as the last step of the image build. This would involve running mkcomposefs inside the container on the container's own contents, embedding the resulting blob into the UKI, and writing the UKI into the container image at a well-known path. Any signing we want to do as part of creating the image could happen at this point, inside the image (or in another build stage, with the result copied back into the final image).
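For the first approach, the end system would only want to apply a vendor signature if its locally assembled UKI is exactly the image the vendor expected. A minimal sketch of that comparison, assuming the out-of-band metadata carries a SHA-256 of the expected UKI (that metadata and its format are hypothetical, and verifying the metadata's own signature is not shown):

```python
import hashlib
import hmac


def matches_expected(uki_bytes: bytes, expected_sha256_hex: str) -> bool:
    """True if a locally built UKI is bit-for-bit the image the vendor expected.

    `expected_sha256_hex` is assumed to come from hypothetical out-of-band,
    separately signed metadata; checking that signature is out of scope.
    """
    actual = hashlib.sha256(uki_bytes).hexdigest()
    # Constant-time comparison isn't strictly needed for public digests, but is harmless.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())
```

Any non-determinism in the local build (timestamps, file ordering, compressor settings) makes this check fail, which is why the "deterministic magic" is the hard part.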

The second approach has an extremely simple deployment strategy: just extract the container .tar directly into a composefs digest store (without creating the erofs). The backing store should then contain all of the files that the erofs refers to. Install the kernel image into the ESP and you're done.
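A sketch of that deployment step, assuming backing files are named by their fs-verity SHA-256 digest (4096-byte blocks, no salt) and stored under a hypothetical `xx/rest-of-digest` layout. The function names are illustrative, not an existing composefs API:

```python
import hashlib
import io
import os
import struct
import tarfile

BLOCK_SIZE = 4096  # Merkle tree block size assumed for this sketch


def _pad_blocks(data: bytes) -> list:
    """Split into BLOCK_SIZE chunks, zero-padding the last one."""
    return [data[i:i + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            for i in range(0, len(data), BLOCK_SIZE)]


def fsverity_sha256(data: bytes) -> str:
    """fs-verity file digest (SHA-256, 4096-byte blocks, no salt) as hex."""
    if data:
        blocks = _pad_blocks(data)  # level 0: the file contents
        # Hash every block of a level and pack the hashes into the next level,
        # until a single block remains.
        while len(blocks) > 1:
            blocks = _pad_blocks(b"".join(hashlib.sha256(b).digest() for b in blocks))
        root = hashlib.sha256(blocks[0]).digest()
    else:
        root = bytes(32)  # an empty file has an all-zero root hash
    # struct fsverity_descriptor: version=1, hash_algorithm=1 (SHA-256),
    # log2(block size)=12, salt_size=0, reserved, data_size, root hash, salt, reserved.
    desc = struct.pack("<BBBBIQ64s32s144s", 1, 1, 12, 0, 0, len(data), root, b"", b"")
    return hashlib.sha256(desc).hexdigest()


def import_tar(tar_bytes: bytes, store: str) -> dict:
    """Store each regular file's contents once, keyed by fs-verity digest.

    Returns {path in tar: digest}. The store/xx/rest-of-digest layout is a
    hypothetical choice for this sketch.
    """
    index = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        for member in tar:
            if not member.isfile():
                continue  # only regular file contents go into the store
            data = tar.extractfile(member).read()
            digest = fsverity_sha256(data)
            obj = os.path.join(store, digest[:2], digest[2:])
            if not os.path.exists(obj):  # identical contents are stored once
                os.makedirs(os.path.dirname(obj), exist_ok=True)
                with open(obj, "wb") as f:
                    f.write(data)
            index[member.name] = digest
    return index
```

No erofs is created here: with this approach the paths, modes and xattrs would come from the erofs already embedded in the UKI, so the import only has to store file contents.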

The second way seems wonderfully simple until you realize that it has some very serious drawbacks:

- we're essentially creating a new container format: the metadata about which files are part of the image is stored in the image's .tar, but now also in the erofs that we put inside the UKI.
- which means, of course, that it's no longer possible to make casual modifications to the container, like adding a file or installing an extra package: you need to regenerate the kernel image. Maybe that's not so bad?

I think the second approach could be extremely nice for specific deployment scenarios, but it's a very different flavour than what has been promised for the "FROM fedora / ADD / RUN / ..." approach to OS customization.

So that takes us back to a reality where we probably want to support the first scenario of building the composefs and assembling the UKI on the end system. That needs a lot of thinking...

This also intersects with the question of what a signature from an OS vendor on a particular kernel image means. Today it's possible to have a signed kernel boot an unsigned root filesystem. Tomorrow we seem to want to move in a direction where there are additional assurances about the root filesystem contents as well, but if it remains possible to keep booting arbitrary root filesystems with a different version of the same kernel, then this promise is a whole lot less meaningful. In fact, the entire "look how easy it is to customize your system!" bootc ideal sort of depends on being able to modify the root filesystem without needing to re-sign the kernel... @travier mentioned that we can support both scenarios with kernel variants that produce unique PCR measurements, allowing the data partition to be encrypted by a key that will only be available if we boot a "trusted" rootfs. There are some very deep product-level decisions here...
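The PCR idea relies on the fact that a TPM PCR can only be extended, never set, so its final value commits to every measurement made during boot, in order. A minimal sketch of the SHA-256 bank's extend operation; the list of measured blobs is illustrative only (systemd-stub, for example, measures the UKI's PE sections into PCR 11, but the exact sequence is not modelled here):

```python
import hashlib


def pcr_extend(pcr: bytes, data: bytes) -> bytes:
    """TPM2 extend for a SHA-256 PCR: new = SHA256(old || SHA256(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


def measure(blobs: list) -> bytes:
    """Final PCR value after measuring `blobs` in order, starting from reset."""
    pcr = bytes(32)  # PCRs such as 11 start out as all zeros at boot
    for blob in blobs:
        pcr = pcr_extend(pcr, blob)
    return pcr


# Illustrative only: same kernel and command line, different embedded erofs.
trusted = measure([b"kernel", b"cmdline", b"erofs image A"])
custom = measure([b"kernel", b"cmdline", b"erofs image B"])
assert trusted != custom  # a key sealed to `trusted` would not unseal under `custom`
```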


allisonkarlitskaya · 2024-09-05T15:35:03Z

One note about performance/memory trade-offs: having the erofs as part of the UKI (and then permanently stored in RAM) would mean that the entire metadata of the system partition is in RAM. ls -lR /usr would always happen without touching the disk. It's more data to load when booting the kernel image, but having that data pre-loaded as a small blob up front seems like it should probably be a net win. It would have to be measured. It also means that we have a chunk of RAM that we've "wasted"...

allisonkarlitskaya · 2024-09-05T15:39:49Z

Another requirement of the "UKI inside the OCI container" approach (and maybe the "UKI generated locally" approach as well): we'd probably want a tool that could scan the UKI to find out which blobs it refers to in the digest store. This is important for pruning the store when removing old images.
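A first building block for such a tool: a UKI is a PE/COFF binary with the initramfs stored in its `.initrd` section, so a scanner would start by extracting that section. A minimal sketch of the section-table walk; decompressing the initramfs, unpacking its cpio archive and reading the digests out of the erofs are not shown:

```python
import struct


def pe_sections(image: bytes) -> dict:
    """Map PE section names (e.g. '.linux', '.initrd', '.cmdline') to raw bytes."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image (missing MZ header)")
    pe_off = struct.unpack_from("<I", image, 0x3C)[0]  # e_lfanew
    if image[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The COFF file header follows the 4-byte signature.
    num_sections = struct.unpack_from("<H", image, pe_off + 6)[0]
    opt_size = struct.unpack_from("<H", image, pe_off + 20)[0]
    table = pe_off + 24 + opt_size  # section table follows the optional header
    sections = {}
    for i in range(num_sections):
        hdr = table + 40 * i  # each section header is 40 bytes
        name = image[hdr:hdr + 8].rstrip(b"\0").decode("ascii", "replace")
        virt_size, _vaddr, raw_size, raw_ptr = struct.unpack_from("<IIII", image, hdr + 8)
        # Raw data is padded to the file alignment; VirtualSize is the real length.
        size = min(virt_size, raw_size) if virt_size else raw_size
        sections[name] = image[raw_ptr:raw_ptr + size]
    return sections
```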

travier · 2024-09-05T16:52:37Z

One part of implementing this idea is to adapt https://github.com/ostreedev/ostree/blob/main/src/switchroot/ostree-prepare-root.c to use this EROFS instead of looking at the sysroot.

travier · 2024-09-09T08:36:01Z

Here is a potential flow using this feature that would help us work around SELinux issues and remove the need for build-time commits:

Build via a Containerfile:

# "Normal" build part where you customize your image
FROM base-image AS target
RUN <make changes here as needed>

# Use a side image to build the composefs & UKI
FROM target AS builder
RUN <rebuild SELinux policy>
# Do an ostree commit with the changes (i.e. we need to figure out what changed),
# using the context from the updated SELinux policy,
# and get the full composefs EROFS for the final root
RUN <ostree commit + composefs EROFS generation>
RUN <compress and append the EROFS blob to the initramfs in a pre-defined place>
RUN <install ukify & Secure Boot signing tools>
RUN <build a UKI with the kernel, initramfs and command line config from the container image, sign it, output to /uki>

# Go back to the final image and include just the UKI
FROM target
COPY --from=builder /uki /uki

Then on the final system we would do:

1. ostree container image pull, which imports all the objects from the "target" image, including the UKI. We just ignore the xattrs and SELinux labels.
2. Copy the UKI from the imported ostree commit to the ESP
3. Do the rename dance to get it in the right order for boot
4. Reboot
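One plausible shape for copying the UKI to the ESP: write it under a temporary name, flush it, then rename it into place so the boot loader never picks up a half-written file. A sketch, assuming the UKI goes into `EFI/Linux/` on the ESP (where Boot Loader Specification Type #2 entries live). The file name is up to the caller, and the naming scheme that decides boot order is not modelled:

```python
import os


def install_uki(uki: bytes, esp_dir: str, name: str) -> str:
    """Copy a UKI into `esp_dir` under `name` without exposing a partial file.

    Caveat: the ESP is FAT, where rename is not guaranteed to be atomic across
    a crash; this only narrows the window.
    """
    os.makedirs(esp_dir, exist_ok=True)
    final = os.path.join(esp_dir, name)
    tmp = final + ".tmp"
    with open(tmp, "wb") as f:
        f.write(uki)
        f.flush()
        os.fsync(f.fileno())  # make sure the contents hit the disk first
    os.replace(tmp, final)
    # Also persist the directory entry change (POSIX only).
    dfd = os.open(esp_dir, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
    return final
```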

We tried something similar while prototyping: https://github.com/travier/fedora-coreos-uki/blob/main/fcos-uki/Containerfile

travier · 2024-09-09T08:47:32Z

The major change with this approach is that we clearly split the file content from the metadata: the container becomes only a way to transport object data plus a UKI that includes all the metadata. The deployed rootfs thus becomes just an object store, and we no longer "care" about ostree commits, as we don't need to sign them or use them to regenerate the composefs metadata on the systems.

cgwalters · 2024-09-09T12:51:58Z

> Another requirement of the "UKI inside the OCI container" approach (and maybe the "UKI generated locally" approach as well): we'd probably want a tool that could scan the UKI to find out which blobs it refers to in the digest store. This is important for pruning the store when removing old images.

Yes. Combined with this comment, it argues in general for some new tooling: not large or complex tooling, but new tooling nevertheless. One option is to implement it in this repo as a build-time option; a variant of that is to implement it in Rust (also in this repo). Maybe something like a composefs-boot crate?
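Once such a scanner exists, pruning is a plain mark-and-sweep: collect the digests referenced by every image that should stay bootable, then delete everything else. A sketch over a hypothetical `xx/rest-of-digest` store layout. It must not run concurrently with an import, or it could delete objects that a half-imported image is about to reference:

```python
import os


def prune_store(store: str, referenced: set) -> list:
    """Delete objects that no kept image references; return their digests.

    `referenced` is the union of digests found by scanning every UKI (or
    erofs) that should remain bootable.
    """
    removed = []
    for prefix in sorted(os.listdir(store)):
        subdir = os.path.join(store, prefix)
        if not os.path.isdir(subdir):
            continue  # ignore stray files at the top level
        for rest in sorted(os.listdir(subdir)):
            digest = prefix + rest
            if digest not in referenced:
                os.unlink(os.path.join(subdir, rest))
                removed.append(digest)
    return removed
```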

cgwalters · 2024-09-09T13:00:13Z

I chatted with @allisonkarlitskaya about this and there's a lot to like about the simplicity of this approach - I'm 100% on board with continuing investigation of this direction.

My biggest concern was that I'd also really like to build the story of using composefs for apps/extensions/configmaps etc. and this model reduces the alignment between those two approaches.

Combining, this issue also intersects strongly with #294, where I was trying hard to think of a way to bring OCI metadata under verity protection. Hmm... I guess the simplest variant that would work for this is to require the UKI to always be in a distinct layer (with a special annotation like composefs.boot or something), and have the manifest that gets included inside the image omit that layer.
Also worth thinking about here is the related issue of how we store individual layers: we must support fetching only the layers that changed across upgrades.
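A sketch of both ideas against the OCI manifest structure (a `layers` list of descriptors with `digest` and optional `annotations`). `composefs.boot` is the tentative annotation name floated above, and hashing a canonical JSON encoding is a simplifying assumption (real OCI digests cover the exact manifest bytes). Swapping the boot layer leaves the digest of the reduced manifest unchanged, which is what avoids a circular dependency between the UKI and the manifest it would embed:

```python
import hashlib
import json

BOOT_ANNOTATION = "composefs.boot"  # tentative name from the discussion


def is_boot_layer(layer: dict) -> bool:
    return BOOT_ANNOTATION in (layer.get("annotations") or {})


def split_boot_layer(manifest: dict):
    """Return (manifest without the boot layer, list of boot layer descriptors)."""
    layers = manifest["layers"]
    rest = dict(manifest, layers=[layer for layer in layers if not is_boot_layer(layer)])
    return rest, [layer for layer in layers if is_boot_layer(layer)]


def manifest_digest(manifest: dict) -> str:
    # Simplification: hash a canonical JSON encoding of the manifest.
    blob = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def layers_to_fetch(old: dict, new: dict) -> list:
    """Digests of layers in `new` that are not already present from `old`."""
    have = {layer["digest"] for layer in old["layers"]}
    return [layer["digest"] for layer in new["layers"] if layer["digest"] not in have]
```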

travier · 2024-09-13T09:14:47Z

In #332 (comment), I forgot that we still need to do the 3-way merge for /etc, so we still need a "deployment" of it, and this is a bit more complex.
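For reference, a simplified sketch of the usual 3-way /etc merge policy: start from the new default /etc and re-apply whatever the admin changed, added or removed relative to the old default. It models files as path-to-contents only (no permissions, xattrs, SELinux labels or directories) and is not ostree's implementation:

```python
def merge_etc(old_default: dict, new_default: dict, local: dict) -> dict:
    """Simplified 3-way merge of /etc, keyed by path with file contents as values.

    - paths the admin changed, added or removed relative to the old default
      keep the admin's version (a local removal stays removed);
    - paths the admin left alone follow the new default, including removals.
    """
    merged = {}
    for path in set(old_default) | set(new_default) | set(local):
        locally_modified = local.get(path) != old_default.get(path)
        source = local if locally_modified else new_default
        if path in source:
            merged[path] = source[path]
    return merged
```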

travier · 2024-09-13T11:21:52Z

We've also realized that including the composefs EROFS file in the UKI means that it is now public, thus the file listing and metadata are public. This is not really an issue, just something to be aware of.

jbtrystram · 2024-09-13T11:32:28Z

> We've also realized that including the composefs EROFS file in the UKI means that it is now public, thus the file listing and metadata are public. This is not really an issue, just something to be aware of.

(when using LUKS on the rootfs)

cgwalters · 2024-09-13T15:33:03Z

> In #332 (comment), I forgot that we still need to do the 3-way merge for /etc, so we still need a "deployment" of it, and this is a bit more complex.

For ostree yes, though we also support etc.transient where that wouldn't be needed.

I think in theory we could ship initramfs glue in this project such that the "mount composefs from initramfs" logic could be very agnostic, i.e. we have:

- sysroot.mount
- composefs-mount.service (replaces /sysroot with a composefs setup, with backing objects in something native like /composefs/objects maybe? But the backing store could be configured in some way: an xattr on the cfs? a config file?)
- ostree-prepare-root.service (mounts /etc and /var the way ostree does today, using the physical root; note also the intersection with #280, "Canonical method to find backing filesystem (and block device)"), but the ostree bits could obviously be replaced with something else for non-ostree consumers
- initrd-root-fs.target
- ...
- switchroot
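The chain above is a set of ordering dependencies, which makes it easy to see where a non-ostree consumer would slot in its own step. A sketch encoding it with Python's `graphlib`. The unit names come from the list, `composefs-mount.service` is the proposed (not yet existing) unit, and the elided `...` steps are left out:

```python
from graphlib import TopologicalSorter


def boot_order(prepare_root: str = "ostree-prepare-root.service") -> list:
    """Initramfs ordering for the proposed chain; `prepare_root` is swappable."""
    # Each unit maps to the units it must run after (After=-style ordering).
    deps = {
        "composefs-mount.service": {"sysroot.mount"},
        prepare_root: {"composefs-mount.service"},
        "initrd-root-fs.target": {prepare_root},
        "switchroot": {"initrd-root-fs.target"},
    }
    return list(TopologicalSorter(deps).static_order())
```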

cgwalters · 2024-09-13T15:51:01Z

> We've also realized that including the composefs EROFS file in the UKI means that it is now public, thus the file listing and metadata are public. This is not really an issue, just something to be aware of.

Instead of "public" I would say "not encrypted on disk" to be clear. "public" often implies to me "accessible to the whole Internet" but for images generated on premise and deployed to servers that are physically secured, I wouldn't say the UKIs here are "public".

That said...AFAIK there's nothing that would block someone from encrypting the erofs in the initramfs, and decrypting using e.g. a key stored in the TPM or something.

allisonkarlitskaya mentioned this issue Sep 6, 2024 in all: add 'copy' mount option #334 (Draft)

cgwalters added the area/booting (Issues related to booting with composefs) and enhancement (New feature or request) labels Sep 9, 2024

This was referenced Sep 16, 2024:

- composefs end state "v1" goal containers/storage#2095 (Open)
- support deploying a composefs directly ostreedev/ostree#3291 (Open)
