Suggested layout for sysroot for bootc integration #38

Open
travier opened this issue Nov 20, 2024 · 4 comments

@travier
Member

travier commented Nov 20, 2024

Talking with Allison, we fleshed out the following potential design for the sysroot layout, which would allow us to transition existing ostree-based bootc systems to pure composefs ones:

.
├── composefs
│   ├── images
│   ├── objects
│   └── streams
├── ostree
│   └── deploy
│       └── fedora
│           ├── deploy
│           │   └── hash.0
│           └── var
└── state
    ├── deploy
    │   ├── composefshash0
    │   │   ├── etc
    │   │   └── var -> ../../../ostree/deploy/fedora/var
    │   └── composefshash1
    │       ├── etc
    │       └── var -> ../../os/fedora/var
    └── os
        └── fedora
            └── var
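To make the relative symlinks in the diagram concrete, here's a small Rust sketch that recreates the proposed layout in a scratch directory and checks that each deployment's var link resolves where we intend. The names (fedora, composefshash0, composefshash1, hash.0) are only the placeholders from the diagram, and create_layout is a hypothetical helper using plain std; it isn't bootc or composefs-rs code.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

/// Create the proposed sysroot layout under `root` (a scratch directory,
/// not a real /sysroot). Names are placeholders copied from the diagram.
fn create_layout(root: &Path) -> io::Result<()> {
    for dir in [
        "composefs/images",
        "composefs/objects",
        "composefs/streams",
        "ostree/deploy/fedora/deploy/hash.0",
        "ostree/deploy/fedora/var",
        "state/deploy/composefshash0/etc",
        "state/deploy/composefshash1/etc",
        "state/os/fedora/var",
    ] {
        fs::create_dir_all(root.join(dir))?;
    }
    // Relative targets resolve from the directory holding the link
    // (state/deploy/<id>/), so "../../" climbs back up to state/.
    symlink(
        "../../../ostree/deploy/fedora/var",
        root.join("state/deploy/composefshash0/var"),
    )?;
    symlink("../../os/fedora/var", root.join("state/deploy/composefshash1/var"))?;
    Ok(())
}

fn main() -> io::Result<()> {
    let root = std::env::temp_dir().join(format!("sysroot-layout-{}", std::process::id()));
    let _ = fs::remove_dir_all(&root); // start from a clean scratch dir
    create_layout(&root)?;
    // composefshash0 shares ostree's /var; composefshash1 uses the per-OS /var.
    for (link, target) in [
        ("state/deploy/composefshash0/var", "ostree/deploy/fedora/var"),
        ("state/deploy/composefshash1/var", "state/os/fedora/var"),
    ] {
        let resolved = fs::canonicalize(root.join(link))?;
        assert_eq!(resolved, fs::canonicalize(root.join(target))?);
        println!("{link} -> {target}");
    }
    fs::remove_dir_all(&root)
}
```

Relative link targets are resolved from the directory that contains the link, which is why composefshash0 needs three ../ components to get back to the sysroot root while composefshash1 only needs two to reach state/.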
@allisonkarlitskaya
Collaborator

Notes

  • we're not sure whether /state should be top-level or live in /composefs/. This is an open question.
  • we normally want the composefs ID alone to select the state as well. Things get complicated with UKIs (where the kernel command line is embedded in the signed image) if we want multiple sets of state to be accessible from a single bootable image.
    • but for unusual situations where we can modify the kernel command line, we might consider a state= argument that explicitly selects a different state directory
  • we assume that a given composefs image will correspond to a single OS (in the ostree stateroot sense). That lets us consolidate all deployments in a single namespace.
  • on boot, we open the directory under /state/deploy/ that corresponds to the composefs= argument (or maybe the state= argument). Inside that directory, etc and var may exist either as directories or as symlinks. If they are directories, we mount them; if they are symlinks, we mount what they point at. This gives us flexibility across several scenarios:
    • the usual situation, where we want to share one /var per OS. The diagram above shows how that would look: /state/os/fedora/var, with symlinks from each deployment pointing to it (e.g. composefshash1)
    • sharing a /var with an ostree sysroot, which is very useful for migrating or sharing (e.g. composefshash0)
    • per-deployment /var (possibly created using btrfs snapshots on the sysroot filesystem directly)
    • no /var: if /var is mounted from a separate partition using /etc/fstab or GPT or otherwise by systemd
    • we could even mount the entire /sysroot itself as /var if we wanted
  • the main point here is that this is decided/managed by deployment code. As the system is booting it just looks in the state directory and does what it's told.
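A rough sketch of that boot-time decision, assuming the state directory lives at /state/deploy/<id> as in the diagram and that state= simply overrides composefs= when both are given (both points are still open). It only decides what to mount; the actual mount calls, and where the sysroot sits in the initramfs, are left out. MountSource, state_id and resolve are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// What the (hypothetical) early-boot code should do for `etc` or `var`.
#[derive(Debug, PartialEq)]
enum MountSource {
    /// A real directory in the state dir: mount it directly.
    Directory(PathBuf),
    /// A symlink: mount whatever it points at (fully resolved).
    Symlink(PathBuf),
    /// Nothing there: leave it to fstab, GPT auto-discovery, etc.
    Absent,
}

/// Choose the state directory: an explicit `state=` wins,
/// otherwise the `composefs=` image ID selects it.
fn state_id(cmdline: &str) -> Option<&str> {
    let (mut composefs, mut state) = (None, None);
    for arg in cmdline.split_whitespace() {
        if let Some(v) = arg.strip_prefix("state=") {
            state = Some(v);
        } else if let Some(v) = arg.strip_prefix("composefs=") {
            composefs = Some(v);
        }
    }
    state.or(composefs)
}

/// Inspect `<state_dir>/<name>` without following a top-level symlink first.
fn resolve(state_dir: &Path, name: &str) -> io::Result<MountSource> {
    let path = state_dir.join(name);
    match fs::symlink_metadata(&path) {
        Ok(m) if m.file_type().is_symlink() => Ok(MountSource::Symlink(fs::canonicalize(&path)?)),
        Ok(m) if m.is_dir() => Ok(MountSource::Directory(path)),
        Ok(_) => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("{} is neither a directory nor a symlink", path.display()),
        )),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(MountSource::Absent),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    // Hypothetical sysroot in a scratch dir: etc is a directory, var a symlink.
    let sysroot = std::env::temp_dir().join(format!("state-demo-{}", std::process::id()));
    let _ = fs::remove_dir_all(&sysroot);
    let cmdline = "rw quiet composefs=composefshash1";
    let id = state_id(cmdline).expect("no composefs= or state= argument");
    let state_dir = sysroot.join("state/deploy").join(id);
    fs::create_dir_all(state_dir.join("etc"))?;
    fs::create_dir_all(sysroot.join("state/os/fedora/var"))?;
    std::os::unix::fs::symlink("../../os/fedora/var", state_dir.join("var"))?;
    for name in ["etc", "var"] {
        println!("{name}: {:?}", resolve(&state_dir, name)?);
    }
    fs::remove_dir_all(&sysroot)
}
```

Keeping the decision in a small function like this matches the idea that the booting system just does what the state directory says; all the policy lives in the deployment code that created the directories and symlinks.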

About /etc

In theory /etc is handled in exactly the same way as /var during boot. In practice, we expect there to be only one way it gets set up at deployment time: the usual three-way merge, done very much the way ostree does it today. This also lets us migrate from an existing ostree system, since the merge is inherently a copy in any case. @travier thinks that we should rewrite this algorithm ("it's not so hard...") in Rust as part of bootc, and that seems reasonable to me; it also sounds like he was volunteering me for the job.
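For discussion, here's a deliberately simplified model of that three-way merge in Rust, using an in-memory map from path to contents. It follows the commonly described ostree semantics (local edits and additions are kept, local deletions stay deleted, untouched files pick up the new defaults), but it is not a port of libostree's code: directories, symlinks, permissions, xattrs and SELinux labels are all ignored, and merge_etc is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Simplified model of a config tree: path relative to /etc -> file contents.
/// A real /etc also has directories, symlinks, modes, xattrs and SELinux labels.
type Tree = BTreeMap<String, Vec<u8>>;

/// Three-way merge in the spirit of ostree's /etc handling:
/// start from the new image's defaults, then replay every local change
/// the admin made relative to the old image's defaults.
fn merge_etc(old_default: &Tree, current: &Tree, new_default: &Tree) -> Tree {
    let mut merged = new_default.clone();
    // Locally added or modified files take precedence over new defaults.
    for (path, contents) in current {
        if old_default.get(path) != Some(contents) {
            merged.insert(path.clone(), contents.clone());
        }
    }
    // Files the admin deleted from /etc stay deleted.
    for path in old_default.keys() {
        if !current.contains_key(path) {
            merged.remove(path);
        }
    }
    merged
}

/// Convenience constructor for small example trees.
fn tree(entries: &[(&str, &str)]) -> Tree {
    entries.iter().map(|(p, c)| (p.to_string(), c.as_bytes().to_vec())).collect()
}

fn main() {
    let old = tree(&[("hostname", "default"), ("motd", "hi"), ("issue", "v1")]);
    let cur = tree(&[("hostname", "myhost"), ("motd", "hi"), ("local.conf", "x")]);
    let new = tree(&[("hostname", "default"), ("motd", "hello"), ("issue", "v2"), ("new.conf", "y")]);
    for (path, contents) in merge_etc(&old, &cur, &new) {
        println!("{path}: {}", String::from_utf8_lossy(&contents));
    }
}
```

In the example, hostname keeps the local edit, motd picks up the new default because it was never changed locally, issue stays deleted, and both local.conf and new.conf end up in the result.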

To support the three-way merge, we plan to move /etc to /usr/etc in the in-memory filesystem tree while creating the composefs image, leaving an empty /etc in its place, similar to what bootc does now when importing to ostree. That will probably happen just after the SELinux relabelling step. This is another way in which "bootable" composefs images will differ from the composefs images we create for running containers. See #35.
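A minimal sketch of that tree transformation, using a toy Node type rather than whatever in-memory representation composefs-rs actually uses (Node and move_etc_to_usr_etc are hypothetical). All checks happen before anything is moved, so an image that already ships a /usr/etc is rejected without being modified.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory tree; not the composefs-rs types.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    File(Vec<u8>),
    Dir(BTreeMap<String, Node>),
}

/// Move the image's /etc to /usr/etc and leave an empty /etc behind.
/// Checks are done up front so a failure leaves `root` untouched.
fn move_etc_to_usr_etc(root: &mut BTreeMap<String, Node>) -> Result<(), String> {
    match root.get("usr") {
        Some(Node::File(_)) => return Err("/usr is not a directory".into()),
        Some(Node::Dir(usr)) if usr.contains_key("etc") => {
            return Err("/usr/etc already exists".into())
        }
        _ => {}
    }
    let etc = root.remove("etc").ok_or("image has no /etc")?;
    // Create /usr if the image somehow lacks it, then attach the old /etc.
    if let Node::Dir(usr) = root
        .entry("usr".to_string())
        .or_insert_with(|| Node::Dir(BTreeMap::new()))
    {
        usr.insert("etc".to_string(), etc);
    }
    root.insert("etc".to_string(), Node::Dir(BTreeMap::new()));
    Ok(())
}

fn main() {
    let mut etc = BTreeMap::new();
    etc.insert("hostname".to_string(), Node::File(b"localhost\n".to_vec()));
    let mut root = BTreeMap::new();
    root.insert("etc".to_string(), Node::Dir(etc));
    root.insert("usr".to_string(), Node::Dir(BTreeMap::new()));
    move_etc_to_usr_etc(&mut root).expect("move failed");
    println!("{root:#?}");
}
```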

I'm not sure what the merge process will look like exactly. Obviously we'll read the existing /etc from the filesystem and also write the new merged /etc to the filesystem, but how will we find the new /usr/etc from the new system image? There are two main options:

  • access them via the in-memory filesystem API
  • mount the composefs of the to-be-booted OS image and access them that way
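One way to avoid committing to either option early is to have the merge code depend only on a small trait. This is a hypothetical sketch (NewEtcSource, InMemoryImage and MountedImage are made-up names, and the mounted case simply reads regular files from a directory using std), but it shows that both options can hand the merge the same input.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Simplified config tree: path relative to /usr/etc -> file contents.
type Tree = BTreeMap<String, Vec<u8>>;

/// Hypothetical abstraction: the merge only needs "give me the new /usr/etc".
trait NewEtcSource {
    fn read_usr_etc(&self) -> io::Result<Tree>;
}

/// Option 1: read from the in-memory filesystem tree built during import.
struct InMemoryImage {
    usr_etc: Tree,
}

impl NewEtcSource for InMemoryImage {
    fn read_usr_etc(&self) -> io::Result<Tree> {
        Ok(self.usr_etc.clone())
    }
}

/// Option 2: the new composefs image is mounted at `mountpoint`.
struct MountedImage {
    mountpoint: PathBuf,
}

impl NewEtcSource for MountedImage {
    fn read_usr_etc(&self) -> io::Result<Tree> {
        let base = self.mountpoint.join("usr/etc");
        let mut tree = Tree::new();
        collect_files(&base, &base, &mut tree)?;
        Ok(tree)
    }
}

/// Recursively collect regular files; symlinks and metadata are skipped here.
fn collect_files(base: &Path, dir: &Path, out: &mut Tree) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        let kind = entry.file_type()?; // does not follow symlinks
        if kind.is_dir() {
            collect_files(base, &path, out)?;
        } else if kind.is_file() {
            let rel = path.strip_prefix(base).expect("entry lies under base");
            out.insert(rel.to_string_lossy().into_owned(), fs::read(&path)?);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Stand-in for a mounted image: a scratch dir with usr/etc/hostname.
    let mnt = std::env::temp_dir().join(format!("newimage-{}", std::process::id()));
    let _ = fs::remove_dir_all(&mnt);
    fs::create_dir_all(mnt.join("usr/etc"))?;
    fs::write(mnt.join("usr/etc/hostname"), "localhost\n")?;
    let mounted = MountedImage { mountpoint: mnt.clone() };
    let in_memory = InMemoryImage {
        usr_etc: Tree::from([("hostname".to_string(), b"localhost\n".to_vec())]),
    };
    // Both options should hand the merge code the same tree.
    assert_eq!(mounted.read_usr_etc()?, in_memory.read_usr_etc()?);
    fs::remove_dir_all(&mnt)
}
```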

Note: depending on how we approach this, it might be possible to avoid having /usr/etc. For example, we could leave it at /etc and mount over top of it at run time (but still see it during deploy time) or we could exclude /etc/ entirely from the created image, replacing it with just the empty directory (since we'd still have access to the files via the in-memory filesystem image).

About kernels/bootloader entries

This might be difficult because apparently ostree doesn't like to share access to /boot. This is the part that we fleshed out the least. Probably we need bootc to mediate here.
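Since this part is the least fleshed out, here's only a sketch of what a boot entry for a composefs deployment might contain, assuming we keep using Boot Loader Specification Type #1 entries (files under /loader/entries/) and put the image ID on the kernel command line as composefs=. The function name bls_entry, the kernel/initrd paths and the version string are placeholders; where the files live on /boot, and who writes them (bootc or ostree), is exactly the open question.

```rust
/// Render a Boot Loader Specification Type #1 entry that boots a given
/// composefs image. All arguments are caller-supplied placeholders.
fn bls_entry(
    title: &str,
    version: &str,
    kernel: &str,
    initrd: &str,
    composefs_id: &str,
    extra: &[&str],
) -> String {
    // The composefs= argument selects the image (and, by default, its state).
    let mut options = vec![format!("composefs={composefs_id}")];
    options.extend(extra.iter().map(|s| s.to_string()));
    format!(
        "title {title}\nversion {version}\nlinux {kernel}\ninitrd {initrd}\noptions {}\n",
        options.join(" ")
    )
}

fn main() {
    // All paths and IDs below are hypothetical placeholders.
    let entry = bls_entry(
        "Fedora (composefs)",
        "6.11.0",
        "/composefshash1/vmlinuz",
        "/composefshash1/initramfs.img",
        "composefshash1",
        &["rw"],
    );
    print!("{entry}");
}
```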

@allisonkarlitskaya
Collaborator

I'm starting to look into merge_configuration_from() in libostree.

@cgwalters
Collaborator

Related to merging /etc see also uapi-group/specifications#76 (comment) - I've been meaning to look at the Flatcar use of overlayfs here.

@allisonkarlitskaya
Collaborator

> Related to merging /etc see also uapi-group/specifications#76 (comment) - I've been meaning to look at the Flatcar use of overlayfs here.

I've read this comment. It's a good "state of the art" overview. I think it would indeed be worth looking into the Flatcar approach a bit and seeing if we can use that instead of writing our own merge code. I hadn't considered that ostree may work the way it does only because overlayfs wasn't around at the time...

In any case, whatever we spec out, it ought to accommodate this approach.
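For reference, this is roughly the shape an overlayfs-based /etc would take: the image's defaults as a read-only lowerdir, the deployment's writable state as upperdir, plus the workdir that overlayfs requires. It's a generic sketch, not Flatcar's actual setup (we haven't verified how they lay it out), and the function names and directory names, including etc.work, are hypothetical.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Mount data for `mount -t overlay overlay -o <data> /etc`:
/// read-only defaults below, writable local state on top.
fn etc_overlay_options(lower: &Path, upper: &Path, work: &Path) -> String {
    // Commas, colons or backslashes in these paths would need escaping; not handled here.
    format!(
        "lowerdir={},upperdir={},workdir={}",
        lower.display(),
        upper.display(),
        work.display()
    )
}

/// overlayfs wants workdir to be an empty directory on the same filesystem as upperdir.
fn upper_and_work_ok(upper: &Path, work: &Path) -> io::Result<bool> {
    let same_fs = fs::metadata(upper)?.dev() == fs::metadata(work)?.dev();
    let work_empty = fs::read_dir(work)?.next().is_none();
    Ok(same_fs && work_empty)
}

fn main() -> io::Result<()> {
    let state = std::env::temp_dir().join(format!("etc-overlay-{}", std::process::id()));
    let _ = fs::remove_dir_all(&state);
    let (upper, work) = (state.join("etc"), state.join("etc.work"));
    fs::create_dir_all(&upper)?;
    fs::create_dir_all(&work)?;
    assert!(upper_and_work_ok(&upper, &work)?);
    println!("{}", etc_overlay_options(Path::new("/usr/etc"), &upper, &work));
    fs::remove_dir_all(&state)
}
```

One appeal of this layout is that the upper directory naturally records only the local changes (with whiteouts for deletions), which is the same information a three-way merge has to reconstruct by diffing.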
