
Introduce new disk partitioning scheme #478

Merged
1 commit merged on Aug 2, 2024


@unbel13ver (Contributor) commented Feb 16, 2024

The new disk configuration lays the groundwork for upcoming features such as A/B software updates, the Storage VM, and more.

The Lenovo X1 configuration has two LVM pools: the first is a fixed-size 250G "system" partition (which is going to be encrypted), and the rest of the disk is dedicated to the StorageVM.

Description of changes

Checklist for things done

  • Summary of the proposed changes in the PR description
  • More detailed description in the commit message(s)
  • Commits are squashed into relevant entities - avoid a lot of minimal dev time commits in the PR
  • Contribution guidelines followed
  • Ghaf documentation updated with the commit - https://tiiuae.github.io/ghaf/
  • PR linked to architecture documentation and requirement(s) (ticket id)
  • Test procedure described (or includes tests). Select one or more:
    • Tested on Lenovo X1 x86_64
    • Tested on Jetson Orin NX or AGX aarch64
    • Tested on Polarfire riscv64
  • Author has run nix flake check --accept-flake-config and it passes
  • All automatic GitHub Actions checks pass - see actions
  • Author has added reviewers and removed PR draft status

Testing

The current upstream partition scheme is as follows:

[ghaf@ghaf-host:~]$ lsblk
NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda             8:0    0 465.8G  0 disk 
├─sda1          8:1    0   236M  0 part 
└─sda2          8:2    0   8.6G  0 part 
nvme0n1       259:0    0 953.9G  0 disk 
├─nvme0n1p1   259:1    0     1M  0 part 
├─nvme0n1p2   259:2    0   500M  0 part /boot
└─nvme0n1p3   259:3    0 953.4G  0 part 
  └─pool-root 254:0    0 953.4G  0 lvm  /nix/store
                                        /

To test the new partitioning scheme, run the lenovo-x1-carbon-gen11-debug-installer image and install Ghaf onto the internal SSD of the Lenovo X1.
After the installation is complete and the laptop has booted into Ghaf, check the partitions with the lsblk command.
The output should be as follows:

[ghaf@ghaf-host:~]$ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
nvme0n1     259:0    0 953.9G  0 disk 
├─nvme0n1p1 259:1    0     1M  0 part 
├─nvme0n1p2 259:2    0   500M  0 part /boot
├─nvme0n1p3 259:3    0   500M  0 part 
└─nvme0n1p4 259:4    0    14G  0 part 
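The lsblk check can also be scripted. The following is a minimal Python sketch, assuming util-linux lsblk with JSON output (-J) is available on the host. The helper names (read_lsblk, check_partitions) are hypothetical, and only the first three partition sizes are pinned, because the size of the last (ZFS) partition depends on the image and on any later resizing.

```python
import json
import subprocess

# Partition sizes expected on the internal SSD, taken from the lsblk listing.
# The fourth (ZFS) partition is only counted, not size-checked.
EXPECTED_SIZES = {"nvme0n1p1": "1M", "nvme0n1p2": "500M", "nvme0n1p3": "500M"}
EXPECTED_COUNT = 4


def read_lsblk() -> str:
    """Run lsblk and return its JSON output as text (sizes as strings like "500M")."""
    return subprocess.run(
        ["lsblk", "-J", "-o", "NAME,SIZE,TYPE"],
        check=True, capture_output=True, text=True,
    ).stdout


def check_partitions(lsblk_json: str, disk: str = "nvme0n1") -> list[str]:
    """Return a list of problems; an empty list means the layout matches."""
    devices = json.loads(lsblk_json)["blockdevices"]
    matches = [d for d in devices if d["name"] == disk]
    if not matches:
        return [f"disk {disk} not found"]
    # Only direct children of type "part" are considered (LVM volumes etc. are ignored).
    parts = [c for c in matches[0].get("children", []) if c["type"] == "part"]
    problems = []
    if len(parts) != EXPECTED_COUNT:
        problems.append(f"expected {EXPECTED_COUNT} partitions, found {len(parts)}")
    sizes = {p["name"]: p["size"] for p in parts}
    for name, size in EXPECTED_SIZES.items():
        if sizes.get(name) != size:
            problems.append(f"{name}: expected {size}, got {sizes.get(name)}")
    return problems
```

On the host, check_partitions(read_lsblk()) returns an empty list when the new layout is in place; on the old LVM layout it reports the wrong partition count and the oversized third partition.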

Check that the zpool is online:

[ghaf@ghaf-host:~]$ zpool list
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zroot_1    14G  5.00G  9.00G        -         -     0%    35%  1.00x    ONLINE  -
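A similar sketch for the zpool check, assuming `zpool list -H -o name,health` (scripted mode: no header, tab-separated fields). The helper names are hypothetical; the default pool name zroot_1 matches the listing above.

```python
import subprocess


def read_pools() -> str:
    """Run zpool list in scripted mode: one "name<TAB>health" line per pool."""
    return subprocess.run(
        ["zpool", "list", "-H", "-o", "name,health"],
        check=True, capture_output=True, text=True,
    ).stdout


def unhealthy_pools(zpool_output: str, required: str = "zroot_1") -> list[str]:
    """Return problems: any pool that is not ONLINE, or the required pool missing."""
    health = {}
    for line in zpool_output.splitlines():
        if line.strip():
            name, state = line.split("\t")
            health[name] = state
    problems = [f"{name} is {state}" for name, state in health.items() if state != "ONLINE"]
    if required not in health:
        problems.append(f"pool {required} not found")
    return problems
```

unhealthy_pools(read_pools()) returns an empty list when the pool exists and every pool is ONLINE. For the later revision of this PR, in which the pool was renamed, pass required="zfspool".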

Check that the datasets are mounted successfully:

[ghaf@ghaf-host:~]$ mount | grep zroot_1
zroot_1/root_a on / type zfs (rw,relatime,xattr,posixacl,casesensitive,x-initrd.mount)
zroot_1/root_a on /nix/store type zfs (ro,relatime,xattr,posixacl,casesensitive)
zroot_1/storagevm on /storagevm type zfs (rw,relatime,xattr,posixacl,casesensitive)
zroot_1/vm_storage_a on /vm_storage type zfs (rw,relatime,xattr,posixacl,casesensitive)
zroot_1/gp_storage on /gp_storage type zfs (rw,relatime,xattr,posixacl,casesensitive)
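The mount check can be sketched the same way. This hypothetical helper parses plain `mount` output lines of the form shown above and reports expected dataset/mount point pairs that are missing. The dataset list is copied from the listing; mount points containing spaces are not handled.

```python
import re
import subprocess

# (dataset name without pool prefix, mount point), from the listing above.
EXPECTED = [
    ("root_a", "/"),
    ("root_a", "/nix/store"),
    ("storagevm", "/storagevm"),
    ("vm_storage_a", "/vm_storage"),
    ("gp_storage", "/gp_storage"),
]

# Matches lines such as "zroot_1/root_a on / type zfs (rw,relatime,...)".
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def read_mounts() -> str:
    return subprocess.run(["mount"], check=True, capture_output=True, text=True).stdout


def zfs_mounts(mount_output: str) -> set[tuple[str, str]]:
    """Collect (dataset, mount point) pairs for ZFS entries only."""
    pairs = set()
    for line in mount_output.splitlines():
        m = MOUNT_LINE.match(line.strip())
        if m and m.group(3) == "zfs":
            pairs.add((m.group(1), m.group(2)))
    return pairs


def missing_mounts(mount_output: str, pool: str = "zroot_1") -> set[tuple[str, str]]:
    """Return expected (dataset, mount point) pairs that are not mounted."""
    expected = {(f"{pool}/{ds}", mp) for ds, mp in EXPECTED}
    return expected - zfs_mounts(mount_output)
```

missing_mounts(read_mounts()) returns an empty set when all five entries are present.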

Optionally, check that there are no failed systemd units:

[ghaf@ghaf-host:~]$ systemctl list-units --all --state=failed
  UNIT LOAD ACTIVE SUB DESCRIPTION

0 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.
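The optional systemd check, as a sketch assuming systemctl's --plain and --no-legend options, which drop the header, the footer and the status bullets so that each remaining line starts with a unit name. The function names are hypothetical.

```python
import subprocess


def parse_unit_names(output: str) -> list[str]:
    """Take the first column (the unit name) of each non-empty line."""
    return [line.split()[0] for line in output.splitlines() if line.strip()]


def failed_units() -> list[str]:
    """Return names of failed systemd units; an empty list means none failed."""
    out = subprocess.run(
        ["systemctl", "list-units", "--all", "--state=failed",
         "--plain", "--no-legend"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_unit_names(out)
```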

@vilvo (Contributor) left a comment

Please see the detailed comments.

@leivos-unikie (Contributor)

@unbel13ver can you provide test instructions for this PR? What is the expected outcome of the new partitioning scheme?

@unbel13ver (Contributor, Author)

> @unbel13ver can you provide test instructions for this PR? What is the expected outcome of the new partitioning scheme?

Testing instructions added.

@leivos-unikie (Contributor) left a comment

  • lsblk gives the expected output
  • ci-test-automation passed
  • performance ok
  • apps launch

@leivos-unikie added the "Tested on Lenovo X1 Carbon" label (This PR has been tested on Lenovo X1 Carbon) Feb 20, 2024
@unbel13ver temporarily deployed to internal-build-workflow with GitHub Actions, February 21, 2024 06:40 (inactive)
@unbel13ver temporarily deployed to internal-build-workflow with GitHub Actions, February 21, 2024 15:00 (inactive)
@unbel13ver temporarily deployed to internal-build-workflow with GitHub Actions, February 22, 2024 12:37 (inactive)
@unbel13ver removed the "Tested on Lenovo X1 Carbon" label Feb 22, 2024
@leivos-unikie added the "Tested on Lenovo X1 Carbon" label Feb 23, 2024
@unbel13ver temporarily deployed to internal-build-workflow with GitHub Actions, July 1, 2024 11:26 (inactive)
@unbel13ver temporarily deployed to internal-build-workflow with GitHub Actions, July 2, 2024 09:45 (inactive)
Review thread on modules/hardware/laptop.nix (outdated, resolved)
@unbel13ver temporarily deployed to internal-build-workflow with GitHub Actions, July 4, 2024 11:43 (inactive)
@clayhill66 (Collaborator)

With the latest commit 9d7ab74:
  • automated cases are ok
  • all apps launch: ok
  • nixos-rebuild does NOT work

@clayhill66 removed the "Tested on Lenovo X1 Carbon" label (This PR has been tested on Lenovo X1 Carbon) Jul 9, 2024
@johannarautanen added the "bug on Lenovo X1 Carbon" label (Issues found on Lenovo X1 Carbon while checking this PR) Jul 10, 2024
@johannarautanen

[Screenshot from 2024-07-10 06-10-01]

@vunnyso (Contributor) commented Jul 11, 2024

The limited size is probably because the disko-basic-postboot commands did not work as expected after the change in partitioning scheme.

@leivos-unikie (Contributor)

Any idea whether this PR can address this problem?
If Ghaf is installed to NVMe (with ghaf-installer) and you then try to boot Ghaf from a USB SSD, current mainline Ghaf boots partially from the USB SSD and partially from NVMe.

[Image: IMG_2866]

@unbel13ver (Contributor, Author)

> nixos-rebuild does NOT work

nixos-rebuild cannot update the partition scheme; in this case, the system needs to be reinstalled.

> The limited size is probably because the disko-basic-postboot commands did not work as expected after the change in partitioning scheme.

Yes, thanks for noticing this. disko-basic-postboot.nix does not consider ZFS as a possible filesystem. I need to look into that.

@unbel13ver (Contributor, Author)

> Any idea whether this PR can address this problem?

I am not sure the issue is related to this PR. NixOS picks the device to boot from by its label, and in this case there are two devices with the same label (nixos). We had the same issue with Orin devices previously. The solution is either to remove the external drive or to change the label of the internal partition.

@leivos-unikie (Contributor)

> I am not sure the issue is related to this PR. NixOS picks the device to boot from by its label, and in this case there are two devices with the same label (nixos).

Actually, 'LABEL' is currently empty for every partition; they do have a 'PARTLABEL', though.

I ran a test: I changed the partlabels of the NVMe partitions with gdisk (nix-shell -p gptfdisk) and then tried to boot Ghaf from the USB SSD. It still booted partially from USB and partially from NVMe.

[image]

So it seems that the boot does not go by LABEL or by PARTLABEL.

After installing Ghaf to NVMe, the only way to boot Ghaf from the USB SSD is to first boot a non-Ghaf OS from the USB SSD and wipe the NVMe.

Is it OK to discuss this here, or should we move elsewhere?

@leivos-unikie (Contributor) commented Jul 16, 2024

> So it seems that the boot does not go by LABEL or by PARTLABEL.

Well, not quite. If no device has the disk-disk1-ESP partlabel, the boot fails with a timeout (it cannot find /dev/disk/by-partlabel/disk-disk1-ESP). Other partlabels seem to be irrelevant, for example:

[image]
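Since a partlabel such as disk-disk1-ESP can exist on both the USB SSD and the NVMe drive, one way to spot the clash is to look for duplicate LABEL or PARTLABEL values across all block devices. A minimal sketch, assuming `lsblk -J -o NAME,LABEL,PARTLABEL` (which reports missing labels as null); it only lists duplicates and does not predict which device the boot will use. The helper names are hypothetical.

```python
import json
import subprocess
from collections import defaultdict


def read_labels() -> str:
    return subprocess.run(
        ["lsblk", "-J", "-o", "NAME,LABEL,PARTLABEL"],
        check=True, capture_output=True, text=True,
    ).stdout


def duplicate_labels(lsblk_json: str) -> dict[str, list[str]]:
    """Map "label:X" / "partlabel:X" to device names when X appears more than once."""
    seen = defaultdict(list)

    def walk(devices):
        for dev in devices:
            for key in ("label", "partlabel"):
                value = dev.get(key)
                if value:  # skip null/empty labels
                    seen[f"{key}:{value}"].append(dev["name"])
            walk(dev.get("children", []))  # descend into partitions

    walk(json.loads(lsblk_json)["blockdevices"])
    return {k: v for k, v in seen.items() if len(v) > 1}
```

With Ghaf on both drives, duplicate_labels(read_labels()) would be expected to list the ESP partlabel with one partition from each disk.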

New disk configuration lays the groundwork for upcoming features,
such as A/B software updates, Storage VM, and more.

Signed-off-by: Ivan Nikolaenko <[email protected]>
@unbel13ver temporarily deployed to internal-build-workflow with GitHub Actions, July 30, 2024 13:55 (inactive)
@unbel13ver (Contributor, Author) commented Jul 30, 2024

The ZFS partition is now extended during the first boot.
In this example, the system is booted from the external SSD (root is on /dev/sda):

[ghaf@ghaf-host:~]$ lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda       8:0    0 465.8G  0 disk 
├─sda1    8:1    0     1M  0 part 
├─sda2    8:2    0   500M  0 part /boot
├─sda3    8:3    0   500M  0 part 
└─sda4    8:4    0 464.8G  0 part 
nvme0n1 259:0    0 953.9G  0 disk 

[ghaf@ghaf-host:~]$ zpool list
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zfspool   464G  5.04G   459G        -         -     0%     1%  1.00x    ONLINE  -

I have also renamed the ZFS pool from zroot_1 to zfspool, since only one ZFS pool is in use.
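The PR does not show how the first-boot expansion is implemented, but its result can be checked: after expansion, the pool should be close to the size of the whole disk rather than the 14G of the raw image. A minimal sketch, assuming `zpool list -Hp -o name,size` (exact sizes in bytes) and `lsblk -bdno SIZE` for the disk size; the helper names and the 0.9 ratio threshold are hypothetical choices.

```python
import subprocess


def _run(*cmd: str) -> str:
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def pool_size_bytes(zpool_output: str, pool: str) -> int:
    """Parse `zpool list -Hp -o name,size` output (tab-separated, bytes)."""
    for line in zpool_output.splitlines():
        if not line.strip():
            continue
        name, size = line.split("\t")
        if name == pool:
            return int(size)
    raise KeyError(pool)


def pool_fills_disk(pool_bytes: int, disk_bytes: int, min_ratio: float = 0.9) -> bool:
    # Arbitrary threshold: the boot partitions and ZFS overhead take only a
    # small share of the disk, so an expanded pool should be near disk size.
    return pool_bytes >= min_ratio * disk_bytes


def check_expansion(pool: str = "zfspool", disk: str = "/dev/sda") -> bool:
    zpool_out = _run("zpool", "list", "-Hp", "-o", "name,size")
    disk_bytes = int(_run("lsblk", "-bdno", "SIZE", disk).strip())
    return pool_fills_disk(pool_size_bytes(zpool_out, pool), disk_bytes)
```

With the numbers in the example above (464G pool on a 465.8G disk), the ratio is well above 0.9, whereas an unexpanded 14G pool would fail the check.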

@unbel13ver added the "Needs Testing" label (CI Team to pre-verify) Jul 30, 2024
@milva-unikie

Tested on Lenovo X1

Everything seems to be good with both the installer and the debug image! (I did not test nixos-rebuild, though; please let me know if that still needs to be checked.)

  • Installer installed to internal SSD

    • Installation works
    • Test-automation passes
    • The ISO image is a bit bigger than before (8.5 GB now vs 7.3 GB previously)
    • Same testing results as in the Testing section, except that zroot_1 has been renamed to zfspool
  • Debug-image installed to external SSD

    • Test-automation passes
    • disk1.raw.zst is a bit bigger than before (4.5 GB now vs 3.8 GB previously), but the extracted image is the same size (16.1 GB)

(A note for our testing team: this PR reduces the free space on the host, and there is no longer plenty of free space to run fileio-tests. We need to decide how to proceed with that.)

@milva-unikie added the "Tested on Lenovo X1 Carbon" label and removed the "Needs Testing" and "bug on Lenovo X1 Carbon" labels Aug 1, 2024
@brianmcgillion merged commit 5fd6251 into tiiuae:main Aug 2, 2024
14 checks passed
@vunnyso mentioned this pull request Aug 8, 2024