Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set device to maintenance mode if TPM error is detected #4462

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

shjala
Copy link
Member

@shjala shjala commented Dec 5, 2024

We have seen errors like "point is not on the required curve" in the past that prevents the device from upgrading because TPM fails to perform an ECDH key agreement to generate the KEK and decrypt the vault key. To catch these errors early, check if TPM is not functioning properly on a critical path and set the device to maintenance mode.

@shjala shjala marked this pull request as draft December 5, 2024 14:06
@github-actions github-actions bot requested a review from uncleDecart December 5, 2024 14:07
@shjala shjala force-pushed the maintenance.mode.tpm.err branch from a31d633 to 4b184ce Compare December 11, 2024 10:16
@shjala shjala force-pushed the maintenance.mode.tpm.err branch 2 times, most recently from a4ed054 to 98aab97 Compare December 12, 2024 15:09
@github-actions github-actions bot requested a review from rucoder December 12, 2024 15:09
@shjala shjala force-pushed the maintenance.mode.tpm.err branch from 98aab97 to ed07c1d Compare December 12, 2024 17:32
@shjala
Copy link
Member Author

shjala commented Dec 12, 2024

For now it looks like this :
image

@shjala
Copy link
Member Author

shjala commented Jan 2, 2025

What remained :

@shjala shjala force-pushed the maintenance.mode.tpm.err branch from 4515380 to 6280323 Compare January 3, 2025 09:51
shjala added 2 commits January 3, 2025 12:02
Update EVE-API to include TPM error in maintenance mode status.

Signed-off-by: Shahriyar Jalayeri <[email protected]>
This change makes eve rely on the API definition of MaintenanceModeReason
instead of the other way around.

Signed-off-by: Shahriyar Jalayeri <[email protected]>
@shjala shjala force-pushed the maintenance.mode.tpm.err branch from 6280323 to 9ec1a4e Compare January 3, 2025 10:12
@shjala
Copy link
Member Author

shjala commented Jan 3, 2025

Updating EVE-API updates protobuf as a side effect :

pillar git:(maintenance.mode.tpm.err) ✗ go get -u github.com/lf-edge/eve-api/go/info
go: upgraded github.com/lf-edge/eve-api/go v0.0.0-20241213165007-1a8f9be485b1 => v0.0.0-20250102213900-786246223024
go: upgraded google.golang.org/protobuf v1.33.0 => v1.36.1

Copy link

codecov bot commented Jan 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 20.90%. Comparing base (2e69dc3) to head (9ec1a4e).
Report is 25 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4462   +/-   ##
=======================================
  Coverage   20.90%   20.90%           
=======================================
  Files          13       13           
  Lines        2894     2894           
=======================================
  Hits          605      605           
  Misses       2163     2163           
  Partials      126      126           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@shjala shjala force-pushed the maintenance.mode.tpm.err branch from 9ec1a4e to 0a8a65d Compare January 3, 2025 10:31
@shjala shjala marked this pull request as ready for review January 3, 2025 10:31
@shjala
Copy link
Member Author

shjala commented Jan 3, 2025

@eriknordmark for fault injection I exprimented with SWTPM https://github.com/shjala/swtpm-fault-injection, but this won't work because Qemu is using control socket of SWTPM and not the TPM socket, so I have to do some reverse engineering to figure out how to inject faults there (work in progress...).

@OhmSpectator
Copy link
Member

@shjala, could you please take a look at Erik's comments? =)

@shjala
Copy link
Member Author

shjala commented Jan 21, 2025

@shjala, could you please take a look at Erik's comments? =)

I'm aware, I'm doing some preliminary work to address the comments.

@shjala shjala force-pushed the maintenance.mode.tpm.err branch from 0a8a65d to 2c2d305 Compare January 21, 2025 16:24
@github-actions github-actions bot requested a review from eriknordmark January 21, 2025 16:25
@shjala
Copy link
Member Author

shjala commented Jan 21, 2025

@eriknordmark Testing still is a bit manual, but works, I'll create some scripts to automate it and later add it to Eden:

➜  swtpm-fault-injection git:(main) ✗ git clone [email protected]:shjala/swtpm-fault-injection.git
➜  swtpm-fault-injection git:(main) ✗ cd swtpm-fault-injection
➜  swtpm-fault-injection git:(main) ✗ which swtpm             
/usr/local/bin/swtpm
➜  swtpm-fault-injection git:(main) ✗ ldd /usr/local/bin/swtpm
        linux-vdso.so.1 (0x00007ffc6e38c000)
        libswtpm_libtpms.so.0 => /usr/local/lib/swtpm/libswtpm_libtpms.so.0 (0x00007b4d65723000)
        libglib-2.0.so.0 => /lib/x86_64-linux-gnu/libglib-2.0.so.0 (0x00007b4d655ce000)
        libtpms.so.0 => /usr/local/lib/libtpms.so.0 (0x00007b4d6549c000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007b4d65200000)
        libjson-glib-1.0.so.0 => /lib/x86_64-linux-gnu/libjson-glib-1.0.so.0 (0x00007b4d6546f000)
        libgobject-2.0.so.0 => /lib/x86_64-linux-gnu/libgobject-2.0.so.0 (0x00007b4d651a0000)
        libseccomp.so.2 => /lib/x86_64-linux-gnu/libseccomp.so.2 (0x00007b4d6544d000)
        libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007b4d64c00000)
        libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007b4d6512a000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007b4d64b19000)
        /lib64/ld-linux-x86-64.so.2 (0x00007b4d65758000)
        libgio-2.0.so.0 => /lib/x86_64-linux-gnu/libgio-2.0.so.0 (0x00007b4d6493f000)
        libffi.so.8 => /lib/x86_64-linux-gnu/libffi.so.8 (0x00007b4d6543e000)
        libgmodule-2.0.so.0 => /lib/x86_64-linux-gnu/libgmodule-2.0.so.0 (0x00007b4d65437000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007b4d6510e000)
        libmount.so.1 => /lib/x86_64-linux-gnu/libmount.so.1 (0x00007b4d650ca000)
        libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007b4d6509e000)
        libblkid.so.1 => /lib/x86_64-linux-gnu/libblkid.so.1 (0x00007b4d65067000)
        libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007b4d648a8000)
➜  swtpm-fault-injection git:(main) ✗ ./findoffset.sh /usr/local/lib/swtpm/libswtpm_libtpms.so.0 SWTPM_IO_Read
Function: SWTPM_IO_Read
Start Address:    0x0000000000013e84
End Address:    0x0000000000013f67
End Offset for uretprobe: 227 bytes
➜  swtpm-fault-injection git:(main) ✗ sudo ./main -swtpm -pid <SWTPM_PID> -libtpms "/usr/local/lib/swtpm/libswtpm_libtpms.so.0" -swtpm-io-read-offset 227 -inputs="CmdECDHZGen:RCFailure:3m:false"

@shjala
Copy link
Member Author

shjala commented Jan 22, 2025

@eriknordmark Now device can be in Maintenance Mode for multiple reasons, I'm testing this by simulating multiple reasons and see if it works as expected. I suspect controller has to change too?

@shjala shjala force-pushed the maintenance.mode.tpm.err branch 2 times, most recently from 965cffb to b49e579 Compare January 22, 2025 14:33
@shjala
Copy link
Member Author

shjala commented Jan 22, 2025

When I simulate multiple maintenance mode reasons, mergeMaintenanceMode() somehow gets called and incorrectly sets the mode to false with log Changed maintenanceMode to false, with reason as MaintenanceModeReasonTpmEncFailure !

@shjala shjala force-pushed the maintenance.mode.tpm.err branch from b49e579 to b2e4070 Compare January 22, 2025 16:12
We have seen errors like "point is not on the required curve" in the
past that prevents the device from upgrading because TPM fails to perform
an ECDH key agreement to generate the KEK and decrypt the vault key.
To catch these errors early, check if TPM is not functioning properly on
a critical path and set the device to maintenance mode.

Signed-off-by: Shahriyar Jalayeri <[email protected]>
Instead of checking TPM sanity only once at startup, check it periodically
this allows us to detect TPM errors that may occur after the initial check,
and clear TPM errors if they are transient.

Signed-off-by: Shahriyar Jalayeri <[email protected]>
This change allows the device to have multiple reasons for being in
maintenance mode. This is useful when multiple conditions are met that
require the device to be in maintenance mode. For example, if the TPM
is in error and the device disk is full, the device can be in
maintenance mode for both reasons. Clearing one of the reasons will not
take the device out of maintenance mode if there are other reasons for
it to be in maintenance mode.

Signed-off-by: Shahriyar Jalayeri <[email protected]>
@shjala shjala force-pushed the maintenance.mode.tpm.err branch from b2e4070 to 0c1204e Compare January 22, 2025 16:17
}
if (mmr & MaintenanceModeReasonUserRequested) == MaintenanceModeReasonUserRequested {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the MaintModeReason* are values 1,2,3,4 thus you can't use them as bits in a bit map.
If you set 1|2 you end up with the 3 in the field and String() reports as 1, 2, and 3 were set.

Thus to handle multiple we either need to change the enum values in the API to be bits (1<<0, 1<<1, etc), which would an incompatible value change for MaintenanceModeReasonNoDiskSpace (value would change from 3 to 1<<2 = 4), or add some "additionalMaintenceModeReasons" array to the info message.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I think I'll add a new field in the API then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants