-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set device to maintenance mode if TPM error is detected #4462
base: master
Are you sure you want to change the base?
Conversation
a31d633
to
4b184ce
Compare
a4ed054
to
98aab97
Compare
98aab97
to
ed07c1d
Compare
ed07c1d
to
886f670
Compare
886f670
to
4515380
Compare
What remained :
|
4515380
to
6280323
Compare
Update EVE-API to include TPM error in maintenance mode status. Signed-off-by: Shahriyar Jalayeri <[email protected]>
This change makes eve rely on the API definition of MaintenanceModeReason instead of the other way around. Signed-off-by: Shahriyar Jalayeri <[email protected]>
6280323
to
9ec1a4e
Compare
Updating EVE-API updates protobuf as a side effect : pillar git:(maintenance.mode.tpm.err) ✗ go get -u github.com/lf-edge/eve-api/go/info
go: upgraded github.com/lf-edge/eve-api/go v0.0.0-20241213165007-1a8f9be485b1 => v0.0.0-20250102213900-786246223024
go: upgraded google.golang.org/protobuf v1.33.0 => v1.36.1 |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #4462 +/- ##
=======================================
Coverage 20.90% 20.90%
=======================================
Files 13 13
Lines 2894 2894
=======================================
Hits 605 605
Misses 2163 2163
Partials 126 126 ☔ View full report in Codecov by Sentry. |
9ec1a4e
to
0a8a65d
Compare
@eriknordmark for fault injection I exprimented with SWTPM https://github.com/shjala/swtpm-fault-injection, but this won't work because Qemu is using control socket of SWTPM and not the TPM socket, so I have to do some reverse engineering to figure out how to inject faults there (work in progress...). |
@shjala, could you please take a look at Erik's comments? =) |
I'm aware, I'm doing some preliminary work to address the comments. |
0a8a65d
to
2c2d305
Compare
@eriknordmark Testing still is a bit manual, but works, I'll create some scripts to automate it and later add it to Eden: ➜ swtpm-fault-injection git:(main) ✗ git clone [email protected]:shjala/swtpm-fault-injection.git
➜ swtpm-fault-injection git:(main) ✗ cd swtpm-fault-injection
➜ swtpm-fault-injection git:(main) ✗ which swtpm
/usr/local/bin/swtpm
➜ swtpm-fault-injection git:(main) ✗ ldd /usr/local/bin/swtpm
linux-vdso.so.1 (0x00007ffc6e38c000)
libswtpm_libtpms.so.0 => /usr/local/lib/swtpm/libswtpm_libtpms.so.0 (0x00007b4d65723000)
libglib-2.0.so.0 => /lib/x86_64-linux-gnu/libglib-2.0.so.0 (0x00007b4d655ce000)
libtpms.so.0 => /usr/local/lib/libtpms.so.0 (0x00007b4d6549c000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007b4d65200000)
libjson-glib-1.0.so.0 => /lib/x86_64-linux-gnu/libjson-glib-1.0.so.0 (0x00007b4d6546f000)
libgobject-2.0.so.0 => /lib/x86_64-linux-gnu/libgobject-2.0.so.0 (0x00007b4d651a0000)
libseccomp.so.2 => /lib/x86_64-linux-gnu/libseccomp.so.2 (0x00007b4d6544d000)
libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007b4d64c00000)
libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007b4d6512a000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007b4d64b19000)
/lib64/ld-linux-x86-64.so.2 (0x00007b4d65758000)
libgio-2.0.so.0 => /lib/x86_64-linux-gnu/libgio-2.0.so.0 (0x00007b4d6493f000)
libffi.so.8 => /lib/x86_64-linux-gnu/libffi.so.8 (0x00007b4d6543e000)
libgmodule-2.0.so.0 => /lib/x86_64-linux-gnu/libgmodule-2.0.so.0 (0x00007b4d65437000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007b4d6510e000)
libmount.so.1 => /lib/x86_64-linux-gnu/libmount.so.1 (0x00007b4d650ca000)
libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007b4d6509e000)
libblkid.so.1 => /lib/x86_64-linux-gnu/libblkid.so.1 (0x00007b4d65067000)
libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007b4d648a8000)
➜ swtpm-fault-injection git:(main) ✗ ./findoffset.sh /usr/local/lib/swtpm/libswtpm_libtpms.so.0 SWTPM_IO_Read
Function: SWTPM_IO_Read
Start Address: 0x0000000000013e84
End Address: 0x0000000000013f67
End Offset for uretprobe: 227 bytes
➜ swtpm-fault-injection git:(main) ✗ sudo ./main -swtpm -pid <SWTPM_PID> -libtpms "/usr/local/lib/swtpm/libswtpm_libtpms.so.0" -swtpm-io-read-offset 227 -inputs="CmdECDHZGen:RCFailure:3m:false" |
@eriknordmark Now device can be in Maintenance Mode for multiple reasons, I'm testing this by simulating multiple reasons and see if it works as expected. I suspect controller has to change too? |
965cffb
to
b49e579
Compare
When I simulate multiple maintenance mode reasons, |
b49e579
to
b2e4070
Compare
We have seen errors like "point is not on the required curve" in the past that prevents the device from upgrading because TPM fails to perform an ECDH key agreement to generate the KEK and decrypt the vault key. To catch these errors early, check if TPM is not functioning properly on a critical path and set the device to maintenance mode. Signed-off-by: Shahriyar Jalayeri <[email protected]>
Instead of checking TPM sanity only once at startup, check it periodically this allows us to detect TPM errors that may occur after the initial check, and clear TPM errors if they are transient. Signed-off-by: Shahriyar Jalayeri <[email protected]>
This change allows the device to have multiple reasons for being in maintenance mode. This is useful when multiple conditions are met that require the device to be in maintenance mode. For example, if the TPM is in error and the device disk is full, the device can be in maintenance mode for both reasons. Clearing one of the reasons will not take the device out of maintenance mode if there are other reasons for it to be in maintenance mode. Signed-off-by: Shahriyar Jalayeri <[email protected]>
b2e4070
to
0c1204e
Compare
} | ||
if (mmr & MaintenanceModeReasonUserRequested) == MaintenanceModeReasonUserRequested { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the MaintModeReason* are values 1,2,3,4 thus you can't use them as bits in a bit map.
If you set 1|2 you end up with the 3 in the field and String() reports as 1, 2, and 3 were set.
Thus to handle multiple we either need to change the enum values in the API to be bits (1<<0, 1<<1, etc), which would an incompatible value change for MaintenanceModeReasonNoDiskSpace (value would change from 3 to 1<<2 = 4), or add some "additionalMaintenceModeReasons" array to the info message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, I think I'll add a new field in the API then.
We have seen errors like "point is not on the required curve" in the past that prevents the device from upgrading because TPM fails to perform an ECDH key agreement to generate the KEK and decrypt the vault key. To catch these errors early, check if TPM is not functioning properly on a critical path and set the device to maintenance mode.