Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stm32] Prevent SPI RX FIFO overflow in SPI master #1223

Merged
merged 1 commit into from
Nov 12, 2024

Conversation

WasabiFan
Copy link
Contributor

I was doing moderately high bandwidth streaming over SPI on an STM32H7. When I added some UART debug printing elsewhere in my application, it caused an indefinite hang in the SPI transfer() routine in this loop:

while (rxIndex < length) {
while ((txIndex < length) and !Hal::isTxFifoFull()) {
Hal::write(tx ? tx[txIndex] : 0);
++txIndex;
}
while ((rxIndex < length) and Hal::isRxDataAvailable()) {
const uint8_t data = Hal::read();
if (rx) {
rx[rxIndex] = data;
}
++rxIndex;
}

I could see txIndex had run far ahead of rxIndex (>30 bytes). The TX loop had all those 30+ bytes in its first execution without the RX loop getting a chance to pop anything. The peripheral had hit an overrun and entered suspended state, meaning the TX FIFO had completely backpressured and the loop was spinning waiting for RX frames which were not forthcoming.

After some investigation I think there are two bugs/oversights in the SpiMaster implementation.

  1. I believe the SPI peripheral had popped frames from the SPI TX FIFO before the TX loop had completed, which meant the HAL enqueued more TX frames than would fit in the RX FIFO when the RX data eventually arrived. I think it is a bug to not consider the RX FIFO capacity when pushing into the TX fifo.
  2. The HAL sets CR1.MASRX ("master automatic SUSP in receive mode"), which means an RX overflow will cause the PHY to enter "suspended" state until it is cleared by software. But there is no software logic to detect this condition, and the bit will only be cleared on transaction completion. So it isn't clear to me why this bit is set -- the transfer will hang regardless. If the transfer logic is fixed to prevent an RX overflow, this shouldn't matter, but I wanted to call it out in case there was intent to this setting which I don't understand.

I believe the UART byte-by-byte TX interrupts were occurring while in the SPI transmit loop, delaying execution enough for the SPI peripheral to get ahead.

This PR includes a fix for (1) by computing the number of outstanding bytes and bailing out of the TX loop when it hits the RX FIFO capacity. This seems to match the Cube HAL implementation.

The non-H7 STM32 driver does a byte-by-byte transfer, which means it shouldn't be affected by this bug. I took a cursory look at the H7 DMA implementation and it isn't obvious to me whether it has a similar failure mode. It seems the operating theory of the DMA implementation is that the RX DMA will clear the FIFO faster than the RX FIFO is populated. I don't see any explicit overflow handling. So it might make sense to detect a suspension and resume the TX DMA once RX is flushed. Or it might be unnecessary if the above assumption holds true. Since TX and RX DMA are happening in parallel, it seems fair to me.

I don't have a great understanding of the STM32 SPI IP so I would appreciate a double-check that my diagnosis and fix are appropriate. I haven't done any testing beyond confirming that my Fiber-based test application no longer hangs and produces the expected result. I would also appreciate any thoughts on (2) above. Also, I would appreciate any suggestions on how to reliably hit this condition in a unit test.

In full duplex mode (only mode supported by SpiMaster), every
transmitted byte will produce a corresponding received byte which is
pushed into the RX FIFO with some delay. The existing TX loop does not
limit outstanding transactions to ensure they will fit within the RX
FIFO. If the SPI PHY transmits a frame while the HAL is in its TX loop,
it will freely push more TX frames until the TX FIFO is full. These
extra frames may overflow the RX FIFO if the HAL hasn't gotten to
popping incoming frames by the time they are received.

This could be solved by checking for the SUSP (suspended to prevent
overflow) during TX or by using the DXP bit to coordinate transactions
instead of TXP/RXP. This commit instead chooses to track the number of
outstanding frames using the existing counters and prevent transmit if
there are more outstanding frames than would fit in the RX FIFO.

This implementation assumes 8-bit/one-byte SPI data size, which is an
existing limitation of the SpiMaster.
Copy link
Member

@chris-durand chris-durand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good catch! Not considering the RX FIFO capacity is clearly a bug.

@chris-durand
Copy link
Member

The HAL sets CR1.MASRX ("master automatic SUSP in receive mode"), which means an RX overflow will cause the PHY to enter "suspended" state until it is cleared by software. But there is no software logic to detect this condition

Setting CR1.MASRX seems to be an oversight 🙈.

I took a cursory look at the H7 DMA implementation and it isn't obvious to me whether it has a similar failure mode. It seems the operating theory of the DMA implementation is that the RX DMA will clear the FIFO faster than the RX FIFO is populated. I don't see any explicit overflow handling. So it might make sense to detect a suspension and resume the TX DMA once RX is flushed. Or it might be unnecessary if the above assumption holds true. Since TX and RX DMA are happening in parallel, it seems fair to me.

According to the reference manual it seems the DMA implementation has a similar failure mode. The SPI DMA subsection in the H723/5 reference manual states:

If the SPI is programmed in full-duplex mode, RXP and OVR are eventually set, because
received data are not read.

It could be worth adding some RX overflow handling to the SPI DMA driver. I'd first check what the CubeHAL does. However, it seems a fair assumption that this condition is less likely to occur in practice. To create this condition a steady stream of TX data would need to be delivered to the peripheral, while RX data is comparatively slow to be transferred. In most cases TX and RX DMA use the same data path. I'd assume whatever would let the RX buffer overflow would also stall transmissions.
Of course configurations are imaginable where the RX data is written to a different SRAM with a slowed-down bus clock. It would be an interesting test case to try to reproduce that issue with DMA.

@WasabiFan
Copy link
Contributor Author

I'd first check what the CubeHAL does.

My reading of the Cube implementation is that it does the same as the modm one. It configures RX, configures TX, then initiates the transfer. It doesn't look to check the SUSP bit or do any special handling of the SUSP interrupt until EOT. So as far as I can tell they also make the assumption that the RX stream can't backpressure the TX. If my reading is correct I figure the existing DMA implementation in modm is fine.

Of course configurations are imaginable where the RX data is written to a different SRAM with a slowed-down bus clock. It would be an interesting test case to try to reproduce that issue with DMA.

It seems to me that all the AHBs and whatever clock the built-in SRAMs are on are locked together; the clock tree doesn't have any dividers after HPRE. So I guess this couldn't be done with SRAM1-SRAM4. Are you thinking of an external SRAM? Or am I misunderstanding the reference manual?

@chris-durand
Copy link
Member

It seems to me that all the AHBs and whatever clock the built-in SRAMs are on are locked together; the clock tree doesn't have any dividers after HPRE. So I guess this couldn't be done with SRAM1-SRAM4. Are you thinking of an external SRAM? Or am I misunderstanding the reference manual?

You're right. I somehow misremembered how the clock tree looks like. My suggestion wouldn't work.

@salkinium salkinium merged commit 678fd9a into modm-io:develop Nov 12, 2024
68 checks passed
WasabiFan added a commit to WasabiFan/modm that referenced this pull request Nov 18, 2024
Follow-up to 678fd9a and modm-io#1223.
It seems that once the RX FIFO has been cleared and emptied, the next
transmitted byte always enters SUSP mode. It is unclear why this
happens. When MASRX is not set, neither SUSP nor OVR are seen for the
same transmit sequence.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants