[stm32] Prevent SPI RX FIFO overflow in SPI master #1223

WasabiFan · 2024-11-11T06:56:34Z

I was doing moderately high bandwidth streaming over SPI on an STM32H7. When I added some UART debug printing elsewhere in my application, it caused an indefinite hang in the SPI transfer() routine in this loop:

modm/src/modm/platform/spi/stm32h7/spi_master.cpp.in

Lines 73 to 84 in 28c87e4

    
           while (rxIndex < length) { 
        
           	while ((txIndex < length) and !Hal::isTxFifoFull()) { 
        
           		Hal::write(tx ? tx[txIndex] : 0); 
        
           		++txIndex; 
        
           	} 
        
           	while ((rxIndex < length) and Hal::isRxDataAvailable()) { 
        
           		const uint8_t data = Hal::read(); 
        
           		if (rx) { 
        
           			rx[rxIndex] = data; 
        
           		} 
        
           		++rxIndex; 
        
           	}

I could see txIndex had run far ahead of rxIndex (>30 bytes). The TX loop had all those 30+ bytes in its first execution without the RX loop getting a chance to pop anything. The peripheral had hit an overrun and entered suspended state, meaning the TX FIFO had completely backpressured and the loop was spinning waiting for RX frames which were not forthcoming.

After some investigation I think there are two bugs/oversights in the SpiMaster implementation.

I believe the SPI peripheral had popped frames from the SPI TX FIFO before the TX loop had completed, which meant the HAL enqueued more TX frames than would fit in the RX FIFO when the RX data eventually arrived. I think it is a bug to not consider the RX FIFO capacity when pushing into the TX fifo.
The HAL sets CR1.MASRX ("master automatic SUSP in receive mode"), which means an RX overflow will cause the PHY to enter "suspended" state until it is cleared by software. But there is no software logic to detect this condition, and the bit will only be cleared on transaction completion. So it isn't clear to me why this bit is set -- the transfer will hang regardless. If the transfer logic is fixed to prevent an RX overflow, this shouldn't matter, but I wanted to call it out in case there was intent to this setting which I don't understand.

I believe the UART byte-by-byte TX interrupts were occurring while in the SPI transmit loop, delaying execution enough for the SPI peripheral to get ahead.

This PR includes a fix for (1) by computing the number of outstanding bytes and bailing out of the TX loop when it hits the RX FIFO capacity. This seems to match the Cube HAL implementation.

The non-H7 STM32 driver does a byte-by-byte transfer, which means it shouldn't be affected by this bug. I took a cursory look at the H7 DMA implementation and it isn't obvious to me whether it has a similar failure mode. It seems the operating theory of the DMA implementation is that the RX DMA will clear the FIFO faster than the RX FIFO is populated. I don't see any explicit overflow handling. So it might make sense to detect a suspension and resume the TX DMA once RX is flushed. Or it might be unnecessary if the above assumption holds true. Since TX and RX DMA are happening in parallel, it seems fair to me.

I don't have a great understanding of the STM32 SPI IP so I would appreciate a double-check that my diagnosis and fix are appropriate. I haven't done any testing beyond confirming that my Fiber-based test application no longer hangs and produces the expected result. I would also appreciate any thoughts on (2) above. Also, I would appreciate any suggestions on how to reliably hit this condition in a unit test.

In full duplex mode (only mode supported by SpiMaster), every transmitted byte will produce a corresponding received byte which is pushed into the RX FIFO with some delay. The existing TX loop does not limit outstanding transactions to ensure they will fit within the RX FIFO. If the SPI PHY transmits a frame while the HAL is in its TX loop, it will freely push more TX frames until the TX FIFO is full. These extra frames may overflow the RX FIFO if the HAL hasn't gotten to popping incoming frames by the time they are received. This could be solved by checking for the SUSP (suspended to prevent overflow) during TX or by using the DXP bit to coordinate transactions instead of TXP/RXP. This commit instead chooses to track the number of outstanding frames using the existing counters and prevent transmit if there are more outstanding frames than would fit in the RX FIFO. This implementation assumes 8-bit/one-byte SPI data size, which is an existing limitation of the SpiMaster.

chris-durand

Thanks, good catch! Not considering the RX FIFO capacity is clearly a bug.

chris-durand · 2024-11-11T22:03:29Z

The HAL sets CR1.MASRX ("master automatic SUSP in receive mode"), which means an RX overflow will cause the PHY to enter "suspended" state until it is cleared by software. But there is no software logic to detect this condition

Setting CR1.MASRX seems to be an oversight 🙈.

I took a cursory look at the H7 DMA implementation and it isn't obvious to me whether it has a similar failure mode. It seems the operating theory of the DMA implementation is that the RX DMA will clear the FIFO faster than the RX FIFO is populated. I don't see any explicit overflow handling. So it might make sense to detect a suspension and resume the TX DMA once RX is flushed. Or it might be unnecessary if the above assumption holds true. Since TX and RX DMA are happening in parallel, it seems fair to me.

According to the reference manual it seems the DMA implementation has a similar failure mode. The SPI DMA subsection in the H723/5 reference manual states:

If the SPI is programmed in full-duplex mode, RXP and OVR are eventually set, because
received data are not read.

It could be worth adding some RX overflow handling to the SPI DMA driver. I'd first check what the CubeHAL does. However, it seems a fair assumption that this condition is less likely to occur in practice. To create this condition a steady stream of TX data would need to be delivered to the peripheral, while RX data is comparatively slow to be transferred. In most cases TX and RX DMA use the same data path. I'd assume whatever would let the RX buffer overflow would also stall transmissions.
Of course configurations are imaginable where the RX data is written to a different SRAM with a slowed-down bus clock. It would be an interesting test case to try to reproduce that issue with DMA.

WasabiFan · 2024-11-12T05:32:27Z

I'd first check what the CubeHAL does.

My reading of the Cube implementation is that it does the same as the modm one. It configures RX, configures TX, then initiates the transfer. It doesn't look to check the SUSP bit or do any special handling of the SUSP interrupt until EOT. So as far as I can tell they also make the assumption that the RX stream can't backpressure the TX. If my reading is correct I figure the existing DMA implementation in modm is fine.

Of course configurations are imaginable where the RX data is written to a different SRAM with a slowed-down bus clock. It would be an interesting test case to try to reproduce that issue with DMA.

It seems to me that all the AHBs and whatever clock the built-in SRAMs are on are locked together; the clock tree doesn't have any dividers after HPRE. So I guess this couldn't be done with SRAM1-SRAM4. Are you thinking of an external SRAM? Or am I misunderstanding the reference manual?

chris-durand · 2024-11-12T06:07:33Z

It seems to me that all the AHBs and whatever clock the built-in SRAMs are on are locked together; the clock tree doesn't have any dividers after HPRE. So I guess this couldn't be done with SRAM1-SRAM4. Are you thinking of an external SRAM? Or am I misunderstanding the reference manual?

You're right. I somehow misremembered how the clock tree looks like. My suggestion wouldn't work.

Follow-up to 678fd9a and modm-io#1223. It seems that once the RX FIFO has been cleared and emptied, the next transmitted byte always enters SUSP mode. It is unclear why this happens. When MASRX is not set, neither SUSP nor OVR are seen for the same transmit sequence.

salkinium requested a review from chris-durand November 11, 2024 14:03

salkinium added advanced 🤯 fix 💎 labels Nov 11, 2024

salkinium added this to the 2024q4 milestone Nov 11, 2024

chris-durand approved these changes Nov 11, 2024

View reviewed changes

salkinium merged commit 678fd9a into modm-io:develop Nov 12, 2024
68 checks passed

WasabiFan mentioned this pull request Nov 21, 2024

[stm32] Disable MASRX in STM32H7 SPI #1230

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[stm32] Prevent SPI RX FIFO overflow in SPI master #1223

[stm32] Prevent SPI RX FIFO overflow in SPI master #1223

Uh oh!

WasabiFan commented Nov 11, 2024

Uh oh!

chris-durand left a comment

Uh oh!

chris-durand commented Nov 11, 2024

Uh oh!

WasabiFan commented Nov 12, 2024

Uh oh!

chris-durand commented Nov 12, 2024

Uh oh!

Uh oh!

Uh oh!

	while (rxIndex < length) {
	while ((txIndex < length) and !Hal::isTxFifoFull()) {
	Hal::write(tx ? tx[txIndex] : 0);
	++txIndex;
	}
	while ((rxIndex < length) and Hal::isRxDataAvailable()) {
	const uint8_t data = Hal::read();
	if (rx) {
	rx[rxIndex] = data;
	}
	++rxIndex;
	}

[stm32] Prevent SPI RX FIFO overflow in SPI master #1223

[stm32] Prevent SPI RX FIFO overflow in SPI master #1223

Uh oh!

Conversation

WasabiFan commented Nov 11, 2024

Uh oh!

chris-durand left a comment

Choose a reason for hiding this comment

Uh oh!

chris-durand commented Nov 11, 2024

Uh oh!

WasabiFan commented Nov 12, 2024

Uh oh!

chris-durand commented Nov 12, 2024

Uh oh!

Uh oh!

Uh oh!