Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport HEVC decoder as submitted to linux-media as v2 #6666

Open
wants to merge 817 commits into
base: rpi-6.12.y
Choose a base branch
from

Conversation

6by9
Copy link
Contributor

@6by9 6by9 commented Feb 12, 2025

This PR:

  • reverts the downstream HEVC driver and supporting patches
  • backports the driver as submitted as v2 (https://lore.kernel.org/r/[email protected]) with DT binding fixup requested by Rob H, and dependencies.
  • adds in the new SAND handling for DRM
  • reinstates the downstream single planar pixel formats for decode
  • reinstates being able to set the /dev/videoN node

This doesn't work against the existing FFmpeg as it is looking for extra flags to identify a device that it thinks can decode HEVC. @jc-kynesim was going to release his updated FFmpeg that handles this, otherwise I can look at whether we can expose a second /dev/videoN node that adds in those flags as an interim.

This wants more thorough testing, hence looking to push it to rpi-6.12.y.

6by9 and others added 30 commits February 10, 2025 13:02
Rather than clearing all the bits in rp1_pll_divider_off
and setting PLL_SEC_RST, retain the status of all the other
bits.

Signed-off-by: Dave Stevenson <[email protected]>
The clock setup had been copied from clk-bcm2835 which had to cope
with the firmware having configured clocks, so there were flags
of CLK_IS_CRITICAL and CLK_IGNORE_UNUSED dotted around.

That isn't the situation with RP1 where only the main PLLs, CLK_SYS,
and CLK_SLOW_SYS are critical, so update the configuration to match.

Signed-off-by: Dave Stevenson <[email protected]>
Configure RP1's ethernet block to do the correct thing.
clk_eth is intended to be fixed at 125MHz, so use a new compatible,
and use assigned-clocks to configure the clock appropriately.

Signed-off-by: Dave Stevenson <[email protected]>
The DMA block has a clock, but wasn't defined in the driver. This
resulted in the parent being disabled as unused, and then DMA
stopped working.

Signed-off-by: Dave Stevenson <[email protected]>
There should be no issue in disabling the RP1 clocks as long as
the kernel knows about all consumers.

Signed-off-by: Dave Stevenson <[email protected]>
hclk and pclk of the MAC are connected to clk_sys, so define
them as being connected accordingly, rather than having fake
fixed clocks for them.

Signed-off-by: Dave Stevenson <[email protected]>
This makes the kernel representation of the clock structure
match reality.

Signed-off-by: Dave Stevenson <[email protected]>
In the move to the upstream bcm2712.dts, the A76 PMU was omitted from
the required downstream additions.

Link: raspberrypi#6507

Signed-off-by: Phil Elwell <[email protected]>
Following the merging of [1], it is safe to re-enable DMA to UART0
without fear of losing data.

Seen while looking at raspberrypi#6507.

[1] dmaengine: dw-axi-dmac: Allow client-chosen width

Signed-off-by: Phil Elwell <[email protected]>
gpio-direct mode is a modification of the brcmstb GPIO driver that
makes it play nicely with the userspace pinctrl utility. The mode
forces the drive to read its state from the hardware each time, rather
than relying on cached state. Doing so slightly reduces performance,
but this is not a heavily used code path.

Signed-off-by: Phil Elwell <[email protected]>
Fractional source co-ordinates can be used to setup the scaling
filters, so retain the information.

Signed-off-by: Dom Cobley <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
Apply fractional source co-ordinates into the scaling filters.

Signed-off-by: Dom Cobley <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
When the margins are changed, the dlist needs to be regenerated
with the changed updated dest regions for each of the planes.

Setting the zpos_changed flag is sufficient to trigger that
without doing a full modeset, therefore set it should the
margins be changed.

Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
Support displaying DRM_FORMAT_YUV444 and DRM_FORMAT_YVU444 formats.
Tested with kmstest and kodi. e.g.

kmstest -r 1920x1080@60 -f 400x300-YU24

Note: without the shift of width, only half the chroma is fetched,
resulting in correct left half of image and corrupt colours on right half.

The increase in width shouldn't affect fetching of Y data,
as the hardware will clamp at dest width.

Signed-off-by: Dom Cobley <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
The VC4 HDMI driver has a bunch of accessors to read from a register.
The read accessor was warning when accessing an unknown register, but
the write one was just returning silently.

Let's make sure we warn also when writing to an unknown register.

Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
DLIST generation can get pretty tricky and there's not a lot of debug in
the driver to help. Let's add a few more to track the generated DLIST
size.

Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
We need to allocate a few additional structures when checking our
atomic_state, especially related to hardware SRAM that will hold the
plane descriptors (DLIST) and the current line context (LBM) during
composition.

Since those allocation can fail, let's add some error message in that
case to help debug what goes wrong.

Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
LBM allocations need a different size depending on the line length,
format, etc.

This can get tricky, and fail. Let's add some more prints to ease the
debugging when it does.

Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
The vc4_plane_atomic_check() directly returns the result of the final
function it calls.

Using the already defined ret variable to check its content on error,
and a separate return 0 on success, makes it easier to extend.

Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
We access multiple times the vc4_crtc_state->assigned_channel variable
in the vc4_crtc_get_scanout_position() function, so let's store it in a
local variable.

Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
It has been observed that a YUV422 unity scaled plane isn't displayed.
Enabling vertical scaling on the UV planes solves this. There is
already a similar clause to always enable horizontal scaling on the
UV planes.

Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
ABORT_ON_EMPTY chooses whether the HVS abandons the current frame
when it experiences an underflow, or attempts to continue.

In theory the frame should be black from the point of underflow,
compared to a shift of sebsequent pixels to the left.

Unfortunately it seems to put the HVS is a bad state where it is not
possible to recover simply. This typically requires a reboot
following the 'flip done timed out message'.

Discussion with Broadcom has suggested we don't use this flag.
All their testing is done with it disabled.

Additionally setting BLANK_INSERT_EN causes the HDMI to output
blank pixels on an underflow which avoids it losing sync.

After this change a 'flip done timed out' due to sdram bandwidth
starvation or too low a clock is recoverable once the situation improves.

Signed-off-by: Dom Cobley <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
The V3D IP has been separate since BCM2711, so let's make sure we issue
a WARN if we're running not only on BCM2711, but also anything newer.

Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
…output

Since we'll support BCM2712 soon, let's move the logic behind
vc4_hvs_get_fifo_from_output() to a switch to extend it more easily.

Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
Since the BCM2712 will feature a significantly different HVS, let's move
the hardware initialisation part of our bind function into a separate
function.

That way, it will be easier to extend in the future.

Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
Just like the HVS itself, the COB parameters will be fairly different in
the BCM2712.

Let's move the COB parameters computation and its initialisation to a
separate function that will be easier to extend in the future.

Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
The HVS register set has been heavily modified in the BCM2712, and we'll
thus need a separate debugfs_reg32 array for it.

The name hvs_regs is thus a bit too generic, so let's rename it to
something more specific.

Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
The BCM2712 will have a fairly different dlist, that will feature one
Pointer 0 word for each plane.

Let's prepare by changing the ptr0_offset variable that holds the offset
in a dlist of the pointer 0 word to an array.

Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
With the introduction of the support for BCM2712, the check of whether
we're running on vc5 or not to compute the LBM alignment requirement
doesn't work anymore.

Moreover, the LBM size will need to be computed in words for the
BCM2712, while we've had sizes in bytes so far.

Aligning on either 64 or 32 words is thus fairly harmful on BCM2712, so
let's just explicitly align the size when needed, and then call
drm_mm_insert_node_generic() with an alignment of 1.

Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
The BCM2712 HVS has registers to report the size of the various SRAM the
driver uses, and their size actually differ depending on the stepping.

The initialisation of the memory pools happen in the __vc4_hvs_alloc()
function that also allocates the main HVS structure, that will then hold
the pointer to the memory mapping of the registers.

This creates some kind of circular dependency that we can break by
passing the mapping pointer as an argument for __vc4_hvs_alloc() to use
to query to get the SRAM sizes and initialise the memory pools
accordingly.

Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Dave Stevenson <[email protected]>
sholdee and others added 27 commits February 10, 2025 13:03
Renesas UPD720201/-202 is a popular alternative to the VL805 USB3 phy
used in many CM4 based products.

Commit 25f51b7 ("xhci-pci: Make xhci-pci-renesas a proper modular
driver") reworked the Renesas XHCI driver, resulting in
CONFIG_USB_XHCI_PCI_RENESAS no longer being implicitly enabled.

Explicitly add it to the defconfig to restore USB3 functionality.

Signed-off-by: Nicolai Buchwitz <[email protected]>
If this is a DMA transfer, and if there is no simultaneous RX transfer,
wait for the interface to go idle before reporting that TX is done.

Link: https://forums.raspberrypi.com/viewtopic.php?t=383027

Signed-off-by: Phil Elwell <[email protected]>
If the RP1 firmware has reported an error then return that from the PIO
probe function, otherwise defer the probing.

Link: raspberrypi#6642

Signed-off-by: Phil Elwell <[email protected]>
To avoid pointless retries, let the probe function succeed if the
firmware interface is configured correctly but the firmware is
incompatible. The value of the private drvdata field holds the outcome.

Link: raspberrypi#6642

Signed-off-by: Phil Elwell <[email protected]>
The of_xlate method saves the calculated event mask in the con_priv
field. It also rejects subsequent attempt to use that channel because
the mask is non-zero, which causes a repeated instantiation of a client
driver to fail.

The of_xlate method is not meant to be a point of resource acquisition.
Leave the con_priv initialisation, but drop the test that it was
previously zero.

Signed-off-by: Phil Elwell <[email protected]>
With a future release of the firmware, it will be possible to use dts
files with a top-level #size-cells of 2. This patch adds the remaining
necessary changes to make that work, gated by the macro symbol
FIRMWARE_UDPATED.

Signed-off-by: Phil Elwell <[email protected]>
Without this, a default of 0 is used which is very suboptimal for timely
service. Consistency with Pi 5 is desired.

Signed-off-by: Jonathan Bell <[email protected]>
By default when the last request object is completed, the whole
request completes as well.

But sometimes you want to manually complete a request in a driver,
so add a manual complete mode for this.

In req_queue the driver marks the request for manual completion by
calling media_request_mark_manual_completion, and when the driver
wants to manually complete the request it calls
media_request_manual_complete().

Signed-off-by: Hans Verkuil <[email protected]>
Manually complete the requests: this tests the manual completion
code.

Signed-off-by: Hans Verkuil <[email protected]>
Keep track of the number of requests and request objects of a media
device. Helps to verify that all request-related memory is freed.

Signed-off-by: Hans Verkuil <[email protected]>
The Raspberry Pi HEVC decoder uses a tiled format based on
columns for 8 and 10 bit YUV images, so document them as
NV12MT_COL128 and NV12MT_10_COL128.

Signed-off-by: Dave Stevenson <[email protected]>
Add V4L2_PIXFMT_NV12MT_COL128 and V4L2_PIXFMT_NV12MT_10_COL128
to describe the Raspberry Pi HEVC decoder NV12 multiplanar formats.

Signed-off-by: Dave Stevenson <[email protected]>
Adds a binding for the HEVC decoder found on the BCM2711 / Raspberry Pi 4,
and BCM2712 / Raspberry Pi 5.

Signed-off-by: Dave Stevenson <[email protected]>
The BCM2711 and BCM2712 SoCs used on Rapsberry Pi 4 and Raspberry
Pi 5 boards include an HEVC decoder block. Add a driver for it.

Signed-off-by: John Cox <[email protected]>
Signed-off-by: Dave Stevenson <[email protected]>
Add the configuration information for the HEVC decoder.

Signed-off-by: Dave Stevenson <[email protected]>
For downstream only, add back in the legacy single planar
SAND formats.

Signed-off-by: Dave Stevenson <[email protected]>
Upstream will take the multi-planar SAND format, but add back
in the downstream single planar variant for backwards compatibility

Signed-off-by: Dave Stevenson <[email protected]>
The SAND handling had been using what was believed to be a runtime
parameter in the modifier, however that has been clarified that
all permitted variants of the modifier must be advertised, so
making it variable wasn't practical.

With a rationalisation of how the producers of this format are
configured, we can switch to a variant that doesn't have as much
variation, and can be configured such that only 2 options are
required.

Add a modifier with value 0 to denote that the height of the luma
column matches the buffer height, and chroma column will be half
that due to YUV420.
A modifier of 1 denotes that the height of the luma column still
matches the buffer height, but the chroma column height is the same.
This can be used to replicate the previous behaviour.

Signed-off-by: Dave Stevenson <[email protected]>
To avoid user complaints that /dev/video0 isn't their USB
webcam, add downstream patch that allows setting the preferred
video device number.

Signed-off-by: Dave Stevenson <[email protected]>
@popcornmix
Copy link
Collaborator

I think we want ffmpeg update public (or the second /dev/videoN node) before merging this.

@6by9
Copy link
Contributor Author

6by9 commented Feb 12, 2025

I think we want ffmpeg update public (or the second /dev/videoN node) before merging this.

I'd asked @jc-kynesim whether the FFmpeg side had been released on Friday, and he'd replied

Not yet - but that is only the work of an hour or two. I'll do that today

I don't know if that happened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.