media: pisp-be: Split jobs creation and scheduling #6302

Open
wants to merge 4 commits into
base: rpi-6.6.y


@jmondi jmondi commented Aug 5, 2024

Before submitting the same change to mainline, I would like to know what you think about this. This change will make it easier to support multi-context handling without duplicating the media graph instances.


Currently the 'pispbe_schedule()' function does two things:

  1. Tries to assemble a job by inspecting all the video node queues
    to make sure all the required buffers are available
  2. Submit the job to the hardware

The pispbe_schedule() function is called at:

  • video device start_streaming() time
  • video device qbuf() time
  • irq handler

As assembling a job requires inspecting all queues, it is a rather time consuming operation which is better not run in IRQ context.

To avoid executing the time-consuming job creation in interrupt context, split job creation and job scheduling into two distinct operations. When a well-formed job is created, append it to the newly introduced 'pispbe->job_queue', from which the scheduling routine will dequeue it.

At start_streaming() and qbuf() time, immediately try to schedule a job if one has been created: the irq handler routine is only called when a job has completed, so we can't rely solely on it for scheduling new jobs.
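
The split described above can be modelled in plain userspace C. This is only a sketch of the idea, not the driver code: the `toy_` names, `NUM_NODES` and `MAX_JOBS` are hypothetical, locking is left out, and the "hardware" is just a `hw_busy` flag. The slow path inspects every node queue and appends a job to a FIFO; the fast path, which is all the interrupt handler would need, only pops a job that is already assembled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_NODES 3 /* hypothetical number of video nodes */
#define MAX_JOBS  8 /* hypothetical capacity of the job queue */

struct toy_job {
	unsigned int seq; /* sequence number assigned at creation time */
};

struct toy_dev {
	unsigned int ready[NUM_NODES];   /* buffers queued on each node */
	struct toy_job queue[MAX_JOBS];  /* FIFO of fully assembled jobs */
	size_t head, count;
	unsigned int next_seq;
	bool hw_busy;                    /* true while a job is running */
};

/* Slow path: build a job only if every node has a buffer available. */
static bool toy_prepare_job(struct toy_dev *d)
{
	if (d->count == MAX_JOBS)
		return false;
	for (int i = 0; i < NUM_NODES; i++)
		if (d->ready[i] == 0)
			return false;
	for (int i = 0; i < NUM_NODES; i++)
		d->ready[i]--;
	d->queue[(d->head + d->count) % MAX_JOBS].seq = d->next_seq++;
	d->count++;
	return true;
}

/* Fast path: pop one ready-made job and mark the hardware busy. */
static bool toy_schedule_job(struct toy_dev *d)
{
	if (d->hw_busy || d->count == 0)
		return false;
	d->head = (d->head + 1) % MAX_JOBS;
	d->count--;
	d->hw_busy = true;
	return true;
}

/* qbuf()/start_streaming() time: create a job, then try to run it. */
static void toy_queue_buffer(struct toy_dev *d, int node)
{
	assert(node >= 0 && node < NUM_NODES);
	d->ready[node]++;
	toy_prepare_job(d);
	toy_schedule_job(d);
}

/* Job-done interrupt: no queue inspection, only the fast path. */
static void toy_irq_job_done(struct toy_dev *d)
{
	d->hw_busy = false;
	toy_schedule_job(d);
}
```

In this model the first job can only start from `toy_queue_buffer()`, because `toy_irq_job_done()` never runs before a job has completed. That is the reason the qbuf() and start_streaming() paths still need to call the scheduling step themselves.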

*/
spin_unlock_irqrestore(&pispbe->hw_lock, flags);

if (job->config->num_tiles <= 0 ||
@njhollinghurst njhollinghurst Aug 7, 2024


Is the above comment ("We can kick the job off without the hw_lock...") still relevant?
(I must admit I only partially understood it before... but it seems unnecessary now if this function doesn't actually start the job).

Contributor Author


I think it is still relevant: we're still kicking off a job without the hw_lock being held, since the driver relies on the hw_busy flag to tell whether the system is running a job or not?

@njhollinghurst

It's a moderately involved change so difficult to review, but LGTM. The principle certainly seems to be a good one.

@jmondi

jmondi commented Aug 7, 2024

> It's a moderately involved change so difficult to review, but LGTM. The principle certainly seems to be a good one.

I was mostly wondering whether you have more sophisticated test tools to check if this change affects performance in any way. I ran two concurrent instances of qcam with an ov5647 and an imx219 and things looked OK, apart from the fact that the ov5647 runs at 7 fps compared to the 15 fps at max res the datasheet claims. I'll soon test with your 6.6.y tree to see if it behaves the same.

@jmondi

jmondi commented Aug 9, 2024

I can confirm I get the same framerates with rpi-6.6.y

pi@raspberrypi:~$ cam -c1 -C10

[0:04:28.399287332] [1092]  INFO RPI pisp.cpp:1450 Sensor: /base/axi/pcie@120000/rp1/i2c@88000/imx219@10 - Selected sensor format: 1640x1232-SBGGR10_1X10 -B

268.839072 (0.00 fps) cam0-stream0 seq: 000007 bytesused: 1920000
268.905656 (15.02 fps) cam0-stream0 seq: 000008 bytesused: 1920000
268.972379 (14.99 fps) cam0-stream0 seq: 000009 bytesused: 1920000
269.038872 (15.04 fps) cam0-stream0 seq: 000010 bytesused: 1920000
269.105476 (15.01 fps) cam0-stream0 seq: 000011 bytesused: 1920000
269.172096 (15.01 fps) cam0-stream0 seq: 000012 bytesused: 1920000
269.238724 (15.01 fps) cam0-stream0 seq: 000013 bytesused: 1920000
269.305332 (15.01 fps) cam0-stream0 seq: 000014 bytesused: 1920000
269.371979 (15.00 fps) cam0-stream0 seq: 000015 bytesused: 1920000
269.438618 (15.01 fps) cam0-stream0 seq: 000016 bytesused: 1920000
pi@raspberrypi:~$ cam -c2 -C10

[0:04:31.826608053] [1101]  INFO RPI pisp.cpp:1450 Sensor: /base/axi/pcie@120000/rp1/i2c@80000/ov5647@36 - Selected sensor format: 1296x972-SGBRG10_1X10 - g

cam0: Capture 10 frames
272.787750 (0.00 fps) cam0-stream0 seq: 000008 bytesused: 1920000
272.921503 (7.48 fps) cam0-stream0 seq: 000009 bytesused: 1920000
273.055178 (7.48 fps) cam0-stream0 seq: 000010 bytesused: 1920000
273.194079 (7.20 fps) cam0-stream0 seq: 000011 bytesused: 1920000
273.332947 (7.20 fps) cam0-stream0 seq: 000012 bytesused: 1920000
273.473272 (7.13 fps) cam0-stream0 seq: 000013 bytesused: 1920000
273.612185 (7.20 fps) cam0-stream0 seq: 000014 bytesused: 1920000
273.751054 (7.20 fps) cam0-stream0 seq: 000015 bytesused: 1920000
273.890057 (7.19 fps) cam0-stream0 seq: 000016 bytesused: 1920000
274.028928 (7.20 fps) cam0-stream0 seq: 000017 bytesused: 1920000

Jacopo Mondi added 4 commits September 4, 2024 09:43
A comment in the pisp_be driver references the
pispbe_schedule_internal() function which doesn't exist.

Drop it.

Signed-off-by: Jacopo Mondi <[email protected]>
Reviewed-by: Laurent Pinchart <[email protected]>
The config parameters buffer is already validated in
pisp_be_validate_config() at .buf_prepare() time.

However some of the same validations are also performed at
pispbe_schedule() time. In particular the function checks that:

1) config.num_tiles is valid
2) At least one of the BAYER or RGB input is enabled

The input config validation is already performed in
pisp_be_validate_config() and while job.hw_enables is modified by
pispbe_xlate_addrs(), the function only resets the input masks if

- there is no input buffer available, but pispbe_prepare_job() fails
  before calling pispbe_xlate_addrs() in this case
- bayer_enable is 0, but in this case rgb_enable is valid as guaranteed
  by pisp_be_validate_config()
- only outputs are reset in rgb_enable

For these reasons there is no need to repeat the check at
pispbe_schedule() time.

The num_tiles validation can be moved to pisp_be_validate_config() as
well. As num_tiles is a u32 it can't be < 0, so change the sanity
check accordingly.
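
A small standalone illustration of the num_tiles point, as a sketch only: `toy_validate_num_tiles()` and `TOY_MAX_TILES` are hypothetical names, and the upper bound is an assumed value, not taken from the driver. For an unsigned 32-bit value, `<= 0` can only ever be true for 0, and a "negative" input wraps to a large positive number, so the check has to be a zero test plus an upper bound.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_TILES 64u /* hypothetical upper bound, for illustration */

/* Returns 0 if num_tiles is usable, -EINVAL otherwise. */
static int toy_validate_num_tiles(uint32_t num_tiles)
{
	/* "num_tiles <= 0" on an unsigned value is just "num_tiles == 0". */
	if (num_tiles == 0 || num_tiles > TOY_MAX_TILES)
		return -EINVAL;
	return 0;
}

/* Usage sketch exercising the boundaries of the check. */
static void toy_validate_demo(void)
{
	assert(toy_validate_num_tiles(0) == -EINVAL);
	assert(toy_validate_num_tiles(1) == 0);
	assert(toy_validate_num_tiles(TOY_MAX_TILES) == 0);
	/* -1 converted to uint32_t wraps to UINT32_MAX: caught by the bound. */
	assert(toy_validate_num_tiles((uint32_t)-1) == -EINVAL);
}
```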

Signed-off-by: Jacopo Mondi <[email protected]>
Reviewed-by: Laurent Pinchart <[email protected]>
Currently the 'pispbe_schedule()' function does two things:

1) Tries to assemble a job by inspecting all the video node queues
   to make sure all the required buffers are available
2) Submit the job to the hardware

The pispbe_schedule() function is called at:

- video device start_streaming() time
- video device qbuf() time
- irq handler

As assembling a job requires inspecting all queues, it is a rather
time consuming operation which is better not run in IRQ context.

To avoid executing the time-consuming job creation in interrupt
context, split job creation and job scheduling into two distinct
operations. When a well-formed job is created, append it to the
newly introduced 'pispbe->job_queue', from which the scheduling
routine will dequeue it.

As the per-node 'ready_queue' buffer list is only accessed in vb2
ops callbacks, protected by a mutex, it is not necessary to guard it
with a dedicated spinlock so drop it. Also use the spin_lock_irq()
variant in all functions not called from an IRQ context where the
spin_lock_irqsave() version was used.

Signed-off-by: Jacopo Mondi <[email protected]>
During the probe() routine, the driver needs to power up the interface
in order to identify and initialize the hardware and it later suspends
it at the end of probe().

The driver erroneously resumes the interface by calling the
pispbe_runtime_resume() function directly but suspends it by
calling pm_runtime_put_autosuspend().

This causes a PM usage count imbalance at probe time, notified by the
runtime_pm framework with the below message in the system log:

 pispbe 1000880000.pisp_be: Runtime PM usage count underflow!

Fix this by suspending the interface using pm_runtime_idle() which
doesn't decrease the pm_runtime usage count and inform the PM framework
that the device is active by calling pm_runtime_set_active().

Adjust the pispbe_remove() function as well to disable
the pm_runtime in the correct order.
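
The imbalance can be pictured with a toy userspace model of the usage counter. This is a sketch under stated assumptions, not the runtime PM implementation: every `toy_` name is hypothetical, and each helper mimics only the counter and status effect of the kernel call named in its comment.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_pm {
	int usage;      /* models the runtime PM usage counter */
	bool active;    /* models the device power status */
	bool underflow; /* set when a put has no matching get */
};

/* Models pm_runtime_put_autosuspend(): drops one usage reference. */
static void toy_put(struct toy_pm *pm)
{
	if (pm->usage == 0) {
		pm->underflow = true; /* the "usage count underflow" case */
		return;
	}
	pm->usage--;
	if (pm->usage == 0)
		pm->active = false;
}

/* Models pm_runtime_set_active(): status only, counter untouched. */
static void toy_set_active(struct toy_pm *pm)
{
	pm->active = true;
}

/* Models pm_runtime_idle(): may suspend, counter untouched. */
static void toy_idle(struct toy_pm *pm)
{
	assert(pm->usage >= 0);
	if (pm->usage == 0)
		pm->active = false;
}

/* Old probe flow: direct resume callback, then a put. */
static struct toy_pm toy_probe_old(void)
{
	struct toy_pm pm = {0};

	pm.active = true; /* resume callback called directly: no get */
	toy_put(&pm);     /* put without a get */
	return pm;
}

/* New probe flow: direct resume, set_active, then idle. */
static struct toy_pm toy_probe_new(void)
{
	struct toy_pm pm = {0};

	toy_set_active(&pm); /* tell the framework the device is on */
	toy_idle(&pm);       /* suspend without touching the counter */
	return pm;
}
```

In the model, `toy_probe_old()` ends with `underflow` set, while `toy_probe_new()` leaves the counter at zero and the device suspended.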

Signed-off-by: Jacopo Mondi <[email protected]>
Reviewed-by: Laurent Pinchart <[email protected]>

---
v4->v5:
- Indent with tabs :/

v3->v4:
- Instead of using pm_runtime for resuming, suspend using
  pm_runtime_idle() to support !CONFIG_PM

v2->v3:
- Mark pispbe_runtime_resume() as __maybe_unused as reported by
  the kernel test robot <[email protected]>