RGB synchronization fails if there are more IR than RGB frames #54
Comments
Thanks for this, I'll take a look! The split code was built and tested with the 30Hz version in mind, so this may be a regression bug.
If you want, I can send you my code in about two weeks (I'm currently on holiday). The logic is the same as in your code, except that I invert everything in the case n_IR > n_RGB.

However, even after this fix, synchronization is still rather poor. For long sequences (~20k frames) the IR and RGB streams are up to 10 frames off in the middle of the sequence. Also, for my camera (Zenmuse XT2) the IR stream hangs once in a while when the camera performs recalibration.

I was thinking of using the frame timestamps (from EXIF tags) for synchronization. In the TIFF stack (which I use instead of SEQs), each frame has a millisecond-accurate walltime associated with it. For the RGB stream there is only a millisecond-accurate relative timestamp starting from zero. However, I have doubts about the timestamps of the IR stream, as the recalibration procedure does not show up in them.

It would be interesting to know whether you are able to synchronize your streams properly. So far, the problem seems quite tough. I was even thinking of extracting descriptors from both the IR and RGB streams and matching the descriptors of each IR frame to temporally neighbouring RGB frames. Something like this: https://la.disneyresearch.com/publication/actionsnapping/ The main difficulty is that we have two different modalities, which makes typical feature descriptors, such as ORB, SIFT, and bag-of-words, unsuitable.
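The timestamp idea could be sketched as a nearest-neighbour pairing on walltimes. This is only an illustration, not flirpy code; the function name, the 50 ms tolerance, and the assumption that both streams carry comparable millisecond timestamps are all mine:

```python
import bisect

def match_by_timestamp(ir_times, rgb_times, max_offset_ms=50):
    """Pair each IR frame with the temporally closest RGB frame.

    ir_times, rgb_times: sorted lists of timestamps in milliseconds.
    Returns a list of (ir_index, rgb_index) pairs; IR frames with no RGB
    frame within max_offset_ms are dropped (e.g. during recalibration
    hangs). The tolerance is an assumption and needs tuning per camera.
    """
    pairs = []
    for i, t in enumerate(ir_times):
        j = bisect.bisect_left(rgb_times, t)
        # Candidates: the RGB frames immediately before and after t.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(rgb_times)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(rgb_times[k] - t))
        if abs(rgb_times[best] - t) <= max_offset_ms:
            pairs.append((i, best))
    return pairs
```

Because the search is a bisection over sorted timestamps, this stays fast even for ~20k-frame sequences.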
Sure! This should be a simple fix and it's on my radar anyway. Sync is very difficult, primarily because (a) there is a non-zero start offset between the two streams and (b) pretty much all the cameras break synchronisation when they flat-field (I guess we could detect this by detecting static IR frames?). In practice I've had to do this manually most of the time. I tried a few approaches offline, including FFT-based matching, but none of them worked particularly well (if at all). There is some literature on IR-RGB fusion from descriptors, but I wasn't able to get it to work with my data. Really we need a good synced dataset to work with - and maybe then we could just train a CNN or something (e.g. one backbone network for each modality and then train on the loss between the outputs).
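The static-IR-frame idea for spotting flat-field pauses could look roughly like this. A minimal sketch, not flirpy code; the threshold is camera-dependent and the function name is hypothetical:

```python
import numpy as np

def static_frame_mask(frames, diff_thresh=1.0):
    """Flag frames that are near-identical to their predecessor, a likely
    sign of a flat-field correction pause in the IR stream.

    frames: sequence of 2D arrays. Returns a boolean array where entry i
    is True if frame i barely differs from frame i-1 (entry 0 is always
    False). diff_thresh is in raw counts and is an assumption; tune it
    per sensor.
    """
    mask = [False]
    for prev, cur in zip(frames, frames[1:]):
        mad = np.mean(np.abs(cur.astype(np.float32) - prev.astype(np.float32)))
        mask.append(bool(mad < diff_thresh))
    return np.array(mask)
```

Runs of True could then be cut out (or compensated for) before attempting any frame-index alignment.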
After my holiday, I will take a closer look at the synchronization. I was thinking of doing it the following way:
The latter step would certainly require some experimentation. Alternatives would be feature-based similarity metrics, or extraction and matching of shapes such as line segments.

May I ask which IR-RGB descriptors you tried out? I found this one, which looks promising: https://www.mdpi.com/1424-8220/20/18/5105

Another way would be to extract keypoints from IR and RGB and find matches based on a geometric constraint (e.g. the homography or, more generally, the fundamental matrix). The frame with the lowest median spatial distance between matched keypoints would then be selected as the match.

A CNN is probably also an option, but it would have to be trained in an un- or self-supervised manner, since I have no idea how to acquire the ground truth for synchronization (maybe some lightbulb blinking pattern which encodes walltime...). CNN-based synchronization was attempted here: https://arxiv.org/pdf/1610.05985.pdf Though nowadays one would probably want to use an N-pair loss instead of the triplet loss.

Did you also notice that the frame rate of the MOV file differs between videos? For my cameras it is 30 Hz, 29.87 Hz, or 29.xx Hz for different MOV files (read out with ffprobe).
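The median-distance selection step could be sketched as below. This assumes some IR-RGB matcher has already produced per-match reprojection distances for each candidate RGB frame (that upstream step is the hard part and is not shown); the function name is hypothetical:

```python
import numpy as np

def best_rgb_match(distances_per_candidate):
    """Select the candidate RGB frame best matching a given IR frame.

    distances_per_candidate: one 1D array per candidate RGB frame,
    holding the spatial distances (in pixels) between matched keypoints
    after warping by the estimated homography. Returns the index of the
    candidate with the lowest median distance, or None if no candidate
    has any matches.
    """
    medians = [np.median(d) if len(d) else np.inf
               for d in distances_per_candidate]
    best = int(np.argmin(medians))
    return best if np.isfinite(medians[best]) else None
```

The median (rather than the mean) keeps the selection robust to a few gross mismatches between the two modalities.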
Hi, has there been any progress on this? I'm trying to sync RGB-IR capture with a Boson and a BlackFly S. How can I do this using Flirpy?
In this case, you should use hardware triggering to sync both cameras. Either enable a flash output from the Blackfly, if possible, or use an external trigger source (e.g. a TTL square wave). You'd need an interface board for the Boson that supports sync signals. Then this becomes a mostly solved problem: the sync is hardware-defined, and as long as you read out the image before the next pulse, you should be able to guarantee the frames match. How you implement that is beyond the scope of Flirpy - you could, for example, use a single-board computer to generate the trigger via GPIO.
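A software trigger loop on a single-board computer might look like the sketch below. Here `set_pin` is a stand-in for whatever drives your GPIO line (board- and library-dependent, so it's left as a callable); note that software `sleep()` timing jitters by milliseconds, so a hardware timer or PWM peripheral is preferable for tight sync:

```python
import time

def run_trigger(set_pin, rate_hz=30.0, n_pulses=10, duty=0.5):
    """Drive a trigger line high/low to approximate a TTL square wave.

    set_pin: any callable taking True/False, e.g. a thin wrapper around
    a GPIO library's output() call (hypothetical; depends on your board).
    """
    period = 1.0 / rate_hz
    for _ in range(n_pulses):
        set_pin(True)                    # rising edge: cameras capture here
        time.sleep(period * duty)
        set_pin(False)
        time.sleep(period * (1.0 - duty))
```

Both cameras then see the same edge, and frame pairing reduces to counting pulses on each side.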
I wanted to point out a bug in the split_seqs script. The synchronization between IR and RGB works fine as long as there are more RGB than IR frames. However, if there are more IR than RGB frames, the synchronization logic fails.

I ran into this issue after switching from the 8 Hz FLIR Duo Pro R to the 30 Hz version. As the visual stream runs at 29.87 Hz, more IR than RGB frames are generated.
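A symmetric pairing that handles both cases (n_RGB > n_IR and n_IR > n_RGB) might be sketched as follows. This is an illustration of the idea only, not the actual split_seqs code, and it assumes constant frame rates with zero start offset:

```python
def pair_streams(n_ir, n_rgb):
    """Pair frame indices of two streams of unequal length by nearest
    relative position, regardless of which stream is longer.

    Returns a list of (ir_index, rgb_index) pairs, one per frame of the
    shorter stream.
    """
    n_short, n_long = sorted((n_ir, n_rgb))
    # Map each short-stream frame to the proportionally nearest
    # long-stream frame.
    pairs = [(i, round(i * (n_long - 1) / max(n_short - 1, 1)))
             for i in range(n_short)]
    if n_ir <= n_rgb:
        return pairs                        # short stream is IR
    return [(j, i) for i, j in pairs]       # swap back to (ir, rgb) order
```

Sorting the lengths first is what removes the asymmetry: the same index mapping is computed either way, and only the tuple order is swapped at the end.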