Implement BT.1886 EOTF as alternative to BT.709 inverse gamma #61
BT.709 specifies the OETF but does not specify the corresponding EOTF. It is a mistake to use the inverse of the OETF for display or display processing. BT.709 is intended for CRT displays. Recommendation ITU-R BT.1886 provides a reference EOTF that models the characteristics of CRT displays.

I have implemented the code changes required and can see much better results converting BT.709 content to HDR10 (BT.2020 and SMPTE-ST-2084). My implementation supports adjustable display gamma, which is recommended to be 2.40 according to BT.1886. I can share my code changes and further assist in the implementation.
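A minimal sketch of the two published curves, to make the OETF/EOTF mismatch concrete (illustrative helper names, not the code changes mentioned above):

```cpp
#include <cmath>

// The BT.709 OETF (camera encoding) is not the inverse of the BT.1886
// reference EOTF (display decoding), even for an ideal zero-black display.
double bt709_oetf(double l) // scene-linear -> signal, per BT.709
{
    return l < 0.018 ? 4.5 * l
                     : 1.099 * std::pow(l, 0.45) - 0.099;
}

double bt1886_eotf_ideal(double v) // signal -> display-linear, Lb = 0
{
    return std::pow(v, 2.4);
}
// bt1886_eotf_ideal(bt709_oetf(l)) != l in general: the end-to-end
// mismatch deliberately boosts contrast for dim viewing conditions.
```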
I am aware of such issues (see #56). I would certainly appreciate it if you share any code you have.
I checked in my changes here.
@dzung-hoang I don't have an HDR display to test. Can you verify if PR #62 does what you want? I did not implement an adjustable BT.1886 coefficient; from what I can tell, parameterization of BT.1886 requires two parameters and not a simple exponent adjustment.
I am pulling your changes. I will need some time to also modify ffmpeg to work with this. I will then run some tests and let you know.
I encourage you to study BT.2087-0. In Annex 1, there are two cases for non-linear to linear conversion of BT.709. Case 1 uses the exponent 2.40 as specified in BT.1886. Case 2 uses the exponent 2. I suggest that you support an adjustable exponent (which I call display gamma) to handle these two cases. Additionally, users may want to tweak the exponent to adjust the "look." The same two cases apply to the linear to non-linear conversion. It is interesting that the linear to non-linear conversion formulas in BT.2087-0 apply to both BT.709 and BT.2020.
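A minimal sketch of what such an adjustable exponent could look like, assuming full-range values normalized to [0, 1]; the function names are illustrative, not an actual zimg interface:

```cpp
#include <cmath>

// The two BT.2087-0 Annex 1 cases differ only in the exponent:
// 2.40 (per BT.1886) for case 1, 2.0 for case 2.
double bt709_to_linear(double v, double display_gamma)
{
    return std::pow(v, display_gamma);        // case 1: 2.40, case 2: 2.0
}

double linear_to_bt709(double l, double display_gamma)
{
    return std::pow(l, 1.0 / display_gamma);  // same form for BT.709 and BT.2020
}
```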
@sekrit-twc I checked the results of setting scene referred to 0 and to 1 when converting BT.709 video to BT.2020 with SMPTE-2084 (HDR10). The output matched my expectations. I have a 385MB video clip transcoded to HEVC HDR10 with both settings split-screen to see the difference. If you can provide a hosting site, I can upload the video.
@dzung-hoang I reviewed BT.2087. It seems that the case of exponent 2.0 corresponds to the "scene referred" mode in my PR. I intentionally did not expose that option, because from what I can tell, display-referred conversion is almost always what the user wants, since "direct camera output" is essentially never found, not even in cameras. Users may wish to adjust the gamma de/encoding process, but a complete solution requires more than a single coefficient. One possible solution is a user-provided function or LUT.
Does it also support adjustable display contrast?
Well, the question we have to be asking ourselves is what the purpose of the conversion is. In document BT.2087-0, the point is that they need to linearize and de-linearize content for the purposes of converting colorspaces, not display. If you're linearizing content for something like scaling or colorspace transformations, then this is completely legitimate, and the only thing you have to be careful about is that you apply the exact same inverse function on the other end (which they do in this document). This is what mpv does as part of the playback processing as well, for example (for linear-light scaling and color management).

But when you're linearizing content for the purposes of display, you have to consider the display's properties, which involves stuff like BT.1886's display-dependent contrast parameter. The basic problem here stems from the fact that LCD screens have a nonzero black point, which means that a naive, colorimetrically faithful mapping would result in lots of missing detail (black crush) because your display would be unable to display low-light scenes properly. So in practice, we have to transform our display's output contrast function in some way to compensate for the nonzero black point.

The “naive” way of doing so - which is basically what you get when you calibrate your display to a pure power 2.2 curve, or to sRGB, or whatever - is to vertically “squish” the function. (If you imagine the input value on the x axis and the measured physical output luminance on the y axis, you'd take a pure power curve or sRGB curve or whatever and scale the y axis to make the function intersect the display's black level at 0.) BT.1886, on the other hand, shifts the output response along the gamma curve itself, which is a bit like stretching the curve along the x axis (until it intersects the black level at x=0). The question boils down to how you adjust for a difference in the black level: by squishing the y axis or stretching the x axis.

So the question is always: what do you want to do? Are you transferring inside a “pure” colorspace (where 0 means 0), or are you trying to transfer to a display's color space (where 0 might mean 0.1 cd/m²); and if so, do you want to do this before the HDR->output mapping or not, and so on. It gets more complicated when you work in stuff like the overall end-to-end transform (i.e. light level going into the camera and light level coming out of the display), since you want to try and preserve this as closely/linearly as possible while also being limited by the physical capabilities of your display.

Either way, my tl;dr is that you need “multiple” implementations of BT.1886 based on what exactly you're trying to do (convert to the display's output space? transcode a media file? simulate a different colorspace? etc.)
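For reference, the two-parameter, display-dependent behaviour haasn describes is visible directly in the BT.1886 Annex 1 formula; a sketch, with arbitrary example luminances as defaults:

```cpp
#include <algorithm>
#include <cmath>

// Rec. ITU-R BT.1886 reference EOTF. Lw/Lb are the screen luminances for
// white and black in cd/m^2; variable names follow the Recommendation.
double bt1886_eotf(double v, double lw = 100.0, double lb = 0.1)
{
    const double gamma = 2.4;
    const double lw_g = std::pow(lw, 1.0 / gamma);
    const double lb_g = std::pow(lb, 1.0 / gamma);
    const double a = std::pow(lw_g - lb_g, gamma); // gain ("contrast" control)
    const double b = lb_g / (lw_g - lb_g);         // lift ("brightness" control)
    return a * std::pow(std::max(v + b, 0.0), gamma);
}
// With lb = 0, b vanishes and this collapses to a pure power 2.4 curve
// scaled by lw: b is exactly the x-axis shift described above.
```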
@haasn, thanks for your input on this issue. Could you comment on how you would handle the details you mentioned at the API level? From my perspective, the goal of colorspace conversions in z.lib is to convert between the "ideal" form of each colorspace, which for BT.1886 is a display of infinite contrast. Display realities should be handled separately, in a video editor or at the playback level.

Regarding the end-to-end gamma compression (e.g. 709 -> 1886), ITU-R BT.2100 appears to define a framework for handling this, by separating the gamma process into OOTF1 -> OETF -> EOTF -> OOTF2, which I modeled in my PR. With that said, I do understand that, particularly with the new HDR standards, the display details are a fundamental part of the transfer function. In particular, scene-specific metadata for ST.2084 is still unsupported.

The issue of HDR tone mapping has been raised previously (#60), and in my opinion is out of the scope of this issue. Currently, I have no plans to implement it, as I am unaware of any generally accepted practices/methods for tone mapping. This connects with my view of transfer function handling for colorspace conversion, in that z.lib should implement objective "ideal" operations, rather than subjective "creative" processes.
Depends on the scope, target audience, and goal. I don't know the zimg codebase, so I can't really give you concrete advice. But I could offer a general idea: tag every piece of content and every display with a common set of parameters - reference white, black level, peak signal level, and peak value range (all in cd/m²).
For example:

- An HLG clip might have reference white = 120 cd/m², black = 0 cd/m², peak signal = 400 cd/m² and peak value range = 1440 cd/m².
- A PQ clip might have reference white = 200 cd/m², black = 0 cd/m², peak signal = 2000 cd/m² and peak value range = 10000 cd/m².
- An SDR display might have reference white = 80 cd/m², black = 0.1 cd/m², peak signal = 80 cd/m² and peak value range = 80 cd/m².
- An HDR display might have reference white = 120 cd/m², black = 0.001 cd/m², peak signal = 1500 cd/m² and value range = 10000 cd/m².

And depending on your use case, you could even consider adding more metadata (scene-referred or camera-referred? any implicit gamma boost?). Then, I would figure out the right conversion I want based on matching all of those parameters together, i.e. if the black points don't match, you need to perform black point compensation/scaling; if the source peak signal level is above the target's peak display level, you need to perform tone mapping; and so on.

This is basically what we do in mpv, to some degree. (We don't encode all of those parameters at all points, but it would be possible in principle.) This is a relatively flexible representation that allows for both absolute-scaled stuff like PQ and “reference”-scaled stuff like HLG (or SDR/BT.1886, for that matter).
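Written down as a data structure, the idea might look like this (hypothetical names; neither zimg nor mpv necessarily represents it this way):

```cpp
// Per-content / per-display colorspace parameters, as described above.
struct ColorParams {
    double reference_white; // cd/m^2
    double black;           // cd/m^2
    double peak_signal;     // brightest level the content actually uses
    double peak_range;      // brightest level the encoding can represent
};

// The four examples from the comment:
const ColorParams hlg_clip    = { 120.0, 0.0,    400.0,  1440.0 };
const ColorParams pq_clip     = { 200.0, 0.0,   2000.0, 10000.0 };
const ColorParams sdr_display = {  80.0, 0.1,     80.0,    80.0 };
const ColorParams hdr_display = { 120.0, 0.001, 1500.0, 10000.0 };
```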
This leads into what you said about z.lib converting between the "ideal" form of each colorspace:
If this is your only concern, and you don't plan on changing this, then you could get away with stuff like just hard-coding the assumption that BT.1886 is gamma 2.4 (because mathematically, for an ideal display, it would be - and especially when you need to linearize for stuff like color space conversions, you're working with mathematically ideal RGB matrices etc. either way), and also ignoring stuff like black point compensation for HDR conversion.
Depends on how reusable you want the internal primitives in zimg to be.
Pretty much, although note that the OOTF1 / OOTF2 distinction more or less stems solely from the fact that HLG and PQ decided to do the OOTF differently. (In HLG, it's part of the EOTF because, as I understand it, this is important for preserving compatibility with SDR systems, which is part of HLG's goal. In PQ, they did away with the OOTF nonsense by making it conceptually part of the OETF, meaning the encoded signal should be displayed as-is instead of reinterpreted.) You wouldn't use the OOTF twice in two different places for a single conversion; it's just that for e.g. SDR -> PQ you need to apply the OOTF (which is, in practice, done implicitly by just decoding the SDR signal using BT.1886 instead of BT.709), whereas for SDR -> HLG you don't.
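In code terms, that implicit application might look like the following sketch (illustrative names, full-range normalized signal assumed):

```cpp
#include <cmath>

// For SDR -> PQ, the OOTF is applied simply by decoding the SDR signal
// with the display EOTF (BT.1886, ideal-display form) instead of the
// inverse BT.709 camera OETF.
double bt709_inverse_oetf(double v) // signal -> scene-linear
{
    return v < 4.5 * 0.018 ? v / 4.5
                           : std::pow((v + 0.099) / 1.099, 1.0 / 0.45);
}

double decode_sdr(double v, bool display_referred)
{
    return display_referred ? std::pow(v, 2.4)       // BT.1886: OOTF baked in
                            : bt709_inverse_oetf(v); // scene-referred
}
```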
ST.2084 metadata is more of a property of the image colorspace than a property of the display, and it should be handled as such. Basically, ST.2084 metadata was created to address a very serious shortcoming of ST.2084: since ST.2084 (PQ) maps to an absolute brightness scale, independent of any display-referred reference points, you have no “anchors” for what the image contents are actually going to contain. Like obviously it's probably not going to contain relevant detail at 10,000 cd/m², but the playback system can't necessarily make that assumption - because in principle, it could. So when tone mapping from PQ to any real-world display, you would have to operate under the assumption that there could be detail all the way up to 10,000 cd/m².

The ST.2084 metadata basically serves as a way of defining the “missing” struct fields from my list of colorspace parameters above, namely the “peak signal level” and “black level” (which are conceptually tagged as the white/black level of the mastering display, but the mastering display is obviously the reference for the signal levels). This has no real effect other than making the job of any tone mapping algorithm significantly easier, since tone mapping from a range of 0 - 1000 cd/m² is much easier than tone mapping from a range of 0 - 10000 cd/m². That's why the metadata is important: it improves the end result in any actual video playback system.
HDR tone mapping is just another problem in disguise: gamut mapping has the exact same issue. How do you tone map from BT.2020 to BT.709? It's just as subjective. The easiest solution, of course, is to hard-clip all out-of-gamut colors (perhaps using a colorimetrically nearest clipping algorithm instead of a naive clipping algorithm). You can do the same for out-of-range HDR values as well. Of course, both gamut clipping and gamma clipping look terrible in practice.

So what do you end up doing? Short of doing nothing and clipping, I think the sanest solution is to simply make it part of the API (which is what mpv does, as an enum of tone mapping types - some of which are also parametrized by some arbitrary float parameter, although this is not really necessary). There are some really simple tone mapping algorithms, though - for example, using a linear map up to some fixed value (like 0.9) and curving off into some logarithmic thing (up to your signal peak). Up to you.

I mean, at the end of the day, we consider gamut (color space) and gamma (display response) to be very separate beasts, but those are both just different abstractions on top of what in reality is a three-dimensional space of representable colors. (Our CIE Yxy diagrams are essentially just a top-down view.) So in the end, you will always have some 3D volumetric color space representing the image contents, and some 3D volumetric color space representing the set of colors your display can display, and you somehow have to squish the former into the latter (or hard-clip, which is objective but universally terrible).
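For what it's worth, that last idea in code form might be something like this sketch (the knee value and the logarithmic shape are arbitrary choices, not any vendor's algorithm):

```cpp
#include <cmath>

// Identity below a knee, then a logarithmic roll-off that reaches 1.0 at
// the signal peak. Input and peak are in units of SDR reference white.
double tone_map(double x, double signal_peak)
{
    const double knee = 0.9;
    if (x <= knee)
        return x;
    // Map [knee, signal_peak] logarithmically onto [knee, 1.0].
    const double t = std::log(1.0 + (x - knee)) /
                     std::log(1.0 + (signal_peak - knee));
    return knee + (1.0 - knee) * t;
}
```

Note that this is a global knee/soft-clip curve, exactly the family the following reply objects to on round-trip grounds.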
I implement neither tone mapping nor gamut mapping. Out-of-range values either end up clipped or as negative/super-positive colors, depending on the output format. This is certainly an area that needs improvement, but that is a discussion for another day. As I mentioned in my previous comment, I only want to implement generally accepted practices. The way I see it, this is no different to selecting which resampling filters I provide: ImageMagick has many "bespoke" resizing methods invented by the authors, which end up stuck in the API/documentation despite having no real defensible basis in theory (or visual results). If you know of well-accepted methods for the tone/gamut problem in the literature, I would certainly be open to adding them.

In my opinion, any acceptable algorithm needs to be local instead of global, yet still temporally stable. For example, I do not like the idea of knee curves/soft-clipping, as they cause the image to change each time they are applied, i.e. HDR -> SDR -> HDR -> SDR produces a different result from a single HDR -> SDR mapping. The methods in mpv appear to be derived from the video game industry, and have this issue.
Those are all very reasonable points. If you want zimg to purely be a conversion library, then it would probably be best to stick to outputting out-of-range values as-is.
Regarding the round-trip (HDR -> SDR -> HDR -> SDR) concern:
I'd argue that there is little reason for people to utilize such a processing chain, but even if odd circumstances cause something like this to happen, the alternative - to throw away all information that cannot be precisely retained in the first conversion - isn't really better. Yes, once that out-of-destination-bounds information is gone, further conversions cannot do additional harm, but only because all the harm to the picture quality was done in the first conversion.

Regarding well-accepted methods, one thing we can find in today's reality is that different vendors have opted for different mapping methods, e.g. Samsung and Panasonic players produce different results when mapping HDR -> SDR. While Samsung uses one fixed mapping (which a majority of reviewers seemed to not like as much as the Panasonic mapping), Panasonic provides a slider in their user interface to control the use of more or less of the standard dynamic range for the very dark or the very bright areas.
@lvml: Do you happen to have any knowledge about the actual tone mapping algorithm Panasonic implements?
@haasn: No, Panasonic AFAIK hasn't published what exactly they do. Samsung representatives promised at NAB 2016 that they would publish their tone mapping proposal, but I haven't seen a result of this yet. So far, section 9.2 of this SMPTE document is the closest related text I have found on the subject matter.
Besides the player-hardware manufacturers, YouTube is probably one of the bigger companies that need to do HDR -> SDR conversion; their article on this matter contains a section "SDR auto-conversion issues", which seems to tell me they haven't found a perfect method yet, either ;-)
@lvml From what I can gather from the documents you linked, tone and gamut mapping are not conversions, but rather creative processes. As noted in the SMPTE report, a soft-knee type of method degrades overall scene colours, even if most of the scene is not HDR. If tone or gamut mapping is to be added to z.lib, the selected method must be adaptive, not only to preserve the round-trip property, but also for acceptable visual appearance.

@dzung-hoang If you have no further comment on my PR, I will merge it and consider this issue resolved.