This implements the limited DM metadata compression scheme described in
chapter 9 of the dolby vision bitstream specification.
The spec is a bit unclear about how to handle the presence of static
metadata inside compressed frames; in that it doesn't explicitly forbid
an encoder from repeating redundant metadata. In theory, we would need
to detect this case and then strip the corresponding duplicate metadata
from the existing set of static metadata. However, this is difficult to
implement - esspecially for the case of metadata blocks which may be
internally repeated (e.g. level 10).
That said, the spec states outright that static metadata should be
constant throughout the entire sequence, so a sane bitstream should not
have any static metadata values changing from one frame to the next (at
least up to a keyframe boundary), and therefore they should never be
present in compressed frames. As a consequence, it makes sense to treat
this as an error state regardless. (Ignoring them by default, or
erroring if either AV_EF_EXPLODE or AV_EF_AGGRESSIVE are set)
I was not able to find such samples in the wild (outside of artificially
produced test cases for this exact scenario), so I don't think we need
to worry about it until somebody produces one.
ff_dovi_rpu_parse() and ff_dovi_rpu_generate() are a bit inconsistent in
that they expect different levels of encapsulation, due to the nature of
how this is handled in the context of different APIs. Clarify the status
quo. (And fix an incorrect reference to the RPU payload bytes as 'RBSP')
now that we are reading ext_mapping_idc as the upper 8 bits of
el_bit_depth_minus8 we need to use get_ue_golomb_long rather than
get_ue_golomb_31 for reading it
These two fields are coded together into a single 16 bit integer with upper 8
bits for ext_mapping_idc and lower 8 bits for el_bit_depth_minus8.
Furthermore ext_mapping_idc has two components, upper 3 bits and lower 5 bits.
Co-authored-by: Niklas Haas <git@haasn.dev>
Signed-off-by: Niklas Haas <git@haasn.dev>
This is used by future versions of the spec to implement metadata
compression. Given that we don't yet implement that spec, validate that
this is equal to 0 for now.
Despite the suggestive size limits, this metadata ID has nothing to do
with the VDR metadata ID used for the data mappings. Actually, the
specification leaves them wholly unexplained, other than acknowleding
their existence. Must be some secret dolby sauce. They're not even
involved in DM metadata compression, which is handled using an entirely
separate ID.
That leaves us with a lack of anything sensible to do with these IDs.
Since we unfortunately only expose one `dm_metadata_id` field to the
user, just ensure that they match; which appears to always be the case
in practice. If somebody ever hits this error, I would really much
rather like to see the triggering file.
When this is 0, the metadata is explicitly inferred to stated default
values from the spec, rather than inferred from the previous frame's
values.
Likewise, when encoding, instead of checking if the value changed since
the last frame, we need to check if it differs from the default.
According to the spec, missing previous VDR RPU IDs do not constitute an
error, but we should instead fallback first to VDR RPU with ID 0, and
failing that, synthesize "neutral" metadata.
That's nontrivial though as the resulting metadata will be dependent on
other properties of the RPU, and this case is not hit in practice so
I'll defer it to a rainy day.
The code as written was wrong. In the spec, these fields are treated
merely as plain integers in the range 0 to 4095. The only difference
between L2 and L8 is that L2.ms_weight also accepts an additional value
of -1, hence the extra sign bit. While it's likely that these are still
shifted integers in disguise, since all real-world samples seem to use
a value of 2048 here, the offset used in the code was wrong.
In addition, because the l8.ms_weight struct member is unsigned, these
wrong shifting semantics ended up overflowing the field, leading to
undefined behavior when transcoding. Fortunately, the damage was
relatively contained in practice, because it just corrupts the coding of
this field, which is ignored in practice in all implementations I have
seen.
To allow compiling the decoding objects without the encoding objects and
vice versa. Common helpers that users of both APIs need are put into the
shared dovi_rpu.c.
To allow internally re-using it for both the encoder and decoder.
This is based on HEVC only, H.264/AV1 use their own (hopefully correctly
signalled) profiles (and in particular, the AV1 decoders implicitly
default the correct profile in the absence of a configuration record).
The code was written under the misguided assumption that these fields
would only be present when the value changes, however this does not
match the actual patent specification, which says that streams are
required to either always signal this metadata, or never signal it.
That said, the specification does not really clarify what the defaults
of these fields should be in the event that this metadata is missing, so
without any sample file or other reference I don't wish to hazard
a guess at how to interpret these fields.
Fix the current behavior by making sure we always throw this error, even
for files that have the vdr sequence info in one frame but are missing
it in the next frame.
And make it public.
For encoding, users may also be interested in the configured level and
compatibility ID. So generalize the dv_profile field and just expose the
whole configuration record.
This makes the already rather reductive ff_dovi_update_cfg() function
almost wholly redundant, since users can just directly assign
DOVIContext.cfg.
We split the inner loop between v1 and v2 extension blocks to print
a warning where an extension block was encountered in an unexpected
context.
Co-authored-by: quietvoid <tcChlisop0@gmail.com>
The Dolby Vision RPU contains a CRC32 to validate the payload against.
The implementation is CRC32/MPEG-2.
The CRC is only verified with the AV_EF_CRCCHECK flag.
Co-authored-by: quietvoid <tcChlisop0@gmail.com>
This ensures that `gb` in the following section is fully byte-aligned,
points at the start of the actual RPU, and ends on the CRC terminator.
This is important for both calculation of the CRC, as well as dovi
extension block parsing (which aligns to byte boundaries in various
places).
The NLQ pivots are not documented but should be present in the header
for profile 7 RPU format. It has been verified using Dolby's
verification toolkit.
Signed-off-by: quietvoid <tcChlisop0@gmail.com>
Signed-off-by: Niklas Haas <git@haasn.dev>
There are lots of files that don't need it: The number of object
files that actually need it went down from 2011 to 884 here.
Keep it for external users in order to not cause breakages.
Also improve the other headers a bit while just at it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: shift exponent 32 is too large for 32-bit type 'int'
Fixes: 63151/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-5067531154751488
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
It avoids allocations and the corresponding error checks.
Also avoids casts and indirections.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Based on a mixture of guesswork, partial documentation in patents, and
reverse engineering of real-world samples. Confirmed working for all the
samples I've thrown at it.
Contains some annoying machinery to persist these values in between
frames, which is needed in theory even though I've never actually seen a
sample that relies on it in practice. May or may not work.
Since the distinction matters greatly for parsing the color matrix
values, this includes a small helper function to guess the right profile
from the RPU itself in case the user has forgotten to forward the dovi
configuration record to the decoder. (Which in practice, only ffmpeg.c
and ffplay do..)
Notable omissions / deviations:
- CRC32 verification. This is based on the MPEG2 CRC32 type, which is
similar to IEEE CRC32 but apparently different in subtle enough ways
that I could not get it to pass verification no matter what parameters
I fed to av_crc. It's possible the code needs some changes.
- Linear interpolation support. Nothing documents this (beyond its
existence) and no samples use it, so impossible to implement.
- All of the extension metadata blocks, but these contain values that
seem largely congruent with ST2094, HDR10, or other existing forms of
side data, so I will defer parsing/attaching them to a future commit.
- The patent describes a mechanism for predicting coefficients from
previous RPUs, but the bit for the flag whether to use the
prediction deltas or signal entirely new coefficients does not seem to
be present in actual RPUs, so we ignore this subsystem entirely.
- In the patent's spec, the NLQ subsystem also loops over
num_nlq_pivots, but even in the patent the number is hard-coded to one
iteration rather than signalled. So we only store one set of coefs.
Heavily influenced by https://github.com/quietvoid/dovi_tool
Documentation drawn from US Patent 10,701,399 B2 and ETSI GS CCM 001
Signed-off-by: Niklas Haas <git@haasn.dev>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>