The nvidia hardware explicitly supports decoding monochrome content,
presumably for the AVIF alpha channel. Supporting this requires an
adjustment in av1dec and explicit monochrome detection in nvdec.
I'm not sure why the monochrome path in av1dec did what it did - it
seems non-functional - YUV440P doesn't seem a logical pix_fmt for
monochrome and conditioning on chroma sub-sampling doesn't make sense.
So I changed it.
I've tested 8bit content, but I haven't found a way to create a 10bit
sample, so that path is untested for now.
With the introduction of HEVC 444 support, we technically have two
codecs that can handle 444 - HEVC and MJPEG. In the case of MJPEG,
it can decode, but can only output one of the semi-planar formats.
That means we need additional logic to decide whether to use a
444 output format or not.
The latest generation video decoder on the Turing chips supports
decoding HEVC 4:4:4. Supporting this is relatively straight-forward;
we need to account for the different chroma format and pick the
right output and sw formats at the right times.
There was one bug which was the hard-coded assumption that the
first chroma plane would be half-height; I fixed this to use the
actual shift value on the plane.
We also need to pass the SPS and PPS range extension flags.
We have a pattern of wrapping CUDA calls to print errors and
normalise return values that is used in a couple of places. To
avoid duplication and increase consistency, let's put the wrapper
implementation in a shared place and use it everywhere.
Affects:
* avcodec/cuviddec
* avcodec/nvdec
* avcodec/nvenc
* avfilter/vf_scale_cuda
* avfilter/vf_scale_npp
* avfilter/vf_thumbnail_cuda
* avfilter/vf_transpose_npp
* avfilter/vf_yadif_cuda
Replaces the data pointers with the mapped cuvid ones.
Adds buffer_refs to the frame to ensure the needed contexts stay alive
and the cuvid idx stays allocated.
Adds another buffer_ref to unmap the frame when it's unreferenced itself.
Copied the check from cuviddec.c (*_cuvid decoders) to allow the
capability check to be optional for older drivers.
Signed-off-by: Jacob Trimble <modmaker@google.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
nvdec will not produce odd width/height output, and while this is
basically never an issue with most codecs, due to internal alignment
requirements, you can get odd sized jpegs.
If an odd-sized jpeg is encountered, nvdec will actually round down
internally and produce output that is slightly smaller. This isn't
the end of the world, as long as you know the output size doesn't
match the original image resolution.
However, with an hwaccel, we don't know. The decoder controls
the reported output size and the hwaccel cannot change it. I was
able to trigger an error in mpv where it tries to copy the output
surface as part of rendering and triggers a cuda error because
cuda knows the output frame is smaller than expected.
To fix this, we can round up the configured width/height passed
to nvdec so that the frames are always at least as large as the
decoder's reported size, and data can be copied out safely.
In this particular jpeg case, you end up with a blank (green) line
at the bottom due to nvdec refusing to decode the last line, but
the behaviour matches cuviddec, so it's as good as you're going to
get.
This was predictably nightmarish, given how ridiculous mpeg4 is.
I had to stare at the cuvid parser output for a long time to work
out what each field was supposed to be, and even then, I still don't
fully understand some of them. Particularly:
vop_coded: If I'm reading the decoder correctly, this flag will always
be 1 as the decoder will not pass the hwaccel any frame
where it is not 1.
divx_flags: There's obviously no documentation on what the possible
flags are. I simply observed that this is '0' for a
normal bitstream and '5' for packed b-frames.
gmc_enabled: I had a number of guesses as to what this mapped to.
I picked the condition I did based on when the cuvid
parser was setting flag.
Also note that as with the vdpau hwaccel, the decoder needs to
consume the entire frame and not the slice.
The 'simple' hwaccels (not h.264 and hevc) all use the same bitstream
management and reference lookup logic so let's refactor all that into
common functions.
I verified that casting a signed int -1 to unsigned char produces 255
according to the C language specification.
This is mostly straight-forward. The weird part is that it should
just work for mpeg1, but I see corruption in my test cases, so I'm
going to try and fix that separately.
Some parts of the code are based on a patch by
Timo Rothenpieler <timo@rothenpieler.org>
Merges Libav commit b9129ec466.
Due to the name clash with our cuvid decoder, rename it to nvdec.
This commit also changes the Libav code to dynamic loading of the
cuda/cuvid libraries.
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>