nvdec will not produce odd width/height output, and while this is
basically never an issue with most codecs, due to internal alignment
requirements, you can get odd sized jpegs.
If an odd-sized jpeg is encountered, nvdec will actually round down
internally and produce output that is slightly smaller. This isn't
the end of the world, as long as you know the output size doesn't
match the original image resolution.
However, with an hwaccel, we don't know. The decoder controls
the reported output size and the hwaccel cannot change it. I was
able to trigger an error in mpv where it tries to copy the output
surface as part of rendering and triggers a cuda error because
cuda knows the output frame is smaller than expected.
To fix this, we can round up the configured width/height passed
to nvdec so that the frames are always at least as large as the
decoder's reported size, and data can be copied out safely.
In this particular jpeg case, you end up with a blank (green) line
at the bottom due to nvdec refusing to decode the last line, but
the behaviour matches cuviddec, so it's as good as you're going to
get.
This was predictably nightmarish, given how ridiculous mpeg4 is.
I had to stare at the cuvid parser output for a long time to work
out what each field was supposed to be, and even then, I still don't
fully understand some of them. Particularly:
vop_coded: If I'm reading the decoder correctly, this flag will always
be 1 as the decoder will not pass the hwaccel any frame
where it is not 1.
divx_flags: There's obviously no documentation on what the possible
flags are. I simply observed that this is '0' for a
normal bitstream and '5' for packed b-frames.
gmc_enabled: I had a number of guesses as to what this mapped to.
I picked the condition I did based on when the cuvid
parser was setting flag.
Also note that as with the vdpau hwaccel, the decoder needs to
consume the entire frame and not the slice.
The 'simple' hwaccels (not h.264 and hevc) all use the same bitstream
management and reference lookup logic so let's refactor all that into
common functions.
I verified that casting a signed int -1 to unsigned char produces 255
according to the C language specification.
This is mostly straight-forward. The weird part is that it should
just work for mpeg1, but I see corruption in my test cases, so I'm
going to try and fix that separately.
Some parts of the code are based on a patch by
Timo Rothenpieler <timo@rothenpieler.org>
Merges Libav commit b9129ec466.
Due to the name clash with our cuvid decoder, rename it to nvdec.
This commit also changes the Libav code to dynamic loading of the
cuda/cuvid libraries.
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>