Cuvid only supports clips up to a maximum number of macroblocks. This
check was missing after the cuvidGetDecoderCaps API call, allowing
unsupported clips to proceed.
Added the missing check, matching the one in the nvdec hwaccel implementation.
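A minimal sketch of the kind of check this adds, using the nMaxMBCount
field from CUVIDDECODECAPS (variable names here are illustrative):

    /* Reject streams whose macroblock count exceeds what
     * cuvidGetDecoderCaps reported. A 16x16 macroblock covers
     * 256 pixels, hence the division. */
    if (probed_width * probed_height / 256 > caps->nMaxMBCount) {
        av_log(avctx, AV_LOG_ERROR,
               "Video macroblock count %d exceeds maximum of %d\n",
               probed_width * probed_height / 256, caps->nMaxMBCount);
        return AVERROR(EINVAL);
    }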
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
There are two kinds of problems when using adaptive deinterlacing with cuvid:
1. Sometimes, in the middle of transcoding, cuvid outputs frames with visible horizontal lines (as though the weave deinterlacing method had been chosen);
2. Occasionally, on scene changes, cuvid outputs a wrong frame that should have been shown several seconds earlier (as if the frame had been assigned a wrong PTS value).
The reason is that CUVIDPARSERDISPINFO sometimes reports progressive_frame == 1 for interlaced videos.
To fix this, check whether the video is interlaced or progressive at the start of a video sequence (in cuvid_handle_video_sequence), and then use that information instead of the unreliable progressive_frame property in CUVIDPARSERDISPINFO.
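A sketch of the idea (callback shape and CUVIDEOFORMAT field as in
NVIDIA's cuviddec.h; the context type and its field are illustrative):

    /* Sequence callback: record once whether the stream itself is
     * progressive, instead of trusting per-frame flags later. */
    static int CUDAAPI cuvid_handle_video_sequence(void *opaque,
                                                   CUVIDEOFORMAT *format)
    {
        CuvidContext *ctx = opaque;   /* illustrative context type */
        ctx->progressive_sequence = format->progressive_sequence;
        return 1;
    }

    /* Frame output: derive interlacing from the sequence-level flag,
     * not from the unreliable dispinfo->progressive_frame. */
    frame->interlaced_frame = !ctx->progressive_sequence;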
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
We also do a sync here after copying the frame that gets passed
on down the pipeline, and it too is unnecessary.
I was able to demonstrate a 33% speedup by removing the sync from
an example transcode pipeline.
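Conceptually, the change is (an illustrative before/after, not the
exact code):

    /* Before: copy, then stall the CPU until the copy completes. */
    CHECK_CU(cu->cuMemcpy2DAsync(&cpy, device_hwctx->stream));
    CHECK_CU(cu->cuStreamSynchronize(device_hwctx->stream)); /* removed */

    /* After: the copy stays queued on the stream, and later work that
     * touches the frame on that stream is ordered after it anyway, so
     * blocking here buys nothing. */
    CHECK_CU(cu->cuMemcpy2DAsync(&cpy, device_hwctx->stream));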
This is the equivalent change for cuviddec after the previous change
for nvdec. I made similar changes to the copying routines to handle
pixel formats in a more generic way.
Note that unlike with nvdec, there is no confusion about the ability
of a codec to output 444 formats. This is because the cuvid parser is
used, meaning that 444 JPEG content is still indicated as using a 420
output format.
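The generic shape of the copy is roughly this (a sketch: plane count and
chroma heights come from the pixel format descriptor rather than
hard-coded NV12 assumptions; devptr and pitch are what
cuvidMapVideoFrame returned):

    const AVPixFmtDescriptor *pixdesc = av_pix_fmt_desc_get(avctx->sw_pix_fmt);
    unsigned int offset = 0;
    int i, ret;

    for (i = 0; i < av_pix_fmt_count_planes(avctx->sw_pix_fmt); i++) {
        int height = avctx->height >> (i ? pixdesc->log2_chroma_h : 0);
        CUDA_MEMCPY2D cpy = {
            .srcMemoryType = CU_MEMORYTYPE_DEVICE,
            .dstMemoryType = CU_MEMORYTYPE_DEVICE,
            .srcDevice     = devptr,
            .dstDevice     = (CUdeviceptr)frame->data[i],
            .srcPitch      = pitch,
            .dstPitch      = frame->linesize[i],
            .srcY          = offset,
            .WidthInBytes  = FFMIN(pitch, frame->linesize[i]),
            .Height        = height,
        };
        ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy, device_hwctx->stream));
        if (ret < 0)
            return ret;
        /* planes are stacked at coded-height intervals in the surface */
        offset += avctx->coded_height >> (i ? pixdesc->log2_chroma_h : 0);
    }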
We have a pattern of wrapping CUDA calls to print errors and
normalise return values that is used in a couple of places. To
avoid duplication and increase consistency, let's put the wrapper
implementation in a shared place and use it everywhere (a sketch of
the pattern follows the list below).
Affects:
* avcodec/cuviddec
* avcodec/nvdec
* avcodec/nvenc
* avfilter/vf_scale_cuda
* avfilter/vf_scale_npp
* avfilter/vf_thumbnail_cuda
* avfilter/vf_transpose_npp
* avfilter/vf_yadif_cuda
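The shape of the shared wrapper, sketched (the real helper dispatches
through dynamically loaded function pointers; names here are simplified):

    /* One helper that logs a failed CUDA call and normalises it to an
     * FFmpeg error code... */
    static inline int check_cu(void *log_ctx, CUresult err, const char *call)
    {
        const char *err_name = NULL, *err_string = NULL;

        if (err == CUDA_SUCCESS)
            return 0;

        cuGetErrorName(err, &err_name);
        cuGetErrorString(err, &err_string);
        av_log(log_ctx, AV_LOG_ERROR, "%s failed -> %s: %s\n", call,
               err_name ? err_name : "unknown",
               err_string ? err_string : "unknown");
        return AVERROR_EXTERNAL;
    }

    /* ...plus a per-file macro that stringifies the call. */
    #define CHECK_CU(x) check_cu(avctx, (x), #x)

    /* A call site then reads: */
    ret = CHECK_CU(cuCtxPushCurrent(cuda_ctx));
    if (ret < 0)
        goto fail;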
Explicitly identify decoder/encoder wrappers with a common name. This
saves API users from guessing by the name suffix. For example, they
don't have to guess that "h264_qsv" is the H.264 QSV implementation;
instead they can just check the AVCodec .id and .wrapper_name fields.
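For instance, an API user can find the QSV H.264 decoder without
string-matching name suffixes (a sketch; assumes the av_codec_iterate()
iteration API):

    const AVCodec *c = NULL;
    void *iter = NULL;

    while ((c = av_codec_iterate(&iter))) {
        if (c->id == AV_CODEC_ID_H264 && av_codec_is_decoder(c) &&
            c->wrapper_name && !strcmp(c->wrapper_name, "qsv"))
            printf("QSV wrapper decoder: %s\n", c->name);
    }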
Explicitly mark AVCodec entries that are, or most likely are, hardware
decoders with new AV_CODEC_CAPs. The purpose is to allow API users to
list hardware decoders in a more generic way. The proposed
AVCodecHWConfig does not provide this information fully, because it is
concerned with decoder configuration, not with whether the hardware is
used at all.
AV_CODEC_CAP_HYBRID exists specifically for QSV, which can fall back to
software implementations in case the hardware is not capable.
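Listing then reduces to a capability check (sketch; AV_CODEC_CAP_HARDWARE
is the companion cap for decoders that always use hardware):

    /* Treat a decoder as hardware (or potentially hardware) if either
     * of the new caps is set. */
    if (av_codec_is_decoder(codec) &&
        (codec->capabilities & (AV_CODEC_CAP_HARDWARE | AV_CODEC_CAP_HYBRID)))
        list_hw_decoder(codec);   /* list_hw_decoder() is illustrative */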
Based on a patch by Philip Langdale <philipl@overt.org>.
Merges Libav commit 47687a2f8a.
This removes the dependency that hardware pixel formats previously had on
AVHWAccel instances, meaning only those which actually do something need
exist after this patch.
Also updates avcodec_default_get_format() to be able to choose hardware
formats if either a matching device has been supplied or no additional
external configuration is required, and avcodec_get_hw_frames_parameters()
to use the hardware config rather than searching the old hwaccel list.
The FF_CODEC_CAP_HWACCEL_REQUIRE_CLASS mechanism is deleted because it
no longer does anything (the codec already contains the pointers to the
matching hwaccels).
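With the configs attached to the codec, choosing a hardware format can
be driven by them instead of the old list; roughly (a sketch using the
public accessor from this series):

    /* Find the hw pixel format a codec supports for a given device
     * type via its attached hardware configs. */
    static enum AVPixelFormat find_hw_pix_fmt(const AVCodec *codec,
                                              enum AVHWDeviceType type)
    {
        for (int i = 0;; i++) {
            const AVCodecHWConfig *cfg = avcodec_get_hw_config(codec, i);
            if (!cfg)
                return AV_PIX_FMT_NONE;
            if ((cfg->methods & AV_CODEC_HW_CONFIG_METHOD_HW_DEVICE_CTX) &&
                cfg->device_type == type)
                return cfg->pix_fmt;
        }
    }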
This includes a pointer to the associated hwaccel for decoders using
hwaccels - these will be used later to implement the hwaccel setup
without needing a global list.
Also added is a new file listing all hwaccels as external declarations -
this will be used later to generate the hwaccel list at configure time.
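The new file is just a flat list of external declarations, along the
lines of (entries illustrative):

    extern const AVHWAccel ff_h264_cuvid_hwaccel;
    extern const AVHWAccel ff_h264_vaapi_hwaccel;
    extern const AVHWAccel ff_h264_vdpau_hwaccel;
    /* ... one line per hwaccel, so configure can later generate the
     * enabled subset from it */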
Currently, AVHWAccels are looked up using a (codec_id, pixfmt) tuple.
This means it's impossible to have two decoders for the same codec
using the same opaque hardware pixel format.
This breaks merging Libav's CUVID hwaccel. FFmpeg has its own CUVID
support, but it's a full stream decoder, using NVIDIA's codec parser.
The Libav one is a true hwaccel, which is based on the builtin software
decoders.
Fix this by introducing another field to disambiguate AVHWAccels, and
use it for our CUVID decoders. FF_CODEC_CAP_HWACCEL_REQUIRE_CLASS makes
this mechanism backwards compatible and optional.
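Schematically, the lookup gains one more criterion (a sketch; the
disambiguating field is the decoder's AVClass):

    /* An hwaccel now only matches a decoder when, in addition to
     * (codec_id, pix_fmt), its decoder_class matches the decoder's
     * priv_class - and only for codecs that opt in via
     * FF_CODEC_CAP_HWACCEL_REQUIRE_CLASS, keeping old behaviour for
     * everyone else. */
    if ((codec->caps_internal & FF_CODEC_CAP_HWACCEL_REQUIRE_CLASS) &&
        hwaccel->decoder_class != codec->priv_class)
        continue;   /* not the hwaccel for this decoder */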
cuvid.c is used by Libav's CUVID hwaccel. Resolve the conflict and
avoid future merge problems by renaming our decoder.
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
This is a newer API intended for decoders like the cuvid wrapper.
Until now, the wrapper required setting an awkward "incomplete"
hw_frames_ctx to select the device. Now the device
can be set directly, and the user can get AV_PIX_FMT_CUDA output
for a specific device simply by setting hw_device_ctx.
This still does a dummy ff_get_format() call at init time, and should
be fully backward compatible.
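Typical use from the API side (a sketch):

    /* Hand the decoder a CUDA device directly; decoded frames then
     * come back as AV_PIX_FMT_CUDA, with no "incomplete" hw_frames_ctx
     * needed. */
    AVBufferRef *device_ref = NULL;
    int ret = av_hwdevice_ctx_create(&device_ref, AV_HWDEVICE_TYPE_CUDA,
                                     NULL, NULL, 0);
    if (ret < 0)
        return ret;
    avctx->hw_device_ctx = device_ref;   /* decoder owns our reference */
    ret = avcodec_open2(avctx, codec, NULL);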
Progressive input would disable deinterlacing in cuvid for all future
frames, even interlaced ones.
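The fix amounts to re-deriving the mode per video sequence instead of
latching it (sketch; field names as in the cuvid wrapper):

    /* Pick the deinterlace mode from the current sequence, so a
     * progressive stretch doesn't force weave for later interlaced
     * content. */
    ctx->deint_mode_current = format->progressive_sequence
                              ? cudaVideoDeinterlaceMode_Weave
                              : ctx->deint_mode;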
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
CUVID on the GeForce GT 730 and GeForce GTX 1060 does not report any error
when decoding 8K h264 packets. However, it does return an error during
the cuvidCreateDecoder call if the indicated video resolution is not
supported.
Given that the stream resolution is typically known as a result of
probing, it is better to use this information during the avcodec_open2
call to fail immediately, rather than proceeding to decode and never
receiving any frames from the decoder, nor any indication of decode failure.
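The added check is essentially (a sketch, against the caps returned by
cuvidGetDecoderCaps):

    /* Fail avcodec_open2() up front when the probed resolution exceeds
     * what the decoder reports it can handle. */
    if (probed_width  > caps->nMaxWidth ||
        probed_height > caps->nMaxHeight) {
        av_log(avctx, AV_LOG_ERROR,
               "Video size %dx%d exceeds maximum of %dx%d\n",
               probed_width, probed_height,
               caps->nMaxWidth, caps->nMaxHeight);
        return AVERROR(EINVAL);
    }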
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
I moved this into the handle_video_sequence callback because that's
the earliest time you can make an accurate decision as to what the
format should be.
However, transcoding requires that the decision between the accelerated
PIX_FMT_CUDA and a normal pix format happen at init() time. There is
enough information available to make that decision, and things work out
with the underlying format only being discovered in the sequence callback.
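Roughly, init runs the negotiation once up front (a sketch; the exact
format list depends on the build):

    /* Run format negotiation at init() so callers can choose between
     * accelerated and software output, even though the real underlying
     * format is only known once the sequence callback fires. */
    enum AVPixelFormat pix_fmts[] = { AV_PIX_FMT_CUDA,
                                      AV_PIX_FMT_NV12,
                                      AV_PIX_FMT_NONE };
    int ret = ff_get_format(avctx, pix_fmts);
    if (ret < 0)
        return ret;
    avctx->pix_fmt = ret;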
The nvidia 375.xx driver introduces support for P016 output surfaces,
for 10bit and 12bit HEVC content (it's also the first driver to support
hardware decoding of 12bit content).
The cuvid API, as far as I can tell, only declares one output format,
which the driver strings appear to refer to as P016. Of course,
10bit content in P016 is identical to P010, and it is useful for
compatibility purposes to declare the format to be P010 to work with
other components that only know how to consume P010 (and to avoid
triggering swscale conversions that are lossy when they shouldn't be).
For simplicity, this change does not maintain the previous ability
to output dithered NV12 for 10/12 bit input video - the user will need
to update their driver to decode such videos.
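The resulting mapping is (a sketch; bit_depth_luma_minus8 as reported
in CUVIDEOFORMAT):

    /* Choose the sw format from the stream's bit depth; 10bit is
     * declared as P010 even though the surface is "P016", since the
     * extra bits are zero and P010 consumers can take it as-is. */
    switch (format->bit_depth_luma_minus8) {
    case 0:  pix_fmt = AV_PIX_FMT_NV12; break;
    case 2:  pix_fmt = AV_PIX_FMT_P010; break; /* 10bit in a P016 surface */
    case 4:  pix_fmt = AV_PIX_FMT_P016; break;
    default: return AVERROR_INVALIDDATA;       /* unsupported depth */
    }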
Although cuvid can only output 8bit, it can consume HEVC Main10 if
the bit depth is set properly. In cases where >8bit is not supported,
this change is still beneficial, as the decoder will fail to be
created instead of plowing through and decoding as 8bit.
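The key part is passing the depth through at decoder creation (a
sketch; field names per NVIDIA's cuviddec.h):

    /* Tell the decoder the real coded bit depth; with 8bit-only output
     * this is what makes Main10 decodable - or makes creation fail
     * cleanly where it isn't supported. */
    cuinfo.bitDepthMinus8 = format->bit_depth_luma_minus8;
    cuinfo.OutputFormat   = cudaVideoSurfaceFormat_NV12; /* 8bit output */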
We need to remove the dynlink fanciness and replace it with normal
function prototypes and update the include paths and configure logic.
We don't need to explicitly check for PICPARMS now - they're going
to be there.
Currently this does not work with the ffmpeg cli tool, due to it using
the old one-in, one-out API.
Anything using the new API, like mpv, can make use of it, provided it is
prepared for a decoder modifying the framerate and outputting multiple
frames per input. FFmpeg itself is not.
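A caller on the new API handles this naturally (a sketch;
process_frame() is illustrative):

    /* The send/receive API decouples input from output, so a
     * deinterlacer emitting two frames per packet just works. */
    ret = avcodec_send_packet(avctx, pkt);
    if (ret < 0 && ret != AVERROR(EAGAIN))
        return ret;

    while ((ret = avcodec_receive_frame(avctx, frame)) >= 0) {
        process_frame(frame);   /* may run twice per input packet */
        av_frame_unref(frame);
    }
    if (ret != AVERROR(EAGAIN) && ret != AVERROR_EOF)
        return ret;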
Despite the video parser seeming to handle 422 and 444 chroma formats
correctly, the video decoder fails miserably to actually decode
frames - even though no errors are ever returned; you just get frames
showing uninitialized garbage.
Signed-off-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
I'm not really sure how this worked at all before, but we do need to
reinitialize the parser with the stream extradata.
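Schematically (a sketch; the real code feeds the stored sequence
header, here approximated with avctx->extradata):

    /* After re-creating the parser, push the sequence header through
     * it so it re-learns the stream parameters. */
    CUVIDSOURCEDATAPACKET seq_pkt = {
        .payload      = avctx->extradata,
        .payload_size = avctx->extradata_size,
    };
    if (seq_pkt.payload && seq_pkt.payload_size)
        CHECK_CU(cuvidParseVideoData(ctx->cuparser, &seq_pkt));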
Signed-off-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
The cuvid parser is basically undocumented, and although you'd think
a failed callback would cause the overall parse call to return an
error, that is not true.
So we end up silently trying to keep going as if nothing were wrong,
which doesn't achieve anything.
Solution: check the internal error flag after every parse call.
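The pattern at each parse call (a sketch; flag name as used in the
wrapper's context):

    /* The callbacks record failures in the context, because
     * cuvidParseVideoData() won't propagate them for us. */
    ctx->internal_error = 0;
    CHECK_CU(cuvidParseVideoData(ctx->cuparser, &cupkt));
    if (ctx->internal_error) {
        ret = ctx->internal_error;
        goto error;
    }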
Signed-off-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
Right now, if we attempt to use cuvid in a media player and then
try to seek, the decoder will happily pass out whatever frames were
already in flight before the seek.
There is both the output queue in our code and some number of frames
within the cuvid decoder that need to be accounted for.
cuvid doesn't support flushing, so our only choice is to do a brute-force
re-creation of the decoder, which also implies re-creating the parser,
but this is fine.
The only subtlety is that there is sanity-check code in decoder
initialisation that wants to make sure the HWContextFrame hasn't already
been initialised. This is a fair check to do at the beginning, but not
after a flush, so it has to be made conditional.
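The flush then looks roughly like this (a sketch; helper and field
names illustrative):

    /* Brute-force flush: tear down parser and decoder and run the init
     * path again, with is_flush set so the init-time hw-frames sanity
     * check is skipped. */
    cuvidDestroyVideoParser(ctx->cuparser);
    cuvidDestroyDecoder(ctx->cudecoder);
    ctx->cuparser  = NULL;
    ctx->cudecoder = NULL;

    ctx->is_flush = 1;
    ret = cuvid_decode_init(avctx);   /* re-creates parser and decoder */
    ctx->is_flush = 0;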
Signed-off-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>