When input surfaces are CUDA frames, we will not know the actual
underlying format (NV12, P010, etc.) at surface allocation time.
On the other hand, we will know when the input frames are actually
registered and associated with a surface.
So, let's delay format discovery until registration time, which is
actually how we handle other frame properties, such as dimensions.
By itself, this change doesn't allow for transcoding of 10bit
content from cuvid, but it reduces the problem to the hardcoding of
the sw format in ffmpeg_cuvid.c
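
A minimal sketch of the idea (illustrative only: the helper name is
hypothetical, but AVHWFramesContext.sw_format and the NVENC buffer
format constants are real):

/* Hypothetical helper: pick the NVENC buffer format at registration
 * time from the hwframe context carried by the incoming frame,
 * instead of guessing at surface allocation time. */
static NV_ENC_BUFFER_FORMAT pick_buffer_format(const AVFrame *frame)
{
    AVHWFramesContext *frames_ctx =
        (AVHWFramesContext*)frame->hw_frames_ctx->data;

    switch (frames_ctx->sw_format) {
    case AV_PIX_FMT_NV12: return NV_ENC_BUFFER_FORMAT_NV12;
    case AV_PIX_FMT_P010: return NV_ENC_BUFFER_FORMAT_YUV420_10BIT;
    default:              return NV_ENC_BUFFER_FORMAT_UNDEFINED;
    }
}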
Signed-off-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
This dubious behaviour in nvenc was finally removed by nvidia, and
as we refuse to run on anything older than 7.0, we don't need to
keep it around for old versions.
User-selectable surfaces are not working correctly: if you set the
number of surfaces on the command line, a minimum of 32 or 48 is
always used, depending on the selected resolution, even though nvenc
does not need that many surfaces.

From now on you can request as few as 1 surface and nvenc will still
work; this of course lowers GPU memory usage by 95% and brings
async_delay down to zero.

That was the easy part, now a little bit more...
The next part of this patch makes rc_lookahead take precedence over
the user-defined surface count when computing the number of surfaces.
The maximum rc_lookahead in the NVIDIA documentation is 32, but it
could increase in future generations, so no limit is imposed on it
yet. async_depth is still accepted and preferred over rc_lookahead.

There was also a bug: requesting rc_lookahead > 31 would always be
clamped to a maximum of 31, because the surface count was
recalculated after the lookahead had already been set. This is now
fixed.
Results:
With -rc_lookahead 32 and -bf 3, only 40 surfaces are now used,
lowering GPU memory usage by 20% and increasing PSNR by 0.012 dB.
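(Presumably 40 = 32 rc_lookahead + 3 B-frames + 4 extra surfaces (see
comment 1 below) + 1 frame being encoded; this breakdown is inferred
from the numbers above, not documented anywhere.)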
Two more comments:

1. From my internal tests, I don't understand the addition of 4 extra
surfaces when lookahead is calculated; I didn't use them and
everything worked the same as with the 4 extra surfaces. Does anybody
know what is going on there? It looks like they were meant for B
frames, which are accounted for separately, because the B frame
maximum is 4.

2. rc_lookahead defaults to -1, but the test condition
if (ctx->rc_lookahead) which enables lookahead is then always true. I
don't know if this is intended behavior, but it means lookahead is
always on by default! This is the resulting configuration when
rc_lookahead is -1 (not given on the command line), which is maybe
not intended:
ctx->encode_config.rcParams.enableLookahead = 1;
ctx->encode_config.rcParams.lookaheadDepth = 0;
ctx->encode_config.rcParams.disableIadapt = 0;
ctx->encode_config.rcParams.disableBadapt = 0;
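
Presumably the intended default is lookahead off; a sketch of the fix
(same field names as the snippet above, exact code is an assumption):

/* Treat the default of -1 as "lookahead disabled"; only an explicit
 * positive value enables it. */
if (ctx->rc_lookahead > 0) {
    ctx->encode_config.rcParams.enableLookahead = 1;
    ctx->encode_config.rcParams.lookaheadDepth  = ctx->rc_lookahead;
}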
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
nvenc still encodes as YUV, but does the conversion internally, which
brings some performance gains.
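
Assuming this refers to feeding packed RGB frames straight to the
encoder, a hedged sketch of the kind of mapping involved (the helper
is hypothetical; the pixel and buffer format constants are real):

/* Hypothetical mapping: hand packed RGB to NVENC as-is and let the
 * hardware do the RGB->YUV conversion internally. */
static NV_ENC_BUFFER_FORMAT map_rgb_format(enum AVPixelFormat fmt)
{
    switch (fmt) {
    case AV_PIX_FMT_0RGB32: return NV_ENC_BUFFER_FORMAT_ARGB;
    case AV_PIX_FMT_0BGR32: return NV_ENC_BUFFER_FORMAT_ABGR;
    default:                return NV_ENC_BUFFER_FORMAT_UNDEFINED;
    }
}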
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
Use explicit nvenc capability checks to determine usable devices
instead of SM versions.
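
A sketch of such a probe (the helper is hypothetical;
nvEncGetEncodeCaps and the caps enums are part of the NVENC API):

/* Query a single capability from a live encoder session rather than
 * inferring support from the device's SM version. */
static int get_encoder_cap(NV_ENCODE_API_FUNCTION_LIST *nv, void *encoder,
                           GUID codec, NV_ENC_CAPS cap)
{
    NV_ENC_CAPS_PARAM params = { .version     = NV_ENC_CAPS_PARAM_VER,
                                 .capsToQuery = cap };
    int val = 0;

    if (nv->nvEncGetEncodeCaps(encoder, codec, &params, &val) != NV_ENC_SUCCESS)
        return 0;
    return val;
}

For example, querying NV_ENC_CAPS_SUPPORT_YUV444_ENCODE against
NV_ENC_CODEC_H264_GUID tells you whether 4:4:4 encoding is usable on
that device, regardless of its SM version.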
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
There is no point in separate structures: they have a 1:1
relationship, they are always used together, and they have the same
lifetime.
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
Function names and scopes are from libav. This commit basically moves
code around without changing it; it shouldn't change any
functionality except for some small leak fixes on error paths.
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
For reasons we are not privy to, nvidia decided that the nvenc encoder
should apply aspect ratio compensation to 'DVD like' content, assuming that
the content is not BT.601 compliant, but needs to be BT.601 compliant. In
this context, that means that they make the following, questionable,
assumptions:
1) If the input dimensions are 720x480 or 720x576, assume the content has
an active area of 704x480 or 704x576.
2) Assume that whatever the input sample aspect ratio is, it does not account
for the difference between 'physical' and 'active' dimensions.
From these assumptions, they then conclude that they can 'help', by
adjusting the sample aspect ratio by a factor of 45/44 (the ratio of
720 to 704). And indeed, if you wanted to display only the 704 wide
active area with the same aspect ratio as the full 720 wide image -
this would be the correct adjustment factor. But what if you don't?
And more importantly, what if you're used to lavc not making this
kind of adjustment at encode time - because none of the other
encoders do this! And what if you had already accounted for BT.601
and your input had the correct attributes? Well, it's going to apply
the compensation anyway!
So, if you take some content, and feed it through nvenc repeatedly, it
will keep scaling the aspect ratio every time, stretching your video out
more and more and more.
So, clearly, regardless of whether you want to apply BT.601 aspect
ratio adjustments or not, this is not the way to do it. With any
other lavc encoder, you would do it as part of defining your input
parameters or do the adjustment at playback time, and there's no
reason why nvenc should be any different.
This change adds some logic to undo the compensation that nvenc would
otherwise do.
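
A sketch of that undo (av_reduce is the real libavutil helper and
darWidth/darHeight follow the NVENC init params; the exact code here
is illustrative):

/* nvenc scales the aspect ratio by 45/44 for 720x480/720x576 input,
 * so pre-scale the DAR by the inverse (44/45) to cancel the
 * compensation out. */
if (avctx->width == 720 &&
    (avctx->height == 480 || avctx->height == 576)) {
    int dw, dh;
    av_reduce(&dw, &dh,
              ctx->init_encode_params.darWidth  * 44LL,
              ctx->init_encode_params.darHeight * 45LL,
              1024 * 1024);
    ctx->init_encode_params.darWidth  = dw;
    ctx->init_encode_params.darHeight = dh;
}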
nvidia engineers have told us that they will work to make this
compensation mechanism optional in a future release of the nvenc
SDK. At that point, we can adapt accordingly.
Signed-off-by: Philip Langdale <philipl@overt.org>
Reviewed-by: Timo Rothenpieler <timo@rothenpieler.org>
Signed-off-by: Anton Khirnov <anton@khirnov.net>