layered_dpb only makes sense when dedicated_dpb is set to 1.
For some mysterious reason, some Nvidia drivers stopped indicating
SEPARATE_REFRENCES, but kept the COINCIDE flag, which broke
the code.
The issue is that shaderc_result_get_num_errors may sometime
return 0 even when shaderc_result_get_compilation_status returns
a non-zero error code.
Since we use the result from the former, override the status
if it returned 0.
This commit was long overdue. The old transfer dubiously tried to
merge as much code as possible, and had very little in the way
of optimizations, apart from basic host-mapping.
The new code uses buffer pools for any temporary bufflers, and
handles falling back to buffer-based uploads if host-mapping fails.
Roundtrip performance difference:
ffmpeg -init_hw_device "vulkan=vk:0,debug=0,disable_multiplane=1" -f lavfi \
-i color=red:s=3840x2160 -vf hwupload,hwdownload,format=yuv420p -f null -
7900XTX:
Before: 224fps
After: 502fps
Ada, with proprietary drivers:
Before: 29fps
After: 54fps
Alder Lake:
Before: 85fps
After: 108fps
With the host-mapping codepath disabled:
Before: 32fps
After: 51fps
The issue with the old mechanism is that we had to introduce new
API each time we needed a new queue family, and all the queue families
were functionally fixed to a given purpose.
Nvidia's GPUs are able to handle video encoding and compute on the
same queue, which results in a speedup when pre-processing is required.
Also, this enables us to expose optical flow queues for frame interpolation.
This matches production code which also zeros these buffers
Fixes: use of uninitialized values
Fixes: 70885/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_VP6F_fuzzer-4610946029387776 (and likely others)
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Vendor id will help to select desired device in case of kernel driver is
unknow or unsupported, for vendor may support different kernel driver on
different platforms.
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Fixes: Use of uninitialized value
Fixes: 70900/clusterfuzz-testcase-minimized-ffmpeg_dem_WTV_fuzzer-6286909377150976
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The code can leave uninitialized holes in the array.
Fixes: use of uninitialized values
Fixes: 70883/clusterfuzz-testcase-minimized-ffmpeg_dem_WTV_fuzzer-6698694567591936
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
In some scenarios nb_tracks isn't the same as nb_streams, so a given id may end
up being used for two separate streams.
e.g. when muxing an IAMF track followed by a video track, if the IAMF track
consists of several streams, the video track would end up having an id of 2,
which may also be used by one of the IAMF streams.
Signed-off-by: James Almer <jamrial@gmail.com>
This puts lavu frame buffer allocator helpers in sync with lavc's decoder frame
buffer allocator's STRIDE_ALIGN define.
Remove the comment about av_cpu_max_align() while at it as using it is not
ideal when CPU flags can be changed mid process.
Should fix ticket #11116.
Signed-off-by: James Almer <jamrial@gmail.com>
This fixes passing options dict.
Fixes some timeouts found by OSS-Fuzz.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes runtime error: member access within misaligned address
<addr> for type 'av_alias64', which requires 8 byte alignment.
VP9mv is aligned to 4 bytes, so instead doing 8 bytes clear, let's do
2 times 4 bytes.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: Use-of-uninitialized value
Fixes: clusterfuzz-testcase-minimized-ffmpeg_BSF_H264_METADATA_fuzzer-5458626041413632
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes use of uninitialized value, reported by MSAN.
Found by OSS-Fuzz.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Fixes: 70852/clusterfuzz-testcase-minimized-ffmpeg_IO_DEMUXER_fuzzer-5179190066872320
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes use of uninitialized value, reported by MSAN.
Found by OSS-Fuzz.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Fixes: 70837/clusterfuzz-testcase-minimized-ffmpeg_dem_JPEGXL_ANIM_fuzzer-5089407768526848
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
There is no known (real) hardware with V and without the complete B
extension. B was indeed required in the RISC-V application profile from
2022, earlier than V. There should not be any relevant hardware in the
future either.
In practice, different R-V Vector optimisations in FFmpeg already depend on
every constituent of the B extension anyhow, so it would not work well.
These are really just wrappers for idct4_add16intra functions, which are in
turn mostly wrappers for idct4_add and idct4_dc_add functions.
For benchmarks refer to the later two sets.