For example, with
./ffmpeg -operating_rate 400 -hwaccel mediacodec -i test.mp4 -an \
-c:v h264_mediacodec -operating_rate 400 -b:v 5M -f null -
The transcoding speed is 254 FPS.
Without -operating_rate on dec/enc, the speed is 148 FPS.
With -operating_rate on decoder only, the speed is 239 FPS.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The codec wants to know whether the usecase is realtime playback
or full-speed transcoding, or playback at a higher speed. The codec
runs faster when operating_rate higher than framerate.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Unlike the software FFv1 encoder, none of our buffers are allocated by
FFmpeg, which supports at most 4GiB large allocations.
For really large sizes, the maximum size of the buffer can exceed 4GiB,
which the software encoder optimistically tries to allocate as 4GiB
in the hopes that the encoder will compress to under that amount.
We can just let Vulkan allocate us a larger buffer, and switch to
64-bit offsets.
As of 459a1512f1,
the code is unrolled to process two rows per iteration.
The output cursor thus needs to be incremented by twice the
stride, which is taken care of with SH1ADD. However the original
ADD from the original implemetation was incorrectly left over.
This commit implements a standard, compliant, version 3 and version 4
FFv1 encoder, entirely in Vulkan. The encoder is written in standard
GLSL and requires a Vulkan 1.3 supporting GPU with the BDA extension.
The encoder can use any amount of slices, but nominally, should use
32x32 slices (1024 in total) to maximize parallelism.
All features are supported, as well as all pixel formats.
This includes:
- Rice
- Range coding with a custom quantization table
- PCM encoding
CRC calculation is also massively parallelized on the GPU.
Encoding of unaligned dimensions on subsampled data requires
version 4, or requires oversizing the image to 64-pixel alignment
and cropping out the padding via container flags.
Performance-wise, this makes 1080p real-time screen capture possible
at 60fps on even modest GPUs.
After the branch, the expected SEW/LMUL ratio is 1 byte/vector.
So we have to set the same ratio before branching (QEMU does not care,
but real hardware does).
PIXFUNC macro is unused since d29a9c2aa6.
Signed-off-by: Kyosuke Kawakami <kawakami150708@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The add_dirac_obmc8_mmx function was the only MMX function left. This
patch migrates it to SSE2.
Here are the checkasm benchmark results:
diracdsp.add_dirac_obmc_8_c: 2299.1 ( 1.00x)
diracdsp.add_dirac_obmc_8_mmx: 237.6 ( 9.68x)
diracdsp.add_dirac_obmc_8_sse2: 109.1 (21.07x)
Signed-off-by: Kyosuke Kawakami <kawakami150708@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The JPEG XL parser has an entropy decoder inside, which supports LZ77
length-distance pairs. If the first symbol from the entropy stream is an
LZ77 pair, the bitstream is invalid, so we should abort immediately rather
than attempt to read it anyway (which would read from the uninitialized
starting window).
Reported-by: Kacper Michajłow <kasper93@gmail.com>
Found-by: ossfuzz
Fixes: 368725676/clusterfuzz-testcase-minimized-fuzzer_protocol_file-6022251122589696-cut
Fixes: 42537758/clusterfuzz-testcase-minimized-fuzzer_protocol_file-5818969469026304-cut
Signed-off-by: Leo Izen <leo.izen@gmail.com>
And ensure the buffer is synced between threads.
Based on a patch by Dale Curtis <dalecurtis@chromium.org>
Signed-off-by: James Almer <jamrial@gmail.com>
This removes the ABI breaking use of sizeof(AVFilmGrainParams), and achieves the
same size reduction to decoder structs as 08b1bffa49.
Signed-off-by: James Almer <jamrial@gmail.com>
AVFilmGrainAFGS1Params, the offending struct, is using sizeof(AVFilmGrainParams)
when it should not. This change also forgot to make the necessary changes to the
frame threading sync code.
Both of these will be fixed by the following commit.
H274FilmGrainDatabase will be handled later.
This reverts commit 08b1bffa49.
Signed-off-by: James Almer <jamrial@gmail.com>
This commit introduced a regression to VVC_HDR_UHDTV1_OpenGOP_3840x2160_50fps_HLG10_mosaic.ts.
Root Cause:
The AV_CEIL_RSHIFT(a, b) macro uses bit tricks that work only when -a is a negative value.
However, due to integer promotion rules, this behavior does not extend to the unsigned int type.
See "6.3.1.1 Boolean, characters, and integers" in the "ISO/IEC 9899" for details.
Reported-by: Frank Plowman <post@frankplowman.com>