Simply taking the Zbb REV8 instruction into use in a simple loop gives
some significant savings:
bswap_buf_c: 1081.0
bswap_buf_rvb_b: 771.0
But we can also use the 64-bit REV8 as a pseudo-SIMD instruction with
just one additional shift, and one fewer load, effectively doubling the
bandwidth. Consequently, this patch is useful even if the compile-time
target has Zbb enabled for C code:
bswap_buf_c: 1081.0
bswap_buf_rvb_b: 341.0 (this patch)
On the other hand, this approach fails miserably for bswap16_buf as the
ratio of shifts and stores becomes unfavorable compared to naïve C:
bswap16_buf_c: 1542.0
bswap16_buf_rvb_b: 1803.7
Unrolling to process 128 bits (4 samples) at a time actually worsens
performance ever so slightly:
bswap_buf_c: 1081.0
bswap_buf_rvb_b: 408.5
Unfortunately, it is common, and will remain so, that the Bit
manipulations are not enabled at compilation time. This is an official
policy for Debian ports in general (though they do not support RISC-V
officially as of yet) to stick to the minimal target baseline, which
does not include the B extension or even its Zbb subset.
For inline helpers (CPOP, REV8), compiler builtins (CTZ, CLZ) or
even plain C code (MIN, MAX, MINU, MAXU), run-time detection seems
impractical. But at least it can work for the byte-swap DSP functions.
To avoid data dependencies, this does the following unroll, which
requires one extra but probably free addition:
coeff = (b * left_weight) >> decorr_shift;
b += a;
a -= coeff;
b -= coeff;
swap(a, b);
We never guard against a user freeing/stealing the private context;
and returning AVERROR(EIO) is inappropriate.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Move ROUND_MUL* macros to their only users and the Celt
macros to opus_celt.h. Also improve the other headers
a bit while at it.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Simply don't include opus_pvq.h in opus_celt.h: The latter only
uses pointers to CeltPVQ.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
opus.h (which is used by all the Opus code) currently includes
several structures only used by the parser and the decoder;
several elements of OpusContext are even only used by the decoder.
This commit therefore moves the part of OpusContext that is shared
between these two components (and used by ff_opus_parse_extradata())
out into a new structure and moves all the other accompanying
structures and functions to a new header, opus_parse.h; the
functions itself are also moved to a new file, opus_parse.c.
(This also allows to remove several spurious dependencies
of the Opus parser and encoder.)
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: out of array access
Fixes: 51648/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TRUEHD_fuzzer-4644322217164800
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
More than 2 channels seems unsupported, the code seems to just output empty extra channels
Fixes: Timeout
Fixes: 51569/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SPEEX_fuzzer-5511509165342720
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 119760682 - -2084600173 cannot be represented in type 'int'
Fixes: 50993/clusterfuzz-testcase-minimized-ffmpeg_dem_VIVIDAS_fuzzer-6745781167587328
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Otherwise p->linesize[0] * y will be evaluated as an unsigned
which leads to segfaults in case linesize is negative.
This happens in the apng-dispose-previous FATE-test in case
one makes get_buffer return pictures with negative linesizes.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This might happen in avio_write() if size == 0
when the direct codepath is taken. It is undefined behaviour
according to the spec although it happens to work in practice.
Fixes the webm-webvtt-remux FATE-test under UBSan.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
partial_corr is an int16_t and so the av_clipl_int32()
never clips and can be removed. This also avoids
undefined left-shifts of negative numbers.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Return immediately if not enough leftover bits are available
when flushing. This is simpler and also avoids an
init_get_bits(gb, NULL, 0) (which currently leads to NULL + 0,
which is UB; this affects the lossless-wma(|-1|-2|-rawtile)
FATE tests).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Happens when flushing. This triggers NULL + 0 (which is UB) in
init_get_bits_xe (which previously errored out, but the return value
has not been checked) and in copy_bits().
This fixes the wmavoice-(7|11|19)k FATE-tests with UBSan.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
src2 is used in CAVS_SUBPIX_HV iff FULL is true (it is exactly
for the egpr functions); otherwise it might be NULL. So check
for FULL before doing pointer arithmetic.
Fixes a "src/libavcodec/cavsdsp.c:524:1: runtime error: applying
non-zero offset 8 to null pointer" from UBSan.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
As long as ff_mpeg12_common_init() existed in mpeg12.c,
it added a dependency of mpeg12.o on mpegvideodata.o
(which provides ff_mpeg2_dc_scale_table, which is used
in ff_mpeg12_common_init()). mpegvideodata.o is normally
provided by the mpegvideo subsystem and therefore several
codecs and the MPEG-1/2 parser added a configure dependency
on said subsystem (additionally, the eatqi decoder just
added a Makefile dependency on mpegvideodata.o).
Given that ff_mpeg12_common_init() is no more, these dependencies
can be removed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It only sets [yc]_dc_scale_table and these tables are only
read in ff_set_qscale(); but the MPEG-1/2 decoders don't call
ff_set_qscale() at all.
(Furthermore, given that intra_dc_precision is always zero
for a decoder at this point, ff_mpeg12_common_init()
actually set these pointers to what ff_mpv_common_defaults()
already set them.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is better place for these declarations than
mpeg12data.h as RL VLC are just a variant of VLCs.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Starting with an h264 implementation. Can be extended to support other codecs.
A few caveats:
- OpenGOP streams are currently not supported. The firt packet must be an IDR
frame.
- In some streams, a few frames at the end may not get a reordered PTS when
they reference frames past EOS. The code added to derive timestamps from
previous frames needs to extended.
Addresses ticket #502.
Signed-off-by: James Almer <jamrial@gmail.com>
Provide optimized implementation for vsse_intra8 for arm64.
Performance tests are shown below.
- vsse_5_c: 87.7
- vsse_5_neon: 26.2
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation of vsse8 for arm64.
Performance comparison tests are shown below.
- vsse_1_c: 141.5
- vsse_1_neon: 32.5
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Add vectorized implementation of nsse8 function.
Performance comparison tests are shown below.
- nsse_1_c: 256.0
- nsse_1_neon: 82.7
Benchmarks and tests run with checkasm tool on AWS Graviton 3.
Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation of pix_abs8 function for arm64.
Performance comparison tests are shown below:
pix_abs_1_1_c: 162.5
pix_abs_1_1_neon: 27.0
pix_abs_1_2_c: 174.0
pix_abs_1_2_neon: 23.5
pix_abs_1_3_c: 203.2
pix_abs_1_3_neon: 34.7
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Grzegorz Bernacki <gjb@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Apparently this option was intended (the context contains a
currently-unused frame_rate field), but was never added. This results in
the output timebase being unset after config_output(), so the input
audio timebase ends up being used for video output, which is clearly
wrong.
Add an option for setting output video framerate. Also set output frame
durations.