The check should be >= 0, not > 0. The check itself is redundant,
since uninit is only called after init has succeeded.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
B0 is defined by a system header; see f0f596dbc6 for reference.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
src is apparently not guaranteed to be >8 byte aligned, but align to 16
nonetheless as the x86 asm will do unaligned loads anyway.
dst is guaranteed to be 32 byte aligned for the Y plane, but 16 byte for UV.
Signed-off-by: James Almer <jamrial@gmail.com>
The MMXEXT versions of the rgb2rgb functions tested here
always emit emms on their own. Therefore one can use
a stricter test to ensure that it stays that way.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The OS may silently fix (emulate) unaligned hardware access exceptions.
This is extremely slow and code should be fixed not to rely on unaligned
access on affected hardware. Accordingly this requests that the OS
disable emulation and instead throw Bus error, which will be caught by
checkasm's signal handler.
This has no effect if the hardware supports unaligned access natively,
since no exceptions are generated. prctl() will fail safely in that
case.
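
As a minimal sketch of the request described above, assuming Linux and
the PR_SET_UNALIGN / PR_UNALIGN_SIGBUS constants from <sys/prctl.h>.
The helper name request_unaligned_sigbus() is hypothetical and is not
the name used in checkasm:

```c
#ifdef __linux__
#include <sys/prctl.h>
#endif

/* Hypothetical helper: ask the kernel to raise SIGBUS on unaligned
 * accesses instead of emulating them. Returns 0 if the request was
 * accepted, -1 if it is unsupported or rejected (for example on
 * hardware that handles unaligned access natively). */
static int request_unaligned_sigbus(void)
{
#if defined(__linux__) && defined(PR_SET_UNALIGN) && defined(PR_UNALIGN_SIGBUS)
    if (prctl(PR_SET_UNALIGN, PR_UNALIGN_SIGBUS) == 0)
        return 0;
#endif
    return -1;
}
```

A failure return can simply be ignored by the caller: nothing changes
and the tests run as before.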
The line width of 8 is meant to test a corner case, where
performance doesn't matter. Width 1080 is also a case that is
not aligned to 16.
Width 1920 is meant for benchmarking (together with the --runs option).
Signed-off-by: James Almer <jamrial@gmail.com>
From Benjamin Bross:
> for ALF where functions are in increments of 4 while 8 should be sufficient according to the spec.
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
According to the VVC specification (section 8.5.1), the maximum width/height of a subblock passed for DMVR SAD is 16. This, along with the previous constraint requiring width * height >= 128, means that 8x16, 16x8, and 16x16 are the only allowed sizes.
This changes check_vvc_sad() to only test and benchmark those sizes.
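
The size restriction can be illustrated with a short sketch. It assumes
that candidate dimensions are the powers of two 8 and 16; BlockSize and
vvc_sad_sizes() are hypothetical names, not the code in
check_vvc_sad():

```c
#include <stddef.h>

/* Hypothetical type and helper for illustration only. */
typedef struct { int w, h; } BlockSize;

/* Collect all sizes with width, height <= 16 and
 * width * height >= 128. Returns the number of sizes written. */
static size_t vvc_sad_sizes(BlockSize *out, size_t max)
{
    size_t n = 0;
    for (int w = 8; w <= 16; w *= 2)
        for (int h = 8; h <= 16; h *= 2)
            if (w * h >= 128 && n < max)
                out[n++] = (BlockSize){ w, h };
    return n;
}
```

Under these assumptions the helper yields exactly 8x16, 16x8 and 16x16;
8x8 is rejected because 8 * 8 = 64 < 128.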
VVC does not have MMX code at all, so one can use the stricter
declare_func to also check that the MMX state has not been
clobbered (which would be an ABI violation).
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The loop filters can write before the pointer given to them;
the actual test invocations correctly used an offset, while
the benchmark calls were lacking an offset. Therefore, when
running with benchmarking, these tests could have spurious
failures.
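
A toy sketch of the issue, not the actual test code: the buffer layout,
the MARGIN value and toy_filter() are all made up for illustration. The
point is that every call, including the benchmarked one, has to receive
a pointer with enough room before it:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STRIDE 32  /* illustrative values, not taken from the test */
#define ROWS   16
#define MARGIN 8   /* room for writes before the filter pointer */

static uint8_t buf[MARGIN + ROWS * STRIDE];

/* Toy stand-in for a loop filter: touches 4 pixels on each side of
 * dst in every row, so it writes up to 4 bytes before dst. */
static void toy_filter(uint8_t *dst, ptrdiff_t stride)
{
    for (int y = 0; y < ROWS; y++)
        for (int x = -4; x < 4; x++)
            dst[y * stride + x] ^= 1;
}

/* Pointer to hand to the filter, for test and benchmark calls alike.
 * Passing buf itself would let the filter write out of bounds. */
static uint8_t *filter_ptr(void)
{
    return buf + MARGIN;
}
```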
Signed-off-by: Martin Storsjö <martin@martin.st>
Some timers on certain device and test combinations can produce noisy
results, affecting the reliability of performance measurements. One
notable example of this is the Canaan K230 RISC-V development board.
An option to adjust the number of samples by an exponent (--runs) has
been added, allowing developers to increase the sample count for more
reliable results.
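
A sketch of how such an option could be parsed. The mapping from the
exponent n to 2^n samples and the upper limit of 30 are assumptions
made for illustration, and parse_runs() is a hypothetical name:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: returns the sample count for "--runs=<n>",
 * assumed here to be 2^n with n in [0, 30], or 0 if the argument is
 * not a valid --runs option. */
static unsigned parse_runs(const char *arg)
{
    char *end;
    unsigned long n;

    if (strncmp(arg, "--runs=", 7) || !arg[7])
        return 0;
    n = strtoul(arg + 7, &end, 10);
    if (*end || n > 30)
        return 0;
    return 1U << n;
}
```

Under these assumptions, --runs=10 selects 1024 samples.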
Signed-off-by: J. Dekker <jdek@itanimul.li>
Don't benchmark every single combination of widths and heights;
only benchmark cases which are squares (like in vvc_mc.c).
Contrary to vvc_mc, which increases sizes by doubling dimensions,
vvc_alf tests all sizes in increments of 4. Limit benchmarking to
the cases which are powers of two.
This reduces the number of benchmarked cases from 3072 down to 18.
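
The counts can be reproduced with a small sketch. The maximum size of
128 is an assumption, chosen because it is consistent with the numbers
quoted above when multiplied by three benchmarked functions;
count_cases() is a hypothetical helper:

```c
#define MAX_SIZE 128 /* assumed upper bound for width and height */
#define STEP     4   /* sizes are tested in increments of 4 */

static int is_pow2(int v)
{
    return v > 0 && !(v & (v - 1));
}

/* Count all tested width/height combinations and the subset that is
 * benchmarked (square, power-of-two sizes only). */
static void count_cases(int *tested, int *benched)
{
    *tested = *benched = 0;
    for (int h = STEP; h <= MAX_SIZE; h += STEP)
        for (int w = STEP; w <= MAX_SIZE; w += STEP) {
            (*tested)++;
            if (w == h && is_pow2(w))
                (*benched)++;
        }
}
```

With these assumptions there are 32 * 32 = 1024 tested combinations and
6 benchmarked ones per function, i.e. 3072 and 18 for three functions.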
Fixes "signed integer overflow: [varies] * 104858 cannot be represented in type 'int'" errors
under ubsan.
Signed-off-by: James Almer <jamrial@gmail.com>
The only multipliers used in scalarproduct_and_madd_*
are -1, 0 and +1. Yet the parameter is of type int and the checkasm
test uses the complete range of int for it, leading to overflows
that don't happen for actual users.
Fix this by using a more reasonable range for mul: Given
that it is used in v1[i] += v3[i] * mul with v1 being
a 16bit integer, it makes no sense to use values for mul
that don't fit into 16bit.
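
A sketch of the idea, covering only the multiply-add part of the
function. madd_ref() and mul_from_random() are hypothetical helpers,
not the code used in the test:

```c
#include <stdint.h>

/* Hypothetical reference for the update v1[i] += v3[i] * mul.
 * With |mul| <= 32768 the product fits in a 32-bit int; the store
 * back to int16_t truncates (wraps on common compilers). */
static void madd_ref(int16_t *v1, const int16_t *v3, int len, int mul)
{
    for (int i = 0; i < len; i++)
        v1[i] += v3[i] * mul;
}

/* Map an arbitrary 32-bit random value to a multiplier in the
 * 16-bit range [-32768, 32767]. */
static int mul_from_random(uint32_t rnd)
{
    return (int)(rnd & 0xFFFF) - 32768;
}
```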
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>