When collecting performance information from checkasm it is common
to parse the output for use in graphs to compare vs different
architectures.
Signed-off-by: J. Dekker <jdek@itanimul.li>
F and D extensions are included in all RISC-V application profiles ever
made (so starting from RV64GC a.k.a. RVA20). Realistically they need to be
selected at compilation time.
Currently, there are no consumers for these two flags. If there is ever a
need to reintroduce F- or D-specific optimisations, we can always use
__riscv_f or __riscv_d compiler predefined macros respectively.
This preserves T1 whilst calling the instrumented function. In a Sci-Fi
setting where type-based Control Flow Integrity (CFI) is supported, the
calling code (i.e., the `checkasm` test case) will set T1 to the expected
value of the landing pad label (LPL) of the instrumented function.
The call wrapper will always use LPL zero which is a wild card. We should
preserve the value of T1 at least until the indirect call to the
instrumented function. Of course this is Sci-Fi, because:
1) there is no hardware (or even QEMU) support yet,
2) all our assembler functions currently use LPL zero anyway.
This uses T3 rather than T2 because indirect branches with T2 is reserved
for notionally direct calls made with an indirect call instruction (e.g.
due to GOT indirection), and are exempted from forward-edge CFI checks.
The B extension was finally ratified in May 2024, encompassing:
- Zba (addresses),
- Zbb (basics) and
- Zbs (single bits).
It does not include Zbc (base-2 polynomials).
Increase the tolerance from 10 ulp to 11 ulp. This fixes occasional
errors for some inputs; the errors could be reproduced on
aarch64/neon builds, with "checkasm --test=ac3dsp 3446175925".
Signed-off-by: Martin Storsjö <martin@martin.st>
Not every function will be set, so zero the context
to initialize everything.
This also allows to remove an initialization in dvenc.c.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
MECmpContext.ildct_cmp is an array of function pointers that
are not set by ff_me_cmp_init(), but that are set by users
to one of the other arrays via ff_set_cmp().
Remove these pointers from MECmpContext and add pointers
for the actually used functions to its users. (The DV encoder
already did so.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
MECmpContext has several arrays of function pointers that
are not set by ff_me_cmp_init(), but that are set by users
to one of the other arrays via ff_set_cmp().
One of these other users is mpegvideo_enc; it is the only user
of MECmpContext.frame_skip_cmp and it only uses one of these
function pointers at all.
This commit therefore moves this function pointer to MpegEncContext;
and removes the array from MECmpContext.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
MECmpContext has several arrays of function pointers that
are not set by ff_me_cmp_init(), but that are set by users
to one of the other arrays via ff_set_cmp().
One of these other users is the motion estimation API.
It uses MECmpContext.(me_pre|me|me_sub|mb)_cmp. It is
basically the only user of these arrays.
This commit therefore moves these arrays to MotionEstContext;
this has the additional advantage of making motion_est.c
more independent from MpegEncContext.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The check should be >= 0, not > 0. The check itself is redundant
since uninit only being called after init is success.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
B0 is defined by system header, see f0f596dbc6 for ref.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
src is apparently not guaranteed to be >8 byte aligned, but align to 16
nonetheless as the x86 asm will do unaligned loads anyway.
dst is guaranteed to be 32 byte aligned for the Y plane, but 16 byte for UV.
Signed-off-by: James Almer <jamrial@gmail.com>
The MMXEXT versions of the rgb2rgb functions tested here
always emit emms on their own. Therefore one can use
a stricter test to ensure that it stays that way.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The OS may silently fix (emulate) unaligned hardware access exceptions.
This is extremely slow and code should be fixed not to rely on unaligned
access on affected hardware. Accordingly this requests that the OS
disable emulation and instead throw Bus error, which will be caught by
checkasm's signal handler.
This has no effects if the hardware supports unaligned access in
hardware, since no exceptions are generated. prctl() will fail safe in
that case.
The line width 8 is supposed to test corner case, while the
performance doesn't matter. Width 1080 is also a case of
unaligned to 16.
Width 1920 meant for benchmark (together with --runs options).
Signed-off-by: James Almer <jamrial@gmail.com>