Use two separate functions, depending on whether VFP/NEON is available.
This is set to require armv5te - it uses blx, which is only available
since armv5t, but we don't have a separate configure item for that.
(It also uses ldrd, which requires armv5te, but this could be avoided
if necessary.)
Signed-off-by: Martin Storsjö <martin@martin.st>
The vector mode was deprecated in ARMv7-A/VFPv3 and various cpu
implementations do not support it in hardware. Vector mode code will
depending the OS either be emulated in software or result in an illegal
instruction on cpus which does not support it. This was not really
problem in practice since NEON implementations of the same functions are
preferred. It will however become a problem for checkasm which tests
every cpu flag separately.
Since this is a cpu feature newer cpu do not support anymore the
behaviour of this flag differs from the other flags. It can be only
activated by runtime cpu feature selection.
Tested functions are internally kept in a binary search tree for efficient
lookups. The downside of the current implementation is that the tree quickly
becomes unbalanced which causes an unneccessary amount of comparisons between
nodes. Improve this by changing the tree into a self-balancing left-leaning
red-black tree with a worst case lookup/insertion time complexity of O(log n).
Significantly reduces the recursion depth and makes the tests run around 10%
faster overall. The relative performance improvement compared to the existing
non-balanced tree will also most likely increase as more tests are added.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Tested functions are internally kept in a binary search tree for efficient
lookups. The downside of the current implementation is that the tree quickly
becomes unbalanced which causes an unneccessary amount of comparisons between
nodes. Improve this by changing the tree into a self-balancing left-leaning
red-black tree with a worst case lookup/insertion time complexity of O(log n).
Significantly reduces the recursion depth and makes the tests run around 10%
faster overall. The relative performance improvement compared to the existing
non-balanced tree will also most likely increase as more tests are added.
Now we no longer have to rely on function pointers intentionally
declared without specified argument types.
This makes it easier to support functions with floating point parameters
or return values as well as functions returning 64-bit values on 32-bit
architectures. It also avoids having to explicitly cast strides to
ptrdiff_t for example.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Now we no longer have to rely on function pointers intentionally
declared without specified argument types.
This makes it easier to support functions with floating point parameters
or return values as well as functions returning 64-bit values on 32-bit
architectures. It also avoids having to explicitly cast strides to
ptrdiff_t for example.
configure does check for isatty, and checkasm properly checks
HAVE_ISATTY, but on some platforms (e.g. WinRT), io.h needs to be
included for isatty to be available.
Signed-off-by: Martin Storsjö <martin@martin.st>
It provides the following features:
* verify correctness by comparing output to the C version.
* detect failure to save and restore clobbered callee-saved registers.
* detect 32-bit parameters being used as if they were 64-bit in x86-64
(the upper halves are not guaranteed to be zero - but in practice
they very often are, which makes those bugs hard to spot otherwise).
* easy benchmarking.
Compile by running 'make checkasm'.
Execute by running 'tests/checkasm/checkasm'.
Optional arguments are '--bench' to run benchmarks for all functions,
'--bench=<pattern>' to run benchmarks for all functions that starts with
<pattern>, and '<integer>' to seed the PRNG for reproducible results.
Contains unit tests for most h264pred functions to get started, more tests
can be added afterwards using those as a reference.
Loosely based on code from x264. Currently only supports x86 and x86-64,
but additional architectures shouldn't be too much of an obstacle to add.
Note that functions with floating point parameters or floating point
return values are not supported. Some compiler-specific features or
preprocessor hacks would likely be required to add support for that.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>