add ARM code for implementing av_clip_intp2 using the ssat instruction
on Cortex-A8, av_clip_intp2_arm() is faster than both av_clip_intp2_c() and
the generic av_clip(), by about 19%
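
A rough illustration of the idea, not the code from this patch: ssat
saturates a register to an N-bit signed range, so clipping to a signed
power-of-two range with exponent p corresponds to N = p + 1. The helper
name clip_int14() and the fixed exponent p = 13 are assumptions made for
this sketch. The ACLE feature macro __ARM_FEATURE_SAT is used as the guard,
and every other target falls back to plain C.

```c
/* Hypothetical helper: clip a to the signed 14-bit range [-8192, 8191]. */
static inline int clip_int14(int a)
{
#if defined(__arm__) && defined(__ARM_FEATURE_SAT)
    int r;
    /* ssat takes the width in bits, so exponent p = 13 needs #14. */
    __asm__ ("ssat %0, #14, %1" : "=r"(r) : "r"(a));
    return r;
#else
    /* Portable fallback with the same result. */
    if (a < -8192) return -8192;
    if (a >  8191) return  8191;
    return a;
#endif
}
```

The width is written as a literal because ssat encodes it as an immediate.
A version that takes p as an argument would need p to be a compile-time
constant at every call site.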
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
there already is a function, av_clip_uintp2(), that clips a signed integer
to an unsigned power-of-two range, i.e. [0, 2^p - 1]
this patch adds a function, av_clip_intp2(), that clips a signed integer
to a signed power-of-two range, i.e. [-(2^p), 2^p - 1]
the new function can be used as a special case of av_clip(), e.g.
av_clip(x, -8192, 8191) can be rewritten as av_clip_intp2(x, 13)
there are ARM instructions, usat and ssat respectively, which map nicely to
these functions (see next patch)
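
The two ranges described above can be sketched in portable C as follows.
The names clip_sketch(), clip_intp2_sketch() and clip_uintp2_sketch() are
placeholders invented for this example. They only illustrate the semantics
and are not the implementation added by the patch.

```c
#include <assert.h>

/* Generic three-way clip, standing in for av_clip(). */
static int clip_sketch(int a, int amin, int amax)
{
    if (a < amin) return amin;
    if (a > amax) return amax;
    return a;
}

/* Clip to the signed range [-(2^p), 2^p - 1]. */
static int clip_intp2_sketch(int a, int p)
{
    assert(p >= 0 && p < 31);   /* keep 1 << p representable in int */
    return clip_sketch(a, -(1 << p), (1 << p) - 1);
}

/* Clip to the unsigned range [0, 2^p - 1]. */
static int clip_uintp2_sketch(int a, int p)
{
    assert(p >= 0 && p < 31);
    return clip_sketch(a, 0, (1 << p) - 1);
}
```

With p = 13 the signed variant gives the same result as
clip_sketch(x, -8192, 8191), which is the rewrite mentioned above.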
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
SSE2 instructions that are XMM-implementations of pre-existing MMX/MMX2
instructions did not issue warnings when used in SSE functions. Handle
it by also checking the register type when such instructions are used.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Doing this check in avutil_version() is not appropriate. Also, this code
is by default disabled (--assert-level is by default 0). A FATE run with
defaults will never execute the checks.
Move it to the pixelutils test program. Whatever reason there was in
avutil_version() not to run this test by default, it should be fine in
this test program. This means FATE will run the test by default. (Yes,
pixelutils is not strictly the best place for it either, but it's
better.)
(pixdesc.c also has a small test program, but it's never run by FATE.)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Before the changes:
lavu CAMELLIA size: 1048576 runs: 1024 time: 32.541 +- 0.044
After the changes (about 24% less time):
lavu CAMELLIA size: 1048576 runs: 1024 time: 24.589 +- 0.066
Tested with crypto_bench on a Linux x86_64 OS with Intel Core i5-3210M CPU.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Found-by: wm4
Reviewed-by: wm4 <nfxjfg@googlemail.com>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: Carl Eugen Hoyos <cehoyos@ag.or.at>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This mimics what is done for the other instruction sets.
Tested-by: James Almer <jamrial@gmail.com>
Tested-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This uses explicit memory copying to read and write pointers to pointers
of arbitrary object types. This works provided that the architecture
uses the same representation for all pointer types (the previous code
already made that assumption anyway).
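
A minimal sketch of the technique, assuming a free-and-reset helper as the
use case. The name freep_sketch() is hypothetical and the code is not taken
from the patch. The caller passes the address of a pointer of any object
type, and the pointer value is moved with memcpy() rather than by
dereferencing the argument as a void **.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: free *arg and set it to NULL, where arg is the
 * address of a pointer to any object type. */
static void freep_sketch(void *arg)
{
    void *val;

    /* Copy the stored pointer out byte-wise; this relies on all object
     * pointer types sharing one representation. */
    memcpy(&val, arg, sizeof(val));
    /* Write NULL back the same way, then release the old block. */
    memcpy(arg, &(void *){ NULL }, sizeof(val));
    free(val);
}
```

Called as freep_sketch(&p) for some int *p, this leaves p set to NULL
without accessing an int * object through an lvalue of type void *.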
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
This avoids a potential conflict with the equally named function from XOPEN.
It could also reduce confusion in debugger backtraces.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Move the lavc/imgconvert functions and rename them as follows:
avpicture_get_size() -> av_image_get_buffer_size()
avpicture_fill() -> av_image_fill_arrays()
avpicture_layout() -> av_image_copy_to_buffer()
The new functions take an align parameter, which specifies the
linesize alignment assumed for the buffer being written or read.
The names of the functions are consistent with the lavu/samples API
(av_samples_get_buffer_size(), av_samples_fill_arrays()).
A redundant check has been dropped from av_image_fill_arrays().
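
To illustrate what a linesize alignment means for the buffer size, here is
a simplified sketch for a packed single-plane image. The helpers align_up()
and packed_buffer_size() are invented for this example, and the sketch
assumes a power-of-two alignment. It does not reproduce the library's
actual computation, which also handles planar formats.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (a power of two). */
static size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Bytes needed for a packed single-plane image, e.g. 3 bytes per pixel
 * for RGB24, when each row is padded to the given alignment. */
static size_t packed_buffer_size(size_t width, size_t height,
                                 size_t bytes_per_pixel, size_t align)
{
    size_t linesize = align_up(width * bytes_per_pixel, align);
    return linesize * height;
}
```

With an alignment of 1 the rows are tightly packed, so a 5x2 image at
3 bytes per pixel needs 30 bytes. With an alignment of 16 each 15-byte row
is padded to 16 bytes and the same image needs 32 bytes.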
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>