Mark Reid
9e445a5be2
swscale/x86/output.asm: add x86-optimized planar gbr yuv2anyX functions
...
changes since v2:
* fixed label
changes since v1:
* remove vex instruction on sse4 path
* some load/pack macros use fewer instructions
* fixed some typos
yuv2gbrp_full_X_4_512_c: 12757.6
yuv2gbrp_full_X_4_512_sse2: 8946.6
yuv2gbrp_full_X_4_512_sse4: 5138.6
yuv2gbrp_full_X_4_512_avx2: 3889.6
yuv2gbrap_full_X_4_512_c: 15368.6
yuv2gbrap_full_X_4_512_sse2: 11916.1
yuv2gbrap_full_X_4_512_sse4: 6294.6
yuv2gbrap_full_X_4_512_avx2: 3477.1
yuv2gbrp9be_full_X_4_512_c: 14381.6
yuv2gbrp9be_full_X_4_512_sse2: 9139.1
yuv2gbrp9be_full_X_4_512_sse4: 5150.1
yuv2gbrp9be_full_X_4_512_avx2: 2834.6
yuv2gbrp9le_full_X_4_512_c: 12990.1
yuv2gbrp9le_full_X_4_512_sse2: 9118.1
yuv2gbrp9le_full_X_4_512_sse4: 5132.1
yuv2gbrp9le_full_X_4_512_avx2: 2833.1
yuv2gbrp10be_full_X_4_512_c: 14401.6
yuv2gbrp10be_full_X_4_512_sse2: 9133.1
yuv2gbrp10be_full_X_4_512_sse4: 5126.1
yuv2gbrp10be_full_X_4_512_avx2: 2837.6
yuv2gbrp10le_full_X_4_512_c: 12718.1
yuv2gbrp10le_full_X_4_512_sse2: 9106.1
yuv2gbrp10le_full_X_4_512_sse4: 5120.1
yuv2gbrp10le_full_X_4_512_avx2: 2826.1
yuv2gbrap10be_full_X_4_512_c: 18535.6
yuv2gbrap10be_full_X_4_512_sse2: 33617.6
yuv2gbrap10be_full_X_4_512_sse4: 6264.1
yuv2gbrap10be_full_X_4_512_avx2: 3422.1
yuv2gbrap10le_full_X_4_512_c: 16724.1
yuv2gbrap10le_full_X_4_512_sse2: 11787.1
yuv2gbrap10le_full_X_4_512_sse4: 6282.1
yuv2gbrap10le_full_X_4_512_avx2: 3441.6
yuv2gbrp12be_full_X_4_512_c: 13723.6
yuv2gbrp12be_full_X_4_512_sse2: 9128.1
yuv2gbrp12be_full_X_4_512_sse4: 7997.6
yuv2gbrp12be_full_X_4_512_avx2: 2844.1
yuv2gbrp12le_full_X_4_512_c: 12257.1
yuv2gbrp12le_full_X_4_512_sse2: 9107.6
yuv2gbrp12le_full_X_4_512_sse4: 5142.6
yuv2gbrp12le_full_X_4_512_avx2: 2837.6
yuv2gbrap12be_full_X_4_512_c: 18511.1
yuv2gbrap12be_full_X_4_512_sse2: 12156.6
yuv2gbrap12be_full_X_4_512_sse4: 6251.1
yuv2gbrap12be_full_X_4_512_avx2: 3444.6
yuv2gbrap12le_full_X_4_512_c: 16687.1
yuv2gbrap12le_full_X_4_512_sse2: 11785.1
yuv2gbrap12le_full_X_4_512_sse4: 6243.6
yuv2gbrap12le_full_X_4_512_avx2: 3446.1
yuv2gbrp14be_full_X_4_512_c: 13690.6
yuv2gbrp14be_full_X_4_512_sse2: 9120.6
yuv2gbrp14be_full_X_4_512_sse4: 5138.1
yuv2gbrp14be_full_X_4_512_avx2: 2843.1
yuv2gbrp14le_full_X_4_512_c: 14995.6
yuv2gbrp14le_full_X_4_512_sse2: 9119.1
yuv2gbrp14le_full_X_4_512_sse4: 5126.1
yuv2gbrp14le_full_X_4_512_avx2: 2843.1
yuv2gbrp16be_full_X_4_512_c: 12367.1
yuv2gbrp16be_full_X_4_512_sse2: 8233.6
yuv2gbrp16be_full_X_4_512_sse4: 4820.1
yuv2gbrp16be_full_X_4_512_avx2: 2666.6
yuv2gbrp16le_full_X_4_512_c: 10904.1
yuv2gbrp16le_full_X_4_512_sse2: 8214.1
yuv2gbrp16le_full_X_4_512_sse4: 4824.1
yuv2gbrp16le_full_X_4_512_avx2: 2629.1
yuv2gbrap16be_full_X_4_512_c: 26569.6
yuv2gbrap16be_full_X_4_512_sse2: 10884.1
yuv2gbrap16be_full_X_4_512_sse4: 5488.1
yuv2gbrap16be_full_X_4_512_avx2: 3272.1
yuv2gbrap16le_full_X_4_512_c: 14010.1
yuv2gbrap16le_full_X_4_512_sse2: 10562.1
yuv2gbrap16le_full_X_4_512_sse4: 5463.6
yuv2gbrap16le_full_X_4_512_avx2: 3255.1
yuv2gbrpf32be_full_X_4_512_c: 14524.1
yuv2gbrpf32be_full_X_4_512_sse2: 8552.6
yuv2gbrpf32be_full_X_4_512_sse4: 4636.1
yuv2gbrpf32be_full_X_4_512_avx2: 2474.6
yuv2gbrpf32le_full_X_4_512_c: 13060.6
yuv2gbrpf32le_full_X_4_512_sse2: 9682.6
yuv2gbrpf32le_full_X_4_512_sse4: 4298.1
yuv2gbrpf32le_full_X_4_512_avx2: 2453.1
yuv2gbrapf32be_full_X_4_512_c: 18629.6
yuv2gbrapf32be_full_X_4_512_sse2: 11363.1
yuv2gbrapf32be_full_X_4_512_sse4: 15201.6
yuv2gbrapf32be_full_X_4_512_avx2: 3727.1
yuv2gbrapf32le_full_X_4_512_c: 16677.6
yuv2gbrapf32le_full_X_4_512_sse2: 10221.6
yuv2gbrapf32le_full_X_4_512_sse4: 5693.6
yuv2gbrapf32le_full_X_4_512_avx2: 3656.6
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
3 years ago
J. Dekker
b492cacffd
checkasm: collapse hevc pel tests
...
Also add to `make fate-checkasm' target.
Signed-off-by: J. Dekker <jdek@itanimul.li>
3 years ago
J. Dekker
9a727235fd
lavu/checkasm: add (private) kperf timing for macOS
...
Signed-off-by: J. Dekker <jdek@itanimul.li>
4 years ago
Lynne
1978b143eb
checkasm: add av_tx FFT SIMD testing code
...
This sadly required making changes to the code itself,
due to the same context needing to be reused for both versions.
The lookup table had to be duplicated for both versions.
4 years ago
Josh Dekker
9c513edb79
checkasm: add hevc_pel tests
...
Co-authored-by: Niklas Haas <git@haasn.xyz>
Signed-off-by: Josh Dekker <josh@itanimul.li>
4 years ago
Martin Storsjö
ed7d73355e
checkasm: aarch64: Check for stack overflows
...
Also fill x8-x17 with garbage before calling the function.
Figure out the number of stack parameters and make sure that the
value on the stack after those is untouched.
Signed-off-by: Martin Storsjö <martin@martin.st>
5 years ago
Martin Storsjö
6cb2d4d94b
checkasm: arm: Check for stack overflows
...
Figure out the number of stack parameters and make sure that the
value on the stack after those is untouched.
Signed-off-by: Martin Storsjö <martin@martin.st>
5 years ago
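The stack check described in the two commits above (arm and aarch64) can be modelled in plain C. This is only a simplified sketch: the real checks live in assembly call wrappers, whereas here the stack parameter area is imitated by an ordinary array. The names `call_checked`, `well_behaved`, `overflowing`, and the `CANARY` value are hypothetical and chosen for illustration.

```c
#include <stdint.h>
#include <string.h>

#define MAX_STACK_ARGS 8
#define CANARY 0xdeadbeefcafef00dULL /* arbitrary marker value (assumption) */

/* A callee that receives a pointer to its "stack parameter" area. */
typedef void (*callee_fn)(uint64_t *args, int num_args);

/*
 * Place a canary in the word right after the last stack parameter,
 * call the function, and report whether the canary survived.
 * Returns 1 if the word after the parameters is untouched, 0 otherwise.
 */
static int call_checked(callee_fn fn, int num_args)
{
    uint64_t area[MAX_STACK_ARGS + 1];

    if (num_args < 0 || num_args > MAX_STACK_ARGS)
        return 0;
    memset(area, 0, sizeof(area));
    area[num_args] = CANARY;      /* value on the "stack" after the parameters */
    fn(area, num_args);
    return area[num_args] == CANARY;
}

/* Touches only its own num_args parameter slots. */
static void well_behaved(uint64_t *args, int num_args)
{
    for (int i = 0; i < num_args; i++)
        args[i] = (uint64_t)i;
}

/* Writes one slot too far, as a buggy function might. */
static void overflowing(uint64_t *args, int num_args)
{
    for (int i = 0; i <= num_args; i++)
        args[i] = (uint64_t)i;
}
```

Calling `call_checked(well_behaved, 4)` reports success, while `call_checked(overflowing, 4)` reports a failure because the slot after the fourth parameter was overwritten.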
Josh de Kock
5913cd4e6c
checkasm: add hscale test
...
This tests the hscale 8bpp to 14/18bpp functions with different filter
sizes.
Signed-off-by: Josh de Kock <josh@itanimul.li>
5 years ago
Martin Storsjö
3ce1b2bf8d
checkasm: add function to check and diff memory
...
This was ported from dav1d (c950e7101bdf5f7117bfca816984a21e550509f0).
Signed-off-by: Josh de Kock <josh@itanimul.li>
5 years ago
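A function that checks and diffs memory, as added by the commit above, might look roughly like the following. This is a minimal sketch and not the checkasm or dav1d code: the name `check_and_diff`, its signature, and its output format are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Compare a reference buffer against a tested buffer byte by byte.
 * Every mismatch is reported to `out` (pass NULL to stay silent).
 * Returns the number of differing bytes, so 0 means the buffers match.
 */
static size_t check_and_diff(const uint8_t *ref, const uint8_t *tst,
                             size_t len, FILE *out)
{
    size_t mismatches = 0;

    for (size_t i = 0; i < len; i++) {
        if (ref[i] != tst[i]) {
            if (out)
                fprintf(out, "offset %zu: ref=%02x new=%02x\n",
                        i, ref[i], tst[i]);
            mismatches++;
        }
    }
    return mismatches;
}
```

Compared with a bare `memcmp()`, reporting the offsets of the differing bytes makes it easier to see which part of the output an optimized function got wrong.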
Ting Fu
9691e2a426
checkasm/vf_eq: add test for vf_eq
...
Signed-off-by: Ting Fu <ting.fu@intel.com>
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
5 years ago
Lynne
4ce1e13b54
checkasm: add opusdsp tests
5 years ago
Ruiling Song
8f4963ad25
checkasm/vf_gblur: add test for horiz_slice simd
...
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
6 years ago
James Darnley
76c370af64
checkasm: add test for v210dec
6 years ago
James Almer
ba89dc27b5
checkasm: add an af_afir test
...
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
6 years ago
Clément Bœsch
f679711c1b
checkasm: add vf_nlmeans test for ssd_integral_image
7 years ago
Martin Vignali
a9a7ed4f27
checkasm/swscale : add test for rgb shuffle_bytes func
7 years ago
Yingming Fan
80798e3857
checkasm/hevc_sao : add hevc_sao for checkasm
...
Signed-off-by: James Almer <jamrial@gmail.com>
7 years ago
Martin Vignali
78b982d3b9
checkasm : add test for losslessvideoencdsp for diff bytes and sub_left_pred
7 years ago
James Almer
da03242778
Revert "checkasm/vf_interlace : add test for lowpass_line 8 and 16"
...
This reverts commit adff97be5e.
It currently fails on Windows targets.
Signed-off-by: James Almer <jamrial@gmail.com>
7 years ago
Martin Vignali
adff97be5e
checkasm/vf_interlace : add test for lowpass_line 8 and 16
7 years ago
Martin Vignali
cefb7e0060
checkasm/vf_hflip : add test for vf_hflip byte and short simd
7 years ago
Martin Vignali
cfce442750
checkasm/vf_threshold : add checkasm test for threshold8
7 years ago
Martin Vignali
4a6aa6d1b2
checkasm : add test for huffyuvdsp add_int16
7 years ago
Martin Vignali
6a7eb65e1b
checkasm : add utvideodsp test
7 years ago
James Almer
7323c896b2
checkasm: add an exrdsp test
...
Signed-off-by: James Almer <jamrial@gmail.com>
7 years ago
Clément Bœsch
e0d56f097f
checkasm: use perf API on Linux ARM*
...
On ARM platforms, accessing the PMU registers requires special user
access permissions. Since there is no other way to get accurate timers,
the current implementation of timers in FFmpeg relies on these registers.
Unfortunately, enabling user access to these registers on Linux is not
trivial, and generally involves compiling a random and unreliable GitHub
kernel module, or somehow patching your kernel.
Such a module is very unlikely to reach upstream anytime soon. Quoting
Robin Murphy from ARM:
> Say you do give userspace direct access to the PMU; now run two or more
> programs at once that believe they can use the counters for their own
> "minimal-overhead" profiling. Have fun interpreting those results...
>
> And that's not even getting into the implications of scheduling across
> different CPUs, CPUidle, etc. where the PMU state is completely beyond
> userspace's control. In general, the plan to provide userspace with
> something which might happen to just about work in a few corner cases,
> but is meaningless, misleading or downright broken in all others, is to
> never do so.
As a result, the alternative is to use the Performance Monitoring Linux
API which makes use of these registers internally (assuming the PMU of
your ARM board is supported in the kernel, which is definitely not a
given...).
While the Linux API is obviously cross-platform, it does have a
significant overhead which needs to be taken into account. As a result,
that mode is only weakly enabled, and on ARM platforms exclusively.
Note on the inflexibility of the implementation: the timers (native
FFmpeg vs Linux API) are selected at compilation time to avoid the
need for function calls, which would have a negative impact on the
cycle counters.
7 years ago
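The compile-time timer selection described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the FFmpeg implementation: the macro `USE_LINUX_PERF` and the names `timer_init`, `TIMER_START`, and `TIMER_STOP` are hypothetical, and the fallback branch uses `clock_gettime()` where FFmpeg would use its native timers. The Linux branch relies on the `perf_event_open` system call with a hardware cycle counter.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

#if defined(USE_LINUX_PERF) && defined(__linux__)
/* Linux perf API path: the kernel reads the PMU on our behalf. */
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int perf_fd = -1;

static int timer_init(void)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type           = PERF_TYPE_HARDWARE;
    attr.size           = sizeof(attr);
    attr.config         = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled       = 1;  /* start stopped, enable around the measurement */
    attr.exclude_kernel = 1;
    attr.exclude_hv     = 1;
    /* pid 0, cpu -1: this process on any CPU; no group, no flags */
    perf_fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    return perf_fd < 0 ? -1 : 0;
}

#define TIMER_START() do {                              \
        ioctl(perf_fd, PERF_EVENT_IOC_RESET, 0);        \
        ioctl(perf_fd, PERF_EVENT_IOC_ENABLE, 0);       \
    } while (0)

#define TIMER_STOP(out) do {                            \
        ioctl(perf_fd, PERF_EVENT_IOC_DISABLE, 0);      \
        if (read(perf_fd, &(out), sizeof(out)) !=       \
            (ssize_t)sizeof(out))                       \
            (out) = 0;                                  \
    } while (0)

#else
/* Fallback path (assumption): a monotonic clock stands in for native timers. */
static uint64_t timer_t0;

static int timer_init(void)
{
    return 0;
}

static uint64_t timer_now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

#define TIMER_START()   (timer_t0 = timer_now_ns())
#define TIMER_STOP(out) ((out) = timer_now_ns() - timer_t0)
#endif
```

Because the branch is chosen by the preprocessor, the measured code is wrapped directly by the selected start and stop sequence, with no function-pointer indirection to pick a timer at run time.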
James Almer
823cc7e25f
checkasm: add a g722dsp test
...
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
Matthieu Bouron
7864e07f4a
checkasm: add sbrdsp tests
8 years ago
Clément Bœsch
edd041e64c
checkasm: add AAC PS tests
...
This includes various fixes and improvements from James Almer.
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
Diego Biurrun
fd502f4f5f
build: Generalize yasm/nasm-related variable names
...
None of them are specific to the YASM assembler.
(Cherry-picked from libav commit 39e208f4d4)
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
James Almer
5b10f484e2
checkasm: add float_dsp tests
...
Ported from libavutil/tests/float_dsp.c
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
James Almer
37388b119c
checkasm: add a checkasm_checked_call function that doesn't issue emms
...
Meant for DSP functions returning a float or double, as they'd fail if emms
is called after every run on x86_32.
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
James Almer
7b3cb953f7
checkasm: add fixed_dsp tests
...
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
8 years ago
Diego Biurrun
39e208f4d4
build: Generalize yasm/nasm-related variable names
...
None of them are specific to the YASM assembler.
8 years ago
Alexandra Hájková
ed48a9d814
checkasm: Add a test for HEVC add_residual
8 years ago
Martin Storsjö
c91d6a33f8
checkasm: aarch64: Add filler args to make sure all parameters are passed on the stack
...
This, combined with clobbering the stack space prior to the call,
increases the chances of finding cases where 32 bit parameters
are erroneously treated as 64 bit.
Signed-off-by: Martin Storsjö <martin@martin.st>
8 years ago
Martin Storsjö
f1b3e13138
checkasm: aarch64: Clobber the stack before calling functions
...
Signed-off-by: Martin Storsjö <martin@martin.st>
8 years ago
Alexandra Hájková
22c3ab1864
checkasm: Add test for huffyuvdsp add_bytes
...
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
8 years ago
Anton Khirnov
e9ef617139
checkasm: add tests for audiodsp
8 years ago
Anton Khirnov
2eb97af66a
checkasm: add a test for blockdsp
8 years ago
Ronald S. Bultje
e99ecda550
checkasm: add vp9 MC tests.
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
9 years ago
Alexandra Hájková
9064777dbb
checkasm: add HEVC test for testing IDCT DC
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
9 years ago
Martin Storsjö
f8d17d5395
checkasm: Add tests for vp8dsp
...
The tests are inspired by similar tests for vp9 by
Ronald Bultje.
Signed-off-by: Martin Storsjö <martin@martin.st>
9 years ago
Martin Storsjö
dc7501e524
checkasm: Issue emms after benchmarking functions
...
The functions may not clean up properly after using MMX
registers. For the normal testing calls, the checkasm_checked_call
functions will do the cleanup (and check that functions that
should clean up do it as well), but when benchmarking functions
that don't clean up, we don't currently properly clean up at all.
This causes issues if a benchmarked function is followed by testing
of a function that is supposed to not clobber the MMX/FPU state but
doesn't touch it at all.
Signed-off-by: Martin Storsjö <martin@martin.st>
9 years ago
Martin Storsjö
105998fb5c
checkasm: Add tests for h264 idct
...
The tests are inspired by similar tests for vp9 by
Ronald Bultje.
Signed-off-by: Martin Storsjö <martin@martin.st>
9 years ago
Ronald S. Bultje
5ce703a6bf
vf_colorspace: x86-64 SIMD (SSE2) optimizations.
9 years ago
Diego Biurrun
7c82d31cbe
checkasm: Use standard multiple inclusion guards
9 years ago
Timothy Gu
a953a2991e
checkasm: Add vf_blend tests
9 years ago
Timothy Gu
180f9a0958
all: Make header guard names consistent
9 years ago
foo86
4608996772
avcodec/dca: remove old decoder
...
Remove all files and functions which are not going to be reused,
and temporarily disable all functions and FATE tests which will be.
9 years ago