simple_idct.asm is 32 bit-only since
bfb28b5ce8,
whereas simple_idct10.asm is x64-only. So don't build
the ultimately unneeded and empty files, as some linkers
complain about this: "ranlib: file:
libavcodec/libavcodec.a(simple_idct.o) has no symbols"
(this is from an Xcode toolchain as reported by Ronald S. Bultje).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is only used by gmc/gmc1 which is only used by the MPEG-4
decoder, so move it to Mpeg4DecContext and rename it
to Mpeg4VideoDSP. Also compile it iff the MPEG-4 decoder is compiled.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Old one was written with the assumption only even inputs would be given.
This very messy replacement supports even and odd inputs, and supports
AVX2 for extra speed. The buffers given are usually quite big (4k samples),
so the speedup is worth it.
The new SSE version is still faster than the old inline asm version by 33%.
Also checkasm is provided to make sure this monstrosity works.
This fixes some FATE tests.
Initially written by Pierre Edouard Lepere <Pierre-Edouard.Lepere@insa-rennes.fr>,
extended by James Almer <jamrial@gmail.com>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
Performance improvements:
quant_bands:
with: 681 decicycles in quant_bands, 8388453 runs, 155 skips
without: 1190 decicycles in quant_bands, 8388386 runs, 222 skips
Around 42% for the function
Twoloop coder:
abs_pow34:
with/without: 7.82s/8.17s
Around 4% for the entire encoder
Both:
with/without: 7.15s/8.17s
Around 12% for the entire encoder
Fast coder:
abs_pow34:
with/without: 3.40s/3.77s
Around 10% for the entire encoder
Both:
with/without: 3.02s/3.77s
Around 20% faster for the entire encoder
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: James Almer <jamrial@gmail.com>
Adds a wrapper function for downmixing which detects channel count changes
and updates the selected downmix function accordingly.
Simplification and porting to current x86inc infrastructure by Diego Biurrun.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Originally written by Pierre Edouard Lepere <pierre-edouard.lepere@insa-rennes.fr>.
Integrated to Libav by Josh de Kock <josh@itanimul.li>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.
Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Modeled from the prores version. Clips to [0;1023] and is bitexact.
Bitexactness requires to add offsets in different places compared to
prores or C, and makes the function approximately 2% slower.
For 16 frames of a DNxHD 4:2:2 10bits test sequence:
C: 60861 decicycles in idct, 1048205 runs, 371 skips
sse2: 27567 decicycles in idct, 1048216 runs, 360 skips
avx: 26272 decicycles in idct, 1048171 runs, 405 skips
The add version is not implemented, so the corresponding dsp
function is set to NULL to make it clear in a code executing it.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>