FFmpeg

Commit Graph

Author	SHA1	Message	Date
Martin Vignali	b37196adff	avutil/x86util: add macro for loading a 128-bit constant into an xmm register, or into each 128-bit lane of a ymm register, in order to simplify AVX2 asm functions	7 years ago
Dale Curtis	50e30d9bb7	Don't use _tzcnt intrinsics with clang for Windows w/o BMI. Technically, the _tzcnt* intrinsics are only available when the BMI instruction set is present. However, the instruction encoding degrades to "rep bsf" on older processors. Clang for Windows debatably restricts the _tzcnt* intrinsics behind the __BMI__ architecture define, so check for its presence or exclude the usage of these intrinsics when clang is present. See also: https://ffmpeg.org/pipermail/ffmpeg-devel/2015-November/183404.html https://bugs.llvm.org/show_bug.cgi?id=30506 http://lists.llvm.org/pipermail/cfe-dev/2016-October/051034.html Signed-off-by: Dale Curtis <dalecurtis@chromium.org> Reviewed-by: Matt Oliver <protogonoi@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	7 years ago
James Almer	3d828c9fd5	cpu: split flag checks per arch in av_cpu_max_align() Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	7 years ago
James Almer	3b345d389b	avutil/cpu: split flag checks per arch in av_cpu_max_align() Signed-off-by: James Almer <jamrial@gmail.com>	7 years ago
Ivan Kalvachev	30ae07d7ef	Add macros to x86util.asm: an improved version of VBROADCASTSS that works like the AVX2 instruction; emulation of vpbroadcastd; a horizontal sum, HSUMPS, that places the result in all elements; emulation of blendvps and pblendvb. Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>	7 years ago
James Almer	4d62ee6746	x86inc: don't use read-only data sections on COFF targets Yasm: src/libavfilter/x86/af_volume.asm:24: warning: Standard COFF does not support read-only data sections src/libavfilter/x86/af_volume.asm:24: warning: Unrecognized qualifier `align' Nasm: src/libavfilter/x86/af_volume.asm:24: error: standard COFF does not support section alignment specification src/libavutil/x86/x86inc.asm:92: ... from macro `SECTION_RODATA' defined here Tested-by: Clément Bœsch <u@pkh.me> Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
Diego Biurrun	fd502f4f5f	build: Generalize yasm/nasm-related variable names None of them are specific to the YASM assembler. (Cherry-picked from libav commit `39e208f4d4`) Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
James Almer	e229df9478	x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4} About 2x faster than the c version.	8 years ago
Henrik Gramner	aad1b6786e	x86inc: Add some additional cpuflag relations. Simplifies writing assembly code that depends on available instructions. LZCNT implies SSE2; BMI1 implies AVX+LZCNT; AVX2 implies BMI2.	8 years ago
Anton Mitrofanov	d991b3e8a8	x86inc: Remove argument from WIN64_RESTORE_XMM The use of rsp was pretty much hardcoded there and probably didn't work otherwise with stack_size > 0.	8 years ago
Henrik Gramner	cd4ca82459	x86inc: Prefer r14/r15 over r12/r13 on x86-64. Due to a peculiarity in the ModR/M addressing encoding, the r12 and r13 registers sometimes require an additional byte when used as a base register. r14 and r15 don't have that issue, so prefer using them.	8 years ago
Henrik Gramner	88dcdfad09	x86inc: Make REP_RET identical to RET in SSSE3+ functions There's no point in emitting a rep prefix before ret on modern CPUs.	8 years ago
Henrik Gramner	406e0ddc0b	x86inc: Fix call with memory operands We overload the `call` instruction with a macro, but it would misbehave when the macro argument wasn't a valid identifier. Fix it by explicitly checking if the argument is an identifier.	8 years ago
James Almer	0fbc7a2169	x86/float_dsp: remove usage of integer instructions	8 years ago
James Almer	f1d80bc630	x86/float_dsp: add ff_vector_fmul_reverse_avx2 ~20% faster than AVX. Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
James Almer	ed9b25a148	x86/float_dsp: add ff_vector_dmac_scalar_{sse2,avx,fma3}	8 years ago
James Almer	d8962ffbd8	avutil/x86util: don't use movss in VBROADCASTSS macro when src and dst args are the same Reviewed-by: Henrik Gramner <henrik@gramner.com> Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
Diego Biurrun	994c4bc107	x86util: Port all macros to cpuflags Also do some small cosmetic changes: Drop pointless _MMX suffix from ABSD2 macro name, drop pointless check for MMX support, we always assume MMX is available in our SIMD code, fix spelling.	8 years ago
Diego Biurrun	39e208f4d4	build: Generalize yasm/nasm-related variable names None of them are specific to the YASM assembler.	8 years ago
James Darnley	5336887867	avcodec/h264: sse2, avx h luma mbaff deblock/loop filter x86-64 only Yorkfield: - sse2: ~2.17x (434 vs. 200 cycles) Nehalem: - sse2: ~2.94x (409 vs. 139 cycles) Skylake: - sse2: ~3.10x (370 vs. 119 cycles) - avx: ~3.29x (370 vs. 112 cycles)	8 years ago
James Darnley	7627df15d4	x86util: import MOVHL macro Originally committed to x264 in 1637239a by Henrik Gramner who has agreed to re-license it as LGPL. Original commit message follows. x86: Avoid some bypass delays and false dependencies A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning between int and float domains, so try to avoid that if possible.	8 years ago
James Darnley	9d815b7424	avcodec/x86: deduplicate PASS8ROWS macro	8 years ago
Diego Biurrun	7abdd026df	asm: Consistently uppercase SECTION markers	8 years ago
Henrik Gramner	cd09e3b349	x86inc: Avoid using eax/rax for storing the stack pointer When allocating stack space with an alignment requirement that is larger than the current stack alignment we need to store a copy of the original stack pointer in order to be able to restore it later. If we chose to use another register for this purpose we should not pick eax/rax since it can be overwritten as a return value.	8 years ago
Henrik Gramner	3cba1ad76d	x86inc: Avoid using eax/rax for storing the stack pointer When allocating stack space with an alignment requirement that is larger than the current stack alignment we need to store a copy of the original stack pointer in order to be able to restore it later. If we chose to use another register for this purpose we should not pick eax/rax since it can be overwritten as a return value. Signed-off-by: Anton Khirnov <anton@khirnov.net>	8 years ago
Diego Biurrun	99434f4df8	float_dsp: Have implementation match function pointer prototype libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 1 different from declaration libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 2 different from declaration	8 years ago
Michael Niedermayer	051517648b	avutil/x86/emms: Document the emms_c() vs alloc/free relation. Reviewed-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Diego Biurrun	7911186ed6	emms: Give avpriv_emms_yasm() a more general name	8 years ago
Diego Biurrun	6be7944ee2	x86: Add missing colons after assembly labels This fixes many warnings of the sort warning: label alone on a line without a colon might be in error	8 years ago
Alexandra Hájková	07e1f99a1b	x86util: Document SBUTTERFLY macro Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	8 years ago
Anton Khirnov	d7bc52bf45	imgutils: add a function for copying image data from GPU mapped memory See https://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers	8 years ago
Fiona Glaser	8e9cd81d29	x86: cpu: Detect Conroe CPUs and their slow shuffle unit	8 years ago
Diego Biurrun	7d7355aa92	x86: Add SSSE3_SLOW CPU flag and related convenience macros	8 years ago
James Almer	fd5e6a095f	x86util: Extend SPLATW for avx2 Integration to Libav by Josh de Kock <josh@itanimul.li>. Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>	8 years ago
Ronald S. Bultje	f0a2b6249b	vp9: add 16x16 idct avx2 (8-bit). checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows that it's about 1.65x as fast as the AVX version for the full IDCT, and similar speedups for the sub-IDCTs. nop: 24.6; per-function results for vp9_inv_dct_dct_16x16_add_8_${sub_idct}_${opt}:	8 years ago
    sub_idct  c       sse2   ssse3  avx    avx2
    1         6444.8  638.6  484.4  661.2  311.5
    2         6665.7  646.9  455.2  521.9  304.3
    4         7022.7  647.4  467.1  446.1  297.0
    8         6800.4  598.6  465.7  440.9  290.2
    16        6626.6  599.5  475.0  469.9  286.4
Matthieu Bouron	9eb3da2f99	asm: FF_-prefix internal macros used in inline assembly See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.	9 years ago
Matt Oliver	5ca44ebd99	lavu/intmath.h: fix compilation with msvc10. Signed-off-by: Matt Oliver <protogonoi@gmail.com>	9 years ago
James Almer	172af20852	x86/showcqt: use three operand format for some instructions Fixes failures with yasm 1.1.0 and older Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	99b899483e	avutil/x86util: move haddps sse emulation from showcqt Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Diego Biurrun	1e9c5bf4c1	asm: FF_-prefix internal macros used in inline assembly. These macros conflict with system macros on Solaris, producing truckloads of warnings about macro redefinition.	9 years ago
Anton Mitrofanov	2fb1d17a5a	x86inc: Enable AVX emulation in additional cases Allows emulation to work when dst is equal to src2 as long as the instruction is commutative, e.g. `addps m0, m1, m0`. Signed-off-by: Anton Khirnov <anton@khirnov.net>	9 years ago
Anton Mitrofanov	300fb0df84	x86inc: Improve handling of %ifid with multi-token parameters The yasm/nasm preprocessor only checks the first token, which means that parameters such as `dword [rax]` are treated as identifiers, which is generally not what we want. Signed-off-by: Anton Khirnov <anton@khirnov.net>	9 years ago
Anton Mitrofanov	8d02579fae	x86inc: Fix AVX emulation of some instructions Signed-off-by: Anton Khirnov <anton@khirnov.net>	9 years ago
Henrik Gramner	ba3eb745cc	x86inc: Fix AVX emulation of scalar float instructions Those instructions are not commutative since they only change the first element in the vector and leave the rest unmodified. Signed-off-by: Anton Khirnov <anton@khirnov.net>	9 years ago
Vittorio Giovara	41ed7ab45f	cosmetics: Fix spelling mistakes Signed-off-by: Diego Biurrun <diego@biurrun.de>	9 years ago
Anton Mitrofanov	e428f3b30c	x86inc: Enable AVX emulation in additional cases Allows emulation to work when dst is equal to src2 as long as the instruction is commutative, e.g. `addps m0, m1, m0`.	9 years ago
Anton Mitrofanov	4bd5583ace	x86inc: Improve handling of %ifid with multi-token parameters The yasm/nasm preprocessor only checks the first token, which means that parameters such as `dword [rax]` are treated as identifiers, which is generally not what we want.	9 years ago
Anton Mitrofanov	42be240ad6	x86inc: Fix AVX emulation of some instructions	9 years ago
Henrik Gramner	8dd3ee9ddd	x86inc: Fix AVX emulation of scalar float instructions Those instructions are not commutative since they only change the first element in the vector and leave the rest unmodified.	9 years ago
James Almer	70d685a77f	x86: use the new helper macros where useful Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago


487 Commits (179a2f04eb2bd6df7221883a92dc4e00cf94394b)