This adds a hand-optimized assembly version for get_cabac much like the
existing one, but one that also works when the table offsets are
RIP-relative. Compared to the non-RIP-relative version, this adds 2 lea
instructions and needs one extra register.
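Roughly, the shape of the difference per table lookup (illustration only,
with a hypothetical table name and a single lookup; the real code is a much
larger inline asm block):

    #include <stdint.h>

    /* hypothetical stand-in for the real CABAC tables */
    uint8_t lps_range_tab[4 * 64];

    static inline unsigned load_lps(uintptr_t idx)
    {
        unsigned v;
    #if defined(__PIC__)  /* stand-in for the build system's PIC check */
        /* PIC: form the table base with a RIP-relative lea first,
         * costing one extra instruction and one extra register */
        const uint8_t *base;
        __asm__ ("lea    lps_range_tab(%%rip), %1 \n\t"
                 "movzbl (%1,%2), %0              \n\t"
                 : "=&r"(v), "=&r"(base)
                 : "r"(idx));
    #else
        /* non-PIC: the table address is an absolute displacement,
         * no extra instruction or register needed */
        __asm__ ("movzbl lps_range_tab(%1), %0 \n\t"
                 : "=r"(v)
                 : "r"(idx));
    #endif
        return v;
    }
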
There is a surprisingly large performance improvement over the C version
(more so than the generated assembly seems to suggest) just in get_cabac:
I measured it as roughly 40% faster on a K8. However, the overall
difference is not that big; I measured roughly 5% on a test clip on a K8
and a Core2.
Hopefully it still compiles on 32-bit x86...
v2: incorporated feedback from Loren Merritt to avoid RIP-relative movs
for every table, and got rid of the unnecessary @GOTPCREL.
v3: apply similar fixes to the decode_significance functions, and use the
same macro arguments for the non-PIC case.
v4: prettify the inline asm arguments, add a non-fast-cmov version (as I
expect the C code to be faster otherwise, since both cmov and sbb suck
hard on a Prescott; we can't even construct the mask with a 64-bit shift,
as that's just as terrible - it's quite difficult to find usable
instructions on that chip...).
This is tested to work, though not on a P4; in theory it _should_ be fast
there.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Recent register allocation changes (the x86inc.asm update) changed the
register order and thus the opcodes of the inner loops. One of them became
>128 bytes, which confuses other parts of this function that jump to
fixed-offset positions to extend the edge by fixed amounts. A simple
register change fixes this.
Add support for all x86-64 registers
Prefer caller-saved registers over callee-saved ones on WIN64
Support up to 15 function arguments
Also (by Ronald S. Bultje): fix up our asm to work with the new
x86inc.asm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
Quite often, the original weights are multiples of 512. By prescaling them
by 1/512 when they are computed (once per frame), no intermediate shifting
is needed, and no prescaling on each call either.
The x86 code already used that trick.
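A rough sketch of the idea with hypothetical names and a fixed size (the
actual code applies this to the per-frame weight tables):

    #define N 32

    /* before: weights are multiples of 512, so every call rescales */
    float apply_old(const float *w, const float *v)
    {
        float sum = 0;
        for (int i = 0; i < N; i++)
            sum += w[i] * v[i];
        return sum * (1.0f / 512);       /* per-call scaling */
    }

    /* after: fold the 1/512 into the once-per-frame computation ... */
    void compute_weights(float *w_scaled, const float *w_raw)
    {
        for (int i = 0; i < N; i++)
            w_scaled[i] = w_raw[i] * (1.0f / 512);
    }

    /* ... so each call needs neither shifting nor scaling */
    float apply_new(const float *w, const float *v)
    {
        float sum = 0;
        for (int i = 0; i < N; i++)
            sum += w[i] * v[i];
        return sum;
    }
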
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Prevents a signflip in the counter, and a subsequent crash because of
overreads/overwrites.
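Illustrative only (hypothetical names, not the decoder's actual code), the
general pattern such a sign flip leads to:

    #include <string.h>

    void copy_payload(unsigned char *dst, const unsigned char *src,
                      int total_len, int consumed)
    {
        int remaining = total_len - consumed;  /* crafted input can make this negative */
        memcpy(dst, src, remaining);           /* a negative int becomes a huge size_t */
    }
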
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
They were moved into code under HAVE_YASM, and most of them even into
completely disabled code, with no reason given for this in the commit
message.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
This is even potentially faster in this use case.
Should fix AAC SBR decoding on machines with SSE but not SSE2, fixing
tracker issue #1041.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Yasm creates an implicit unaligned text section if "struc" is used
outside of any section:
http://tortall.lighthouseapp.com/projects/78676-yasm/tickets/247
Since yasm only honors the "align" annotation on the first declaration
of a section, this implicit text section causes all text section
alignments to be ignored. This also fixes a yasm warning about it ignoring
alignment.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Since the values are floats, using the float operations makes sense: it
improves performance on some CPUs and makes the code SSE-compatible
instead of requiring SSE2.
Based on suggestion by Jason.
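In intrinsics terms the difference looks roughly like this (illustration
only, with hypothetical helpers; the actual change is in the hand-written
asm), e.g. flipping the sign of packed floats:

    #include <xmmintrin.h>  /* SSE  */
    #include <emmintrin.h>  /* SSE2, only needed for the integer variant */

    /* xorps: SSE only, stays in the float domain */
    __m128 neg_float_ops(__m128 x)
    {
        return _mm_xor_ps(x, _mm_set1_ps(-0.0f));
    }

    /* pxor on float data: requires SSE2 and may pay a domain-crossing
     * penalty on some CPUs */
    __m128 neg_int_ops(__m128 x)
    {
        const __m128i sign = _mm_set1_epi32((int)0x80000000u);
        return _mm_castsi128_ps(_mm_xor_si128(_mm_castps_si128(x), sign));
    }
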
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
There is only one caller, which does not need the shifting. Other use
cases are situations where a different rounding would be needed.
The x86 and neon versions are modified accordingly.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
movq from an SSE register _to_ memory is an SSE2 instruction.
Use the SSE instruction movlps instead, which does the same thing.
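For illustration, the same low-64-bit store expressed with intrinsics
(hypothetical helper names; the change itself is in the asm):

    #include <xmmintrin.h>  /* SSE:  _mm_storel_pi    -> movlps store */
    #include <emmintrin.h>  /* SSE2: _mm_storel_epi64 -> movq store   */

    void store_low_sse(float *dst, __m128 v)
    {
        _mm_storel_pi((__m64 *)dst, v);           /* movlps [dst], xmm */
    }

    void store_low_sse2(float *dst, __m128 v)
    {
        _mm_storel_epi64((__m128i *)dst,          /* movq [dst], xmm */
                         _mm_castps_si128(v));
    }
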
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This splits ff_dsputil_init_mmx() into multiple functions, one for
each MMX/SSE level, somewhat simplifying the nested conditions.
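A minimal sketch of the new structure, with a hypothetical type, flag
values and helper names (the real functions set many more pointers):

    typedef struct DSPContext DSPContext;   /* opaque in this sketch */

    enum { CPU_FLAG_MMX = 1, CPU_FLAG_MMXEXT = 2, CPU_FLAG_SSE2 = 4 };

    static void dsputil_init_mmx_level(DSPContext *c)    { /* set MMX pointers    */ }
    static void dsputil_init_mmxext_level(DSPContext *c) { /* set MMXEXT pointers */ }
    static void dsputil_init_sse2_level(DSPContext *c)   { /* set SSE2 pointers   */ }

    void dsputil_init_x86(DSPContext *c, int cpu_flags)
    {
        /* one flat check per level instead of nested conditions */
        if (cpu_flags & CPU_FLAG_MMX)    dsputil_init_mmx_level(c);
        if (cpu_flags & CPU_FLAG_MMXEXT) dsputil_init_mmxext_level(c);
        if (cpu_flags & CPU_FLAG_SSE2)   dsputil_init_sse2_level(c);
    }
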
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Diego Biurrun <diego@biurrun.de>