Prevents a sign flip in the counter and a subsequent crash due to
overreads/overwrites.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
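For illustration, a minimal C sketch (hypothetical code, not the decoder in
question) of how a sign flip in a counter leads to overreads/overwrites of
this kind:

    #include <stdint.h>
    #include <string.h>

    /* If the element count is accumulated in a signed 32-bit int from
     * untrusted input, it can overflow and flip sign; the bounds check
     * below then passes, and the negative count converts to a huge
     * size_t in memcpy, writing far past the buffer. */
    static int fill(uint8_t *dst, int dst_size,
                    const uint8_t *src, int runs, int run_len)
    {
        int count = runs * run_len;   /* may overflow and turn negative */
        if (count > dst_size)         /* a negative count passes this check */
            return -1;
        memcpy(dst, src, count);      /* implicit conversion to size_t */
        return count;
    }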
They were moved into code under HAVE_YASM and most of them
even into completely disabled code with no reason given
for that in the commit message.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
This is even potentially faster in this use case.
Should fix AAC SBR decoding on machines with SSE but not
SSE2, fixing trac issue #1041.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Yasm creates an implicit unaligned text section if "struc" is used
outside of any section:
http://tortall.lighthouseapp.com/projects/78676-yasm/tickets/247
Since yasm only honors the "align" annotation on the first declaration
of a section, this implicit text section causes all text section
alignments to be ignored. Also fixes a yasm warning about it ignoring
alignment.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Since the values are floats, using the float operations
makes sense, improves performance on some CPUs and
makes the code SSE compatible instead of needing SSE2.
Based on a suggestion by Jason.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
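A minimal intrinsics sketch of the idea (illustrative only, not the actual
patch): when the data are floats, the single-precision forms of the bitwise
operations suffice and only require SSE, whereas the integer XMM forms
require SSE2. Negating four floats, for example:

    #include <xmmintrin.h>   /* SSE */
    #include <emmintrin.h>   /* SSE2, only needed for the second variant */

    /* SSE is enough: xorps against the sign-bit mask */
    static __m128 negate_sse(__m128 x)
    {
        return _mm_xor_ps(x, _mm_set1_ps(-0.0f));
    }

    /* Needs SSE2: pxor on the same bits viewed as integers */
    static __m128 negate_sse2(__m128 x)
    {
        const __m128i sign = _mm_set1_epi32((int)0x80000000u);
        return _mm_castsi128_ps(_mm_xor_si128(_mm_castps_si128(x), sign));
    }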
There is only one caller, which does not need the shifting. Other use cases
would be situations where different rounding would be needed.
The x86 and neon versions are modified accordingly.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
movq from an SSE register _to_ memory is an SSE2 instruction.
Use the SSE instruction movlps instead, which does the same thing.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
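In intrinsic terms, a sketch of the substitution (assuming an 8-byte store of
the low half of an XMM register; the helper names are illustrative):

    #include <xmmintrin.h>   /* SSE:  _mm_storel_pi    -> movlps */
    #include <emmintrin.h>   /* SSE2: _mm_storel_epi64 -> movq   */

    /* Requires SSE2: movq [dst], xmm */
    static void store_low64_sse2(void *dst, __m128 v)
    {
        _mm_storel_epi64((__m128i *)dst, _mm_castps_si128(v));
    }

    /* SSE is enough: movlps [dst], xmm stores the same 8 bytes */
    static void store_low64_sse(void *dst, __m128 v)
    {
        _mm_storel_pi((__m64 *)dst, v);
    }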
This splits ff_dsputil_init_mmx() into multiple functions, one for
each MMX/SSE level, somewhat simplifying the nested conditions.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
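A rough sketch of the resulting structure (the per-level helper names are
illustrative; only ff_dsputil_init_mmx() is taken from the commit message):

    #include "libavutil/cpu.h"
    #include "dsputil.h"

    static void dsputil_init_mmx(DSPContext *c, AVCodecContext *avctx)    { /* MMX functions */ }
    static void dsputil_init_mmxext(DSPContext *c, AVCodecContext *avctx) { /* MMXEXT functions */ }
    static void dsputil_init_sse(DSPContext *c, AVCodecContext *avctx)    { /* SSE functions */ }
    static void dsputil_init_sse2(DSPContext *c, AVCodecContext *avctx)   { /* SSE2 functions */ }

    void ff_dsputil_init_mmx(DSPContext *c, AVCodecContext *avctx)
    {
        int cpu_flags = av_get_cpu_flags();

        if (cpu_flags & AV_CPU_FLAG_MMX)    dsputil_init_mmx(c, avctx);
        if (cpu_flags & AV_CPU_FLAG_MMXEXT) dsputil_init_mmxext(c, avctx);
        if (cpu_flags & AV_CPU_FLAG_SSE)    dsputil_init_sse(c, avctx);
        if (cpu_flags & AV_CPU_FLAG_SSE2)   dsputil_init_sse2(c, avctx);
    }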
Unrolling the main loop to process, per iteration, a number of elements
other than 4:
- 8: minor gain of 2 cycles (not worth the extra object size)
- 2: loss of 8 cycles.
Assigning STEP to a register is a loss. Output address (Y) is almost always
unaligned.
Timings:
- C (32/64 bits): 117/109 cycles
- SSE: 57 cycles
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The 32-bit targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square  C  /32 bits: 82c (unrolled) / 102c
                C  /64 bits: 69c (unrolled) /  82c
                SSE/32 bits: 42c
                SSE/64 bits: 31c
Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling the loop to perform 8 operations per iteration costs 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
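For reference, a scalar sketch of what sbr_sum_square computes (the sum of
squares over n interleaved complex float pairs; this approximates, rather
than copies, the libavcodec C version):

    static float sbr_sum_square_c(float (*x)[2], int n)
    {
        float sum = 0.0f;
        int i;
        for (i = 0; i < n; i++)
            sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];
        return sum;
    }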
This prevents having to sign-extend on 64-bit systems with 32-bit ints,
such as x86-64. Also fixes crashes on systems where we don't do the
extension and the arguments are not passed in registers, as is the case
on Win64 for all weight functions.
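As an illustration of the underlying problem (the prototypes below are made
up, not the actual functions from the commit): on x86-64 the upper 32 bits of
an int argument are undefined once it lands in a 64-bit register or stack
slot, so asm that uses it as a pointer offset has to sign-extend it first;
passing the stride as a pointer-sized type avoids that entirely.

    #include <stddef.h>
    #include <stdint.h>

    /* 32-bit stride: the asm must sign-extend it before adding to a pointer */
    void weight_pixels_int(uint8_t *block, int stride, int height,
                           int log2_denom, int weight, int offset);

    /* pointer-sized stride: already full register width, no extension needed */
    void weight_pixels_ptrdiff(uint8_t *block, ptrdiff_t stride, int height,
                               int log2_denom, int weight, int offset);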
By replacing memcpy with an unrolled loop that makes use of the known
alignment, some speedup can be obtained.
Before (gcc 4.6.1): ~400 cycles
After: ~370 cycles
Overall, around 2% speed increase when decoding a 2400s mp3 to f32le.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
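A minimal sketch of the technique (assumed element type and unroll factor,
not the actual patch): when the size is known to be a multiple of the unroll
factor and the buffers are suitably aligned, an unrolled copy loop lets the
compiler emit aligned vector moves instead of a generic memcpy call.

    /* n is assumed to be a multiple of 8; dst and src suitably aligned */
    static void copy_floats_unrolled(float *dst, const float *src, int n)
    {
        int i;
        for (i = 0; i < n; i += 8) {
            dst[i + 0] = src[i + 0];
            dst[i + 1] = src[i + 1];
            dst[i + 2] = src[i + 2];
            dst[i + 3] = src[i + 3];
            dst[i + 4] = src[i + 4];
            dst[i + 5] = src[i + 5];
            dst[i + 6] = src[i + 6];
            dst[i + 7] = src[i + 7];
        }
    }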
We need to do unsigned saturation in order to cover the corner case when the
absolute coefficient value is 16777215 (the maximum value).
Fixes Bug #216
Line sizes are only 8-byte aligned, so use unaligned loads
for the add_bytes_l2 pointers.
Increasing the alignment requirement to 16 seemed a bit extreme
(png may be used for rather small sizes).
Also fix a mov that had its arguments swapped, leading to
add_bytes_l2 being applied to up to 8 bytes too few.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
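An intrinsics sketch of the approach (illustrative; the real code is yasm):
with the row pointers only guaranteed 8-byte alignment, the 16-byte loads and
stores have to use the unaligned forms.

    #include <emmintrin.h>
    #include <stdint.h>

    /* add_bytes_l2-style byte-wise add: dst[i] = src1[i] + src2[i] */
    static void add_bytes_l2_sketch(uint8_t *dst, const uint8_t *src1,
                                    const uint8_t *src2, int w)
    {
        int i;
        for (i = 0; i + 16 <= w; i += 16) {
            __m128i a = _mm_loadu_si128((const __m128i *)(src1 + i)); /* movdqu */
            __m128i b = _mm_loadu_si128((const __m128i *)(src2 + i));
            _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi8(a, b));
        }
        for (; i < w; i++)
            dst[i] = src1[i] + src2[i];
    }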
This will be useful to test more aggressively for failures to mark XMM
registers as clobbered in Win64 builds, and prevent regressions thereof.
Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>