FFmpeg

Commit Graph

Author	SHA1	Message	Date
Diego Biurrun	88bd7fdc82	Drop DCTELEM typedef It does not help as an abstraction and adds dsputil dependencies. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	12 years ago
Ronald S. Bultje	ce58642ed0	x86inc: support stack mem allocation and re-alignment in PROLOGUE. Use this in VP8/H264-8bit loopfilter functions so they can be used if there is no aligned stack (e.g. MSVC 32bit or ICC 10.x). Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	12 years ago
Ronald S. Bultje	6f40e9f070	x86inc: support stack mem allocation and re-alignment in PROLOGUE Use this in VP8/H264-8bit loopfilter functions so they can be used if there is no aligned stack (e.g. MSVC 32bit or ICC 10.x). Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	12 years ago
Diego Biurrun	89145fbbfe	x86: h264dsp: Fix linking with yasm and optimizations disabled Some optimized functions reference optimized symbols, so the functions must be explicitly disabled when those symbols are unavailable.	12 years ago
Diego Biurrun	26301caaa1	x86: mmx2 ---> mmxext in asm constructs	12 years ago
Diego Biurrun	d8eda37080	x86: mmx2 ---> mmxext in function names	12 years ago
Michael Niedermayer	6add8eb2ce	x86/h264dsp_init: put a HAVE_YASM back Should fix compilation on open solaris Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	12 years ago
Diego Biurrun	e0c6cce447	x86: Replace checks for CPU extensions and flags by convenience macros This separates code relying on inline from that relying on external assembly and fixes instances where the coalesced check was incorrect.	12 years ago
Diego Biurrun	a84ac7a860	x86: h264dsp: drop some unnecessary ifdefs around prototype declarations	12 years ago
Carl Eugen Hoyos	a26789cf9f	Fix compilation with yasm-0.6.2.	12 years ago
Diego Biurrun	17337f54c0	x86: Split inline and external assembly #ifdefs	12 years ago
Diego Biurrun	29cfdd3767	x86: avcodec: Appropriately name files containing only init functions	12 years ago
Mans Rullgard	c318626ce2	x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h This puts x86-specific things in the x86/ subdirectory where they belong. Signed-off-by: Mans Rullgard <mans@mansr.com>	12 years ago
Diego Biurrun	239fdf1b4a	x86: build: replace mmx2 by mmxext Refactoring mmx2/mmxext YASM code with cpuflags will force renames. So switching to a consistent naming scheme beforehand is sensible. The name "mmxext" is more official and widespread and also the name of the CPU flag, as reported e.g. by the Linux kernel.	12 years ago
Diego Biurrun	81905088a1	x86: h264dsp: K&R formatting cosmetics	12 years ago
Diego Biurrun	6376a3ad24	x86: h264dsp: Remove unused variable ff_pb_3_1	12 years ago
Diego Biurrun	8728b381cb	x86: h264dsp: Adjust YASM #ifdefs This fixes compilation with YASM disabled.	12 years ago
Ronald S. Bultje	b829b4ce29	h264: convert loop filter strength dsp function to yasm. This completes the conversion of h264dsp to yasm; note that h264 also uses some dsputil functions, most notably qpel. Performance-wise, the yasm-version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles faster (201->193) on x86-32.	12 years ago
Ronald S. Bultje	a5bbb1242c	h264_loopfilter: port x86 simd to cpuflags.	12 years ago
Diego Biurrun	fe07c9c6b5	x86: Only use optimizations with cmov if the CPU supports the instruction	13 years ago
Michael Niedermayer	915ec91e6b	libavcodec/x86/h264dsp_mmx.c: add forgotten HAVE_YASM Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	13 years ago
Reimar Döffinger	b223035511	Detect and check for CMOV. Some MMX-only CPUs do not have support for CMOV. All SSE/MMX2 CPUs should be fine, thus no check was added to those functions. See also https://sourceforge.net/tracker/?func=detail&aid=3358347&group_id=205275&atid=992986 Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	13 years ago
Ronald S. Bultje	c2d337429c	H264: change weight/biweight functions to take a height argument. Neon parts by Mans Rullgard <mans@mansr.com>.	13 years ago
Ronald S. Bultje	229d263cc9	Support for lossless and inter H264 4:2:2.	13 years ago
Baptiste Coudurier	76741b0e56	h264: 4:2:2 intra decoding support Signed-off-by: Diego Biurrun <diego@biurrun.de> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	13 years ago
Baptiste Coudurier	231a6df9ea	h264dec: h264: 4:2:2 intra decoding Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	13 years ago
Jason Garrett-Glaser	b5bbc84fe2	H.264: add filter_mb_fast support for >8-bit decoding Much faster high bit depth deblocking.	14 years ago
Daniel Kang	84e70ef004	h264: Add x86 assembly for 10-bit weight/biweight H.264 functions. Mainly ported from 8-bit H.264 weight/biweight. Signed-off-by: Diego Biurrun <diego@biurrun.de>	14 years ago
Carl Eugen Hoyos	5fb67d8039	Fix compilation with old yasm.	14 years ago
Daniel Kang	f3aa65af3a	h264/10bit: add HAVE_ALIGNED_STACK checks. Fixes regression in `836f47d34b` in ICC-10.x, since ICC<=11.0 doesn't align stack upon function calls. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	14 years ago
Daniel Kang	348493db60	Update 8-bit H.264 IDCT function names to reflect bit-depth. Signed-off-by: Ronald S. Bultje <rbultje@google.com>	14 years ago
Daniel Kang	836f47d34b	Add IDCT functions for 10-bit H.264. Ports the majority of IDCT functions for 10-bit H.264. Parts are inspired from 8-bit IDCT code in Libav; other parts ported from x264 with relicensing permission from author. Signed-off-by: Ronald S. Bultje <rbultje@google.com>	14 years ago
Gil Pedersen	257de5fb25	h264dsp_mmx: Add #ifdefs around some mmxext functions on x86_64. This fixes linking errors due to undefined symbols on x86_64 OS X. Signed-off-by: Diego Biurrun <diego@biurrun.de>	14 years ago
Jason Garrett-Glaser	5705b02079	10-bit H.264 x86 chroma v loopfilter asm Also delete some unused deblock asm macros.	14 years ago
Jason Garrett-Glaser	9f3d6ca4f1	Port x86 10-bit H.264 deblock asm from x264	14 years ago
Jason Garrett-Glaser	8ad77b65b5	Update x86 H.264 deblock asm Includes AVX versions from x264.	14 years ago
Ronald S. Bultje	86b29553f8	h264dsp_mmx: place bracket outside #if/#endif block. Should fix compile on systems missing yasm/nasm.	14 years ago
Oskar Arvidsson	19a0729b4c	Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder. This patch lets e.g. dsputil_init choose dsp functions with respect to the bit depth to decode. The naming scheme of bit depth dependent functions is <base name>_<bit depth>[_<prefix>] (i.e. the old clear_blocks_c is now named clear_blocks_8_c). Note: Some of the functions for high bit depth are not dependent on the bit depth, but only on the pixel size. This leaves some room for optimizing binary size. Preparatory patch for high bit depth h264 decoding support. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	14 years ago
Carl Eugen Hoyos	5c0068758f	Fix compilation with --disable-yasm.	14 years ago
Oskar Arvidsson	8dbe585641	Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder. This patch lets e.g. dsputil_init choose dsp functions with respect to the bit depth to decode. The naming scheme of bit depth dependent functions is <base name>_<bit depth>[_<prefix>] (i.e. the old clear_blocks_c is now named clear_blocks_8_c). Note: Some of the functions for high bit depth are not dependent on the bit depth, but only on the pixel size. This leaves some room for optimizing binary size. Preparatory patch for high bit depth h264 decoding support. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	14 years ago
Mans Rullgard	2912e87a6c	Replace FFmpeg with Libav in licence headers Signed-off-by: Mans Rullgard <mans@mansr.com>	14 years ago
Jason Garrett-Glaser	19fb234e4a	H.264: split luma dc idct out and implement MMX/SSE2 versions About 2.5x the speed. NOTE: the way that the asm code handles large qmuls is a bit suboptimal. If x264-style dequant was used (separate shift and qmul values), it might be possible to get some extra speed. Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Ronald S. Bultje	a52ffc3f54	Move static inline function to a macro, so that constant propagation in inline asm works for gcc-3.x also (hopefully). Should fix gcc-3.x FATE breakage after r25254. Originally committed as revision 25262 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Ronald S. Bultje	cd17285e6c	Merge b_idx and edge variables, and optimize the ASM to directly load variables from memory locations/offsets depending on b_idx plus constants, rather than having gcc do this. This saves several lea calls and together saves about 10 cycles in h264_loop_filter_strength_mmx2(). Originally committed as revision 25256 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Ronald S. Bultje	0cc8a5d088	Remove mv_mask variable. Replace the related pand -1/0 instructions by either a pxor, or remove the instruction altogether. Altogether, this saves 1 instruction. Originally committed as revision 25255 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Ronald S. Bultje	c0673f2cf4	Remove d_idx as a variable, and instead load it as a constant in the asm. This has no measurable speed effect because the surrounding code doesn't take advantage of this yet. Originally committed as revision 25254 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Ronald S. Bultje	2c3135f6d3	Unroll inner bidir loop in h264_loop_filter_strength_mmx2(), which gets rid of the d_idx variable and therefore allows for future optimizations. No speed difference by this commit itself. Originally committed as revision 25253 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Ronald S. Bultje	4b81511cab	Unloop the outer loop in h264_loop_filter_strength_mmx2(), which allows inlining various constants within the loop code. 20 cycles faster on cathedral sample. Originally committed as revision 25252 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Ronald S. Bultje	7e117771cd	Remove unused variable. Originally committed as revision 25173 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Måns Rullgård	c0bc8b9afb	x86: disable SSE functions using stack when stack is not aligned This fixes crashes with ICC 10.1. Originally committed as revision 25153 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago

20 Commits (42c6f2a645a83c0a0adc933aff4c52861d0b32aa)