FFmpeg

Commit Graph

Author	SHA1	Message	Date
Roland Scheidegger	82c71913e4	h264: new assembly version of get_cabac for x86_64 with PIC This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works if the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. There is a surprisingly large performance improvement over the c version (more so than the generated assembly seems to suggest) just in get_cabac, I measured roughly 40% faster for get_cabac on a K8. However, overall the difference is not that big, I measured roughly 5% on a test clip on a K8 and a Core2. Hopefully it still compiles on x86 32bit... Now that only one table is used, there's some chance even darwin as compiles this (apparently the label arithmetic used previously doesn't work if it involves symbols defined in a different file, thanks to Ronald S. Bultje for helping me with this). Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	13 years ago
Roland Scheidegger	7f668cd2b5	h264: use one table instead of several for cabac functions The reason is this is easier for PIC code (in particular on darwin...). Keep the old names as pointers (static in cabac_functions.h so gcc knows these are just immediate offsets) so the c code can nicely stay the same (alternatively could use offsets directly in the functions needing the tables). This should produce the same code as before with non-pic and better code (confirmed) with pic. The assembly uses the new table but still won't work for PIC case. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	13 years ago
Roland Scheidegger	9b9df1cdff	h264: new assembly version of get_cabac for x86_64 with PIC This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works if the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. get_cabac() gets about 40% faster, for an overall speedup of about 5%. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	13 years ago
Roland Scheidegger	14e9ffc1e4	h264: use one table instead of several for cabac functions The reason is this is easier for PIC code (in particular on darwin...). Keep the old names as pointers (static in cabac_functions.h so gcc knows these are just immediate offsets) so the c code can nicely stay the same (alternatively could use offsets directly in the functions needing the tables). This should produce the same code as before with non-pic and better code (confirmed) with pic. The assembly uses the new table but still won't work for PIC case. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	13 years ago
Michael Niedermayer	9849515214	Revert "h264: assembly version of get_cabac for x86_64 with PIC (v4)" This broke compilation on darwin, revert until a better solution is found. This reverts commit `a812b599b5`.	13 years ago
Roland Scheidegger	a812b599b5	h264: assembly version of get_cabac for x86_64 with PIC (v4) This adds a hand-optimized assembly version for get_cabac much like the existing one, but it works if the table offsets are RIP-relative. Compared to the non-RIP-relative version this adds 2 lea instructions and it needs one extra register. There is a surprisingly large performance improvement over the c version (more so than the generated assembly seems to suggest) just in get_cabac, I measured roughly 40% faster for get_cabac on a K8. However, overall the difference is not that big, I measured roughly 5% on a test clip on a K8 and a Core2. Hopefully it still compiles on x86 32bit... v2: incorporated feedback from Loren Merritt to avoid rip-relative movs for every table, and got rid of unnecessary @GOTPCREL. v3: apply similar fixes to the decode_significance functions, and use same macro arguments for non-pic case. v4: prettify inline asm arguments, add a non-fast-cmov version (as I expect the c code to be faster otherwise since both cmov and sbb suck hard on a Prescott, even can't construct the mask with a 64bit shift as that's just as terrible - it's quite difficult to find usable instructions on that chip...). This is tested to work but not on a P4, in theory it _should_ be fast there. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	13 years ago
Diego Biurrun	0becb07842	h264: Factorize declaration of mb_sizes array.	13 years ago
Ronald S. Bultje	63a1b481f6	h264: fix cabac-on-stack after safe cabac reader.	13 years ago
Ronald S. Bultje	d1604b3de9	h264: prevent overreads in intra PCM decoding. Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind CC: libav-stable@libav.org	13 years ago
Ronald S. Bultje	45b7bd7c53	h264: disallow constrained intra prediction modes for luma. Conversion of the luma intra prediction mode to one of the constrained ("alzheimer") ones can happen by crafting special bitstreams, causing a crash because we'll call a NULL function pointer for 16x16 block intra prediction, since constrained intra prediction functions are only implemented for chroma (8x8 blocks). Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind CC: libav-stable@libav.org	13 years ago
Diego Biurrun	55b9ef18e4	cabac: split cabac.h into declarations and function definitions This fixes standalone compilation of some decoders with --disable-optimizations. cabac.h defines some inline functions that use symbols from cabac.c. Without optimizations these inline functions are not eliminated and linking fails with references to non-existing symbols. Splitting the inline functions off into their own header and only #including it in the places where the inline functions are used allows #including cabac.h from anywhere without ill effects.	13 years ago
Martin Storsjö	676a9ee1d2	x86: Fix constraints for decode_significance*_x86 Originally, prior to `8742a4ff8`, the caller code was compiled within this condition: ARCH_X86 && HAVE_7REGS && HAVE_EBX_AVAILABLE && !defined(BROKEN_RELOCATIONS) Since HAVE_7REGS is defined as (ARCH_X86_64 || (HAVE_EBX_AVAILABLE && HAVE_EBP_AVAILABLE)) the subcondition HAVE_7REGS && HAVE_EBX_AVAILABLE is equal to HAVE_7REGS (for 32 bit at least). The correct simplification of the original condition thus is HAVE_7REGS, not HAVE_EBX_AVAILABLE. This fixes compilation in some cases where HAVE_EBP_AVAILABLE = 0 and HAVE_EBX_AVAILABLE = 1. Signed-off-by: Martin Storsjö <martin@martin.st>	13 years ago
Diego Biurrun	6fdb2ce34a	x86: Tighten register constraints for decode_significance*_x86. On 32-bit OS X with gcc 4.0/4.2 and shared libraries enabled, the ebx register is not available, but required to assemble the functions. This reverts commit `8742a4f` to a simplified version of the original constraints.	13 years ago
Diego Biurrun	8742a4ff87	h264_cabac: synchronize decode_significance_*_x86 conditionals The definition and the call site were under different #ifdefs.	13 years ago
Michael Niedermayer	38331d2036	h264: disable checking reader, overreads are not possible in ffmpegs h264 decoder. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	13 years ago
Luca Barbato	5bf2ac2b37	error_resilience: use the ER_ namespace Add the namespace to {AC_,DC_,MV_}{END,ERROR} macros Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	13 years ago
Diego Biurrun	58c42af722	doxygen: misc consistency, spelling and wording fixes	13 years ago
Baptiste Coudurier	76741b0e56	h264: 4:2:2 intra decoding support Signed-off-by: Diego Biurrun <diego@biurrun.de> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	13 years ago
Laurent Aimar	a4fd95b5d5	h264: fix intra 16x16 mode check when using mbaff and constrained_intra_pred. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	13 years ago
Baptiste Coudurier	231a6df9ea	h264dec: h264: 4:2:2 intra decoding Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	14 years ago
Jason Garrett-Glaser	6c32576548	H.264: optimize CABAC x86 asm for Atom	14 years ago
Diego Biurrun	657ccb5ac7	Eliminate FF_COMMON_FRAME macro. FF_COMMON_FRAME holds the contents of the AVFrame structure and is also copied to struct Picture. Replace by an embedded AVFrame structure in struct Picture.	14 years ago
Jason Garrett-Glaser	99b6d2c065	H.264: use fill_rectangle in CABAC decoding	14 years ago
Jason Garrett-Glaser	556f8a066c	H.264: template left MB handling Faster H.264 decoding with ALLOW_INTERLACE off.	14 years ago
Jason Garrett-Glaser	3b7ebeb4d5	H.264: faster write_back_* Avoid aliasing, unroll loops, and inline more functions.	14 years ago
Carl Eugen Hoyos	4d08dfefa9	Remove gcc 2.95.3 remnants.	14 years ago
Carl Eugen Hoyos	81ef892ca8	Use HAVE_TEN_OPERANDS for new decode_significance* functions.	14 years ago
Jason Garrett-Glaser	c90b94424c	4:4:4 H.264 decoding support Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.	14 years ago
Jason Garrett-Glaser	504811baea	Roll back 4:4:4 H.264 for now Needs some ARM/PPC asm modifications.	14 years ago
Jason Garrett-Glaser	c9c493872c	4:4:4 H.264 decoding support Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.	14 years ago
Oskar Arvidsson	fcc0224e4f	Add support for higher QP values in h264. In high bit depth, the QP values may now be up to (51 + 6*(bit_depth-8)). Preparatory patch for high bit depth h264 decoding support. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	14 years ago
Oskar Arvidsson	6e3ef511d7	Add the notion of pixel size in h264 related functions. In high bit depth the pixels will not be stored in uint8_t like in the normal case, but in uint16_t. The pixel size is thus 1 in normal bit depth and 2 in high bit depth. Preparatory patch for high bit depth h264 decoding support. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	14 years ago
Stefano Sabatini	ce5e49b0c2	replace deprecated FF_*_TYPE symbols with AV_PICTURE_TYPE_*	14 years ago
Stefano Sabatini	975a1447f7	Replace deprecated FF_*_TYPE symbols with AV_PICTURE_TYPE_*. Signed-off-by: Diego Biurrun <diego@biurrun.de>	14 years ago
Michael Niedermayer	179106ed78	H264: factor if() out of coef decoding loop of decode_cabac_residual_internal() Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	14 years ago
Michael Niedermayer	e7077f5e7b	H264: replace pixel_size by pixel_shift Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	14 years ago
Oskar Arvidsson	d268bed209	Add support for higher QP values in h264. In high bit depth, the QP values may now be up to (51 + 6*(bit_depth-8)). Preparatory patch for high bit depth h264 decoding support. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	14 years ago
Oskar Arvidsson	dc172ecc6e	Add the notion of pixel size in h264 related functions. In high bit depth the pixels will not be stored in uint8_t like in the normal case, but in uint16_t. The pixel size is thus 1 in normal bit depth and 2 in high bit depth. Preparatory patch for high bit depth h264 decoding support. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	14 years ago
Mans Rullgard	2912e87a6c	Replace FFmpeg with Libav in licence headers Signed-off-by: Mans Rullgard <mans@mansr.com>	14 years ago
Ronald S. Bultje	7f8c11b005	Set gray (128) U/V planes for chroma-less samples. Fixes two fate samples when played with -flags emu_edge. (cherry picked from commit `8bcfe7f7fd`)	14 years ago
Ronald S. Bultje	772225c041	Revert `2a1f431d38`, it broke H264 lossless. (cherry picked from commit `66c6b5e2a5`)	14 years ago
Ronald S. Bultje	66c6b5e2a5	Revert `2a1f431d38`, it broke H264 lossless.	14 years ago
Ronald S. Bultje	8bcfe7f7fd	Set gray (128) U/V planes for chroma-less samples. Fixes two fate samples when played with -flags emu_edge.	14 years ago
Jason Garrett-Glaser	b9af15402d	Remove evil timers that snuck their way into r26375. Originally committed as revision 26377 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Jason Garrett-Glaser	fb2734c8a6	Fix r26375 on non-x86. Originally committed as revision 26376 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Jason Garrett-Glaser	f14bdd8e75	H.264: Partially inline CABAC residual decoding Improves CABAC performance about ~1.2%. Trick originates from x264 and has also been used in ffvp8. It's useful because coded block flags are usually zero, so it helps to have the early termination inlined into the main function. Originally committed as revision 26375 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Jason Garrett-Glaser	2a1f431d38	H.264/SVQ3: make chroma DC work the same way as luma DC No speed improvement, but necessary for some future stuff. Also opens up the possibility of asm chroma dc idct/dequant. Originally committed as revision 26349 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Jason Garrett-Glaser	5657d14094	H.264: switch to x264-style tracking of luma/chroma DC NNZ Useful so that we don't have to run the hierarchical DC iDCT if there aren't any coefficients. Opens up some future opportunities for optimization as well. Originally committed as revision 26337 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Jason Garrett-Glaser	19fb234e4a	H.264: split luma dc idct out and implement MMX/SSE2 versions About 2.5x the speed. NOTE: the way that the asm code handles large qmuls is a bit suboptimal. If x264-style dequant was used (separate shift and qmul values), it might be possible to get some extra speed. Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Diego Biurrun	ba87f0801d	Remove explicit filename from Doxygen @file commands. Passing an explicit filename to this command is only necessary if the documentation in the @file block refers to a file different from the one the block resides in. Originally committed as revision 22921 to svn://svn.ffmpeg.org/ffmpeg/trunk	15 years ago


100 Commits (6cfaccabc4edc3321c9a47e349236815b9d649e2)