FFmpeg

Commit Graph

Author	SHA1	Message	Date
Rémi Denis-Courmont	4ad5b9c8db	lavc/startcode: add R-V Zbb startcode_find_candidate The main loop processes 8 bytes in 5 instructions. For comparison, the optimal plain strnlen() requires 4 instructions per byte (6.4x worse): LBU; ADDI; BEQZ; BNE. The current libavcodec C code involves 5 instructions per byte (8x worse). Actual benchmarks may be slightly less favourable due to latency from ORC.B to BNE.	6 months ago
gxw	3f294ec879	avcodec: [loongarch] Optimize h264dsp with LASX. ./ffmpeg -i ../1_h264_1080p_30fps_3Mbps.mp4 -f rawvideo -y /dev/null -an before: 225 after: 282 Change-Id: Ibe245827dcdfe8fc1541c6b172483151bfa9e642 Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn> Reviewed-by: guxiwei <guxiwei-hf@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	3 years ago
Andreas Rheinhardt	afc95a10ac	avcodec/h264dsp, h264idct: Fix lengths of array parameters Fixes many -Warray-parameter warnings from GCC 11. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	3 years ago
James Almer	d5d699ab6e	avcodec/h264dsp: change loop filter stride argument to ptrdiff_t	6 years ago
Michael Niedermayer	bc26fe8927	avcodec/h264: Use ptrdiff_t for (bi)weight functions Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Vittorio Giovara	41ed7ab45f	cosmetics: Fix spelling mistakes Signed-off-by: Diego Biurrun <diego@biurrun.de>	9 years ago
Shivraj Patil	02001ada5c	avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for H264 lpf and weight/biweight functions Reviewed-by: Nedeljko Babic <Nedeljko.Babic@imgtec.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	10 years ago
Ben Avison	db7f1c7c5a	h264: Move start code search functions into separate source files. This permits re-use with parsers for codecs which use similar start codes. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	10 years ago
Janne Grunau	8438b3f09f	aarch64: h264 idct NEON assembler optimizations Ported from ARMv7 NEON.	11 years ago
Ben Avison	218d6844b3	h264dsp: Factorize code into a new function, h264_find_start_code_candidate This performs the start code search which was previously part of h264_find_frame_end() - the most CPU intensive part of the function. By itself, this results in a performance regression (overall time, mean/StdDev: before 2925.6/26.2, after 3068.5/31.7, change -4.7%), but this can more than be made up for by platform-optimised implementations of the function. Signed-off-by: Martin Storsjö <martin@martin.st>	11 years ago
Ronald S. Bultje	62844c3fd6	h264: Integrate clear_blocks calls with IDCT The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700 to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb (in the decode_slice loop) goes from 1759 to 1733 cycles on the clip tested (cathedral), i.e. almost 30 cycles per mb faster. Signed-off-by: Martin Storsjö <martin@martin.st>	12 years ago
Ronald S. Bultje	2ed008204d	h264: Add add_pixels4/8() to h264dsp, and remove add_pixels4 from dsputil These functions are mostly H264-specific (the only other user I can spot is bink), and this allows us to special-case some functionality for H264. Also remove the 16-bit-coeff with >8bpp versions (unused) and merge the duplicate 32-bit-coeff for >8bpp (identical). Signed-off-by: Martin Storsjö <martin@martin.st>	12 years ago
Ronald S. Bultje	1acd7d594c	h264: integrate clear_blocks calls with IDCT. The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700 to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb (in the decode_slice loop) goes from 1759 to 1733 cycles on the clip tested (cathedral), i.e. almost 30 cycles per mb faster. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	12 years ago
Ronald S. Bultje	7ff1a4b10f	Add add_pixels4/8() to h264dsp, and remove add_pixels4 from dsputil. These functions are mostly H264-specific (the only other user I can spot is bink), and this allows us to special-case some functionality for H264. Also remove the 16-bit-coeff with >8bpp versions (unused) and merge the duplicate 32-bit-coeff for >8bpp (identical). Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	12 years ago
Diego Biurrun	88bd7fdc82	Drop DCTELEM typedef It does not help as an abstraction and adds dsputil dependencies. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	12 years ago
Diego Biurrun	1de53d006b	h264: K&R formatting cosmetics for header files (part II/II)	13 years ago
Diego Biurrun	10d2ea2604	h264: Remove a commented-out function pointer typedef.	13 years ago
Ronald S. Bultje	c2d337429c	H264: change weight/biweight functions to take a height argument. Neon parts by Mans Rullgard <mans@mansr.com>.	13 years ago
Baptiste Coudurier	76741b0e56	h264: 4:2:2 intra decoding support Signed-off-by: Diego Biurrun <diego@biurrun.de> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	13 years ago
Baptiste Coudurier	231a6df9ea	h264dec: h264: 4:2:2 intra decoding Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	13 years ago
Jason Garrett-Glaser	c90b94424c	4:4:4 H.264 decoding support Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.	14 years ago
Jason Garrett-Glaser	504811baea	Roll back 4:4:4 H.264 for now Needs some ARM/PPC asm modifications.	14 years ago
Jason Garrett-Glaser	c9c493872c	4:4:4 H.264 decoding support Note: this is 4:4:4 from the 2007 spec revision, not the previous (now deprecated) 4:4:4 mode in H.264.	14 years ago
Daniel Kang	348493db60	Update 8-bit H.264 IDCT function names to reflect bit-depth. Signed-off-by: Ronald S. Bultje <rbultje@google.com>	14 years ago
Oskar Arvidsson	19a0729b4c	Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder. This patch lets e.g. dsputil_init chose dsp functions with respect to the bit depth to decode. The naming scheme of bit depth dependent functions is <base name>_<bit depth>[_<prefix>] (i.e. the old clear_blocks_c is now named clear_blocks_8_c). Note: Some of the functions for high bit depth is not dependent on the bit depth, but only on the pixel size. This leaves some room for optimizing binary size. Preparatory patch for high bit depth h264 decoding support. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	14 years ago
Oskar Arvidsson	e39e3abad4	Choose h264 chroma dc dequant function dynamically. Needed for high bit depth h264 decoding. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	14 years ago
Ronald S. Bultje	dd561441b1	h264: DSP'ize MBAFF loopfilter.	14 years ago
Oskar Arvidsson	8dbe585641	Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder. This patch lets e.g. dsputil_init chose dsp functions with respect to the bit depth to decode. The naming scheme of bit depth dependent functions is <base name>_<bit depth>[_<prefix>] (i.e. the old clear_blocks_c is now named clear_blocks_8_c). Note: Some of the functions for high bit depth is not dependent on the bit depth, but only on the pixel size. This leaves some room for optimizing binary size. Preparatory patch for high bit depth h264 decoding support. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	14 years ago
Oskar Arvidsson	af0b2d6736	Choose h264 chroma dc dequant function dynamically. Needed for high bit depth h264 decoding. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	14 years ago
Mans Rullgard	2912e87a6c	Replace FFmpeg with Libav in licence headers Signed-off-by: Mans Rullgard <mans@mansr.com>	14 years ago
Ronald S. Bultje	772225c041	Revert `2a1f431d38`, it broke H264 lossless. (cherry picked from commit `66c6b5e2a5`)	14 years ago
Ronald S. Bultje	66c6b5e2a5	Revert `2a1f431d38`, it broke H264 lossless.	14 years ago
Jason Garrett-Glaser	2a1f431d38	H.264/SVQ3: make chroma DC work the same way as luma DC No speed improvement, but necessary for some future stuff. Also opens up the possibility of asm chroma dc idct/dequant. Originally committed as revision 26349 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Jason Garrett-Glaser	bd11c7a1a8	Remove outdated comment in h264dsp.h Since we no longer have non-transposed scantables, the problem it warns about no longer exists. Originally committed as revision 26339 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Jason Garrett-Glaser	19fb234e4a	H.264: split luma dc idct out and implement MMX/SSE2 versions About 2.5x the speed. NOTE: the way that the asm code handles large qmuls is a bit suboptimal. If x264-style dequant was used (separate shift and qmul values), it might be possible to get some extra speed. Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk	14 years ago
Diego Biurrun	ba87f0801d	Remove explicit filename from Doxygen @file commands. Passing an explicit filename to this command is only necessary if the documentation in the @file block refers to a file different from the one the block resides in. Originally committed as revision 22921 to svn://svn.ffmpeg.org/ffmpeg/trunk	15 years ago
Måns Rullgård	4693b031a3	Move H264 dsputil functions into their own struct This moves the H264-specific functions from DSPContext to the new H264DSPContext. The code is made conditional on CONFIG_H264DSP which is set by the codecs requiring it. The qpel and chroma MC functions are not moved as these are used by non-h264 code. Originally committed as revision 22565 to svn://svn.ffmpeg.org/ffmpeg/trunk	15 years ago

48 Commits (50471f96c4a68874575ab21f799c5999ed920838)