FFmpeg

Commit Graph

Author	SHA1	Message	Date
James Almer	497a4b554c	x86/aacpsdsp: fix output of ff_ps_stereo_interpolate_ipdopd_sse3 The fate-aac-al_sbr_ps_04_ur test did not detect this mistake.	8 years ago
Ilia Valiakhmetov	73d9a9a6af	libavcodec/vp9: ipred_dl_32x32_16 avx2 implementation vp9_diag_downleft_32x32_8bpp_c: 580.2 vp9_diag_downleft_32x32_8bpp_sse2: 75.6 vp9_diag_downleft_32x32_8bpp_ssse3: 73.7 vp9_diag_downleft_32x32_8bpp_avx: 72.7 vp9_diag_downleft_32x32_10bpp_c: 1101.2 vp9_diag_downleft_32x32_10bpp_sse2: 145.4 vp9_diag_downleft_32x32_10bpp_ssse3: 137.5 vp9_diag_downleft_32x32_10bpp_avx: 134.8 vp9_diag_downleft_32x32_10bpp_avx2: 94.0 vp9_diag_downleft_32x32_12bpp_c: 1108.5 vp9_diag_downleft_32x32_12bpp_sse2: 145.5 vp9_diag_downleft_32x32_12bpp_ssse3: 137.3 vp9_diag_downleft_32x32_12bpp_avx: 135.2 vp9_diag_downleft_32x32_12bpp_avx2: 94.0 ~30% faster than avx implementation Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	8 years ago
James Almer	933dd62288	x86/aacpsdsp: optimize ff_ps_mul_pair_single_sse ~2% faster.	8 years ago
James Almer	be3809a521	x86/aacpsdsp: optimize ff_ps_stereo_interpolate_sse3 Move the unpacking outside of the loop. 5% to 10% faster. Suggested-by: ubitux Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
James Almer	b5a0971ff0	x86/aacps: add ff_ps_stereo_interpolate_ipdopd_sse3() About 2x faster than the c version. Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
James Darnley	0dea0114fb	avcodec/x86/idctdsp_init: reindent	8 years ago
James Darnley	8e89f6fd37	avcodec/x86: move simple_idct to external assembly	8 years ago
Clément Bœsch	584366a436	lavc/mpegvideoenc: reformat inv_zigzag_direct16 so the zigzag pattern is visible	8 years ago
James Darnley	7aa90b4e94	avcodec/h264: add sse2 versions of previous idct functions Kaby Lake Pentium: - ff_h264_idct_add_8_sse2: ~1.18x faster than mmxext - ff_h264_idct_dc_add_8_sse2: ~1.07x faster than mmxext	8 years ago
James Darnley	27460dfebc	avcodec/h264: add avx 8-bit h264_idct_dc_add Haswell: - 1.02x faster (405±0.7 vs. 397±0.8 decicycles) compared with mmxext Skylake-U: - 1.06x faster (498±1.8 vs. 470±1.3 decicycles) compared with mmxext	8 years ago
James Darnley	f61d454ca1	avcodec/h264: add avx 8-bit h264_idct_add Haswell: - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext Skylake-U: - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext	8 years ago
James Darnley	b5325c6711	avcodec/h264: use some 3 operand forms	8 years ago
James Darnley	060ba9e5e3	avcodec/h264: change RETs into REP_RETs where appropriate	8 years ago
Michael Niedermayer	fa8fd0808f	avcodec/x86/vc1dsp_init: Fix build failure with --disable-optimizations and clang compilers doing DCE at -O0 do not necessarily understand "complex" boolean expressions Build succeeds with this change, this was the only failure Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Ronald S. Bultje	83ae7e6350	x86/idctdsp_init: reindent.	8 years ago
Ronald S. Bultje	e0c205677f	x86/simple_idct: add explicit sse2 simple_idct_put/add versions. These use the mmx IDCT, but sse2 put/add_pixels_clamped implementations. This way we don't need to use the ff_put/add_pixels_clamped function pointers.	8 years ago
Ronald S. Bultje	2f0591cfa3	cavs: add a sse2 idct implementation. This makes using the function pointer ff_add_pixels_clamped() unnecessary, since we always know what the best implementation is at compile-time.	8 years ago
Ronald S. Bultje	c9d98c5649	cavs: convert idct from inline asm to yasm.	8 years ago
Ronald S. Bultje	b51d7d89f8	x86/xvididct: remove use of ff_put/add_pixels_clamped function pointer. Since there's separate SSE2 implementations of xvid_idct_put/add, this patch has no practical impact on performance.	8 years ago
James Almer	6171f178e7	x86/hevc_add_res: merge last remaining changes from `3d65359832` See https://lists.libav.org/pipermail/libav-devel/2016-October/079829.html	8 years ago
Ronald S. Bultje	f8c019944d	vp9: re-split the decoder/format/dsp interface header files. The advantage here is that the internal software decoder interface is not exposed to the DSP functions or the hardware accelerations.	8 years ago
Clément Bœsch	1c9f4b5078	lavc/vp9: split into vp9{block,data,mvs} This is following Libav layout to ease merges.	8 years ago
Michael Niedermayer	73fb40dc87	avcodec/x86/idctdsp: Remove duplicate include Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
James Almer	ac42f08099	x86/hevc_add_res: merge missing changes from `3d65359832` Unrolling the loops triplicates the size of the assembled output while not generating any gain in performance.	8 years ago
Clément Bœsch	40ac226014	lavc/x86/hevc: rename hevc_res_add to hevc_add_res This will simplify incoming merge.	8 years ago
Diego Biurrun	dcc39ee10e	lavc: Remove deprecated XvMC support hacks Deprecated in 11/2013.	8 years ago
James Almer	30cadfe071	avcodec/lossless_videodsp: use ptrdiff_t for length parameters Signed-off-by: James Almer <jamrial@gmail.com>	8 years ago
Clément Bœsch	af607b7e07	lavc/huffyuvdsp: only transmit the pix_fmt instead of the whole avctx Only the pixel format is required in that init function. This will also simplify the incoming merge.	8 years ago
James Almer	aee046a895	x86/audiodsp: remove an unnecessary movss	8 years ago
Ilia	2f3d10a01a	avcodec/vp9: avx2 implementation of ipred_dl_16x16_16 vp9_diag_downleft_16x16_10bpp_c: 263.0 vp9_diag_downleft_16x16_10bpp_sse2: 44.7 vp9_diag_downleft_16x16_10bpp_ssse3: 32.5 vp9_diag_downleft_16x16_10bpp_avx: 31.9 vp9_diag_downleft_16x16_10bpp_avx2: 25.7 vp9_diag_downleft_16x16_12bpp_c: 264.7 vp9_diag_downleft_16x16_12bpp_sse2: 44.4 vp9_diag_downleft_16x16_12bpp_ssse3: 32.0 vp9_diag_downleft_16x16_12bpp_avx: 32.4 vp9_diag_downleft_16x16_12bpp_avx2: 25.5 Benchmarked with 10000 runs Signed-off-by: Ilia <zakne0ne@gmail.com> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	8 years ago
Mirage Abeysekara	5eb4f95bef	h264pred: added AVX2 implementation for tm_vp8 16x16. checkasm --bench results with 5000 runs pred16x16_tm_vp8_c: 302.8 pred16x16_tm_vp8_mmx: 101.4 pred16x16_tm_vp8_mmxext: 95.5 pred16x16_tm_vp8_sse2: 95.1 pred16x16_tm_vp8_avx2: 38.2 Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	8 years ago
Diego Biurrun	681a86aba6	x86: fft: Port to cpuflags	8 years ago
Diego Biurrun	e9bb77fb10	x86: h264: Simplify DEQUANT macro with cpuflags	8 years ago
Diego Biurrun	307eb1a8ee	x86: vp8dsp: port FILTER_BILINEAR macro to cpuflags	8 years ago
Diego Biurrun	994c4bc107	x86util: Port all macros to cpuflags Also do some small cosmetic changes: Drop pointless _MMX suffix from ABSD2 macro name, drop pointless check for MMX support, we always assume MMX is available in our SIMD code, fix spelling.	8 years ago
Michael Niedermayer	835d9f299c	avcodec/x86/cavsdsp: Put MMX code under mmx check Without this the FPU state becomes trashed and causes mysterious fate failures with cpuflags=0 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	8 years ago
Diego Biurrun	6eef263aca	x86: Merge align directives into SECTION_RODATA declarations where possible	8 years ago
Diego Biurrun	39e208f4d4	build: Generalize yasm/nasm-related variable names None of them are specific to the YASM assembler.	8 years ago
Diego Biurrun	fde7ee8710	x86: hevc: Add missing colons after assembly labels This fixes several warnings of the sort warning: label alone on a line without a colon might be in error	8 years ago
James Darnley	33de0fee2c	avcodec/h264: enable sse2 chroma deblock/loop filter functions Between 1.00 and 1.16 times faster on Intel Yorkfield Core 2 Quad. Between 1.11 and 1.39 times faster on Intel Kaby Lake Pentium.	8 years ago
James Darnley	cd893b9307	avcodec/h264: add avx 8-bit 4:2:2 chroma h intra deblock/loop filter ~1.37x faster (147 vs. 108 cycles) compared to mmxext function	8 years ago
James Darnley	0e16b3e2be	avcodec/h264: add avx 8-bit 4:2:0 chroma h intra deblock/loop filter ~1.10x faster (69 vs. 63 cycles) compared to mmxext function	8 years ago
James Darnley	987ffe4b8d	avcodec/h264: add avx 8-bit chroma v intra deblock/loop filter ~1.14x faster (90 vs 78 cycles) compared with mmxext	8 years ago
James Darnley	88307b3eec	avcodec/h264: add avx 8-bit 4:2:2 chroma h deblock/loop filter ~1.21x faster (68 vs. 56 cycles) compared with mmxext function	8 years ago
James Darnley	ac096fc82d	avcodec/h264: add avx 8-bit 4:2:0 chroma h deblock/loop filter ~1.14x faster (93 vs. 81 cycles) compared with mmxext function	8 years ago
James Darnley	5c56758843	avcodec/h264: add avx 8-bit chroma v deblock/loop filter ~1.24x faster (101 vs. 81 cycles) compared with mmxext function	8 years ago
James Darnley	5336887867	avcodec/h264: sse2, avx h luma mbaff deblock/loop filter x86-64 only Yorkfield: - sse2: ~2.17x (434 vs. 200 cycles) Nehalem: - sse2: ~2.94x (409 vs. 139 cycles) Skylake: - sse2: ~3.10x (370 vs. 119 cycles) - avx: ~3.29x (370 vs. 112 cycles)	8 years ago
James Darnley	e18bc2114f	avcodec/h264: add named parameters to x86 function	8 years ago
James Darnley	9d815b7424	avcodec/x86: deduplicate PASS8ROWS macro	8 years ago
Diego Biurrun	7abdd026df	asm: Consistently uppercase SECTION markers	8 years ago

2450 Commits (b84a2b91fdfc5c90a8f0afc97d87b02af0b0854e)