FFmpeg

Commit Graph

Author	SHA1	Message	Date
Anton Khirnov	683da86aab	audiodsp: reorder arguments for vector_clipf This will make the x86 asm simpler. ARM conversion by Martin Storsjö <martin@martin.st> and Janne Grunau <janne-libav@jannau.net>	9 years ago
Anton Khirnov	75d98e30af	audiodsp/x86: clear the high bits of the order parameter on 64bit Also change shl to add, since it can be faster on some CPUs. CC: libav-stable@libav.org	9 years ago
Anton Khirnov	1d6c76e11f	audiodsp/x86: fix ff_vector_clip_int32_sse2 This version, which is the only one doing two processing cycles per loop iteration, computes the load/store indices incorrectly for the second cycle. CC: libav-stable@libav.org	9 years ago
Diego Biurrun	de452e5037	pixblockdsp: Change type of stride parameters to ptrdiff_t This avoids SIMD-optimized functions having to sign-extend their line size argument manually to be able to do pointer arithmetic. Also adjust parameter names to be "stride" everywhere.	9 years ago
Diego Biurrun	721d57e608	vp56: Separate VP5 and VP6 dsp initialization VP5 has no arch-specific optimizations (nor will it get some in the future), so it makes no sense to try to share dsp init code with VP6.	9 years ago
Diego Biurrun	3fd22538bc	prores: Change type of stride parameters to ptrdiff_t This avoids SIMD-optimized functions having to sign-extend their line size argument manually to be able to do pointer arithmetic. Also adjust parameter names to be "linesize" everywhere.	9 years ago
Diego Biurrun	f81be06cf6	cavs: Change type of stride parameters to ptrdiff_t ptrdiff_t is the correct type for array strides and similar.	9 years ago
Diego Biurrun	802727b538	vp8: Update some assembly comments left unchanged in `bd66f073fe`	9 years ago
Diego Biurrun	d9d26a3674	vp56: Change type of stride parameters to ptrdiff_t This avoids SIMD-optimized functions having to sign-extend their line size argument manually to be able to do pointer arithmetic.	9 years ago
Diego Biurrun	6892df9294	vp3: Change type of stride parameters to ptrdiff_t This avoids SIMD-optimized functions having to sign-extend their stride argument manually to be able to do pointer arithmetic. Also adjust parameter names to be "stride" everywhere.	9 years ago
Diego Biurrun	e2b9993558	simple_idct: x86: Drop disabled IDCT implementation This gem has been disabled since 2001.	9 years ago
James Almer	d950279cbf	avcodec/ttadsp: cosmetics Clean some header includes and use the same naming scheme as in ttaencdsp Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Ronald S. Bultje	9790b44a89	vp9mc/x86: sse2 MC assembly. Also a slight change to the ssse3 code, which prevents a theoretical overflow in the sharp filter. Signed-off-by: Anton Khirnov <anton@khirnov.net>	9 years ago
James Almer	67922b4ee4	vp9mc/x86: add AVX and AVX2 MC Roughly 25% faster MC than ssse3 for blocksizes 32 and 64. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	9 years ago
Clément Bœsch	3cda179f18	vp9mc/x86: rename ff_* to ff_vp9_* Signed-off-by: Anton Khirnov <anton@khirnov.net>	9 years ago
James Almer	8be8444d01	vp9mc/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext pavgb is an sse integer instruction, so the mmxext flag is enough Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	9 years ago
Clément Bœsch	6ab642d69d	vp9mc/x86: simplify a few inits. Signed-off-by: Anton Khirnov <anton@khirnov.net>	9 years ago
Ronald S. Bultje	3a09494939	vp9mc/x86: add 16px functions (64bit only). Signed-off-by: Anton Khirnov <anton@khirnov.net>	9 years ago
Anton Khirnov	89466de4ae	vp9/x86: rename vp9dsp to vp9mc It only contains the MC SIMD, other SIMD will go into different files.	9 years ago
Christophe Gisquet	3c504bc359	x86: deduplicate some constants Signed-off-by: Anton Khirnov <anton@khirnov.net>	9 years ago
James Almer	efc9d5c4bc	x86/ttaenc: add ff_ttaenc_filter_process_{ssse3,sse4} Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Ronald S. Bultje	a4edaa0270	vp9: add mmxext versions of the single-block (w=8,npx=8) h/v loopfilters. Each takes about 0.1% of runtime in my profiles, and they didn't have any SIMD yet so far (we only had simd for npx=16 double-block versions).	9 years ago
Ronald S. Bultje	7ca422bb1b	vp9: add mmxext versions of the single-block (w=4,npx=8) h/v loopfilters. Each takes about 0.5% of runtime in my profiles, and they didn't have any SIMD yet so far (we only had simd for npx=16 double-block versions).	9 years ago
Ronald S. Bultje	726501a34e	vp9: add 32x32 idct AVX2 implementation. About 1.8x speedup compared to AVX version for full IDCT. Other sub-IDCT scenarios also see speedups. Full --bench output for idct_32x32_add_{bpp}_${subidct}_${opt} (50k cycles): nop: 16.5 vp9_inv_dct_dct_32x32_add_8_1_c: 2284.4 vp9_inv_dct_dct_32x32_add_8_1_sse2: 145.0 vp9_inv_dct_dct_32x32_add_8_1_ssse3: 137.4 vp9_inv_dct_dct_32x32_add_8_1_avx: 137.1 vp9_inv_dct_dct_32x32_add_8_1_avx2: 73.2 vp9_inv_dct_dct_32x32_add_8_2_c: 14680.8 vp9_inv_dct_dct_32x32_add_8_2_sse2: 2617.2 vp9_inv_dct_dct_32x32_add_8_2_ssse3: 982.9 vp9_inv_dct_dct_32x32_add_8_2_avx: 958.5 vp9_inv_dct_dct_32x32_add_8_2_avx2: 704.2 vp9_inv_dct_dct_32x32_add_8_4_c: 14443.1 vp9_inv_dct_dct_32x32_add_8_4_sse2: 2717.1 vp9_inv_dct_dct_32x32_add_8_4_ssse3: 965.7 vp9_inv_dct_dct_32x32_add_8_4_avx: 1000.7 vp9_inv_dct_dct_32x32_add_8_4_avx2: 717.1 vp9_inv_dct_dct_32x32_add_8_8_c: 14436.4 vp9_inv_dct_dct_32x32_add_8_8_sse2: 2671.8 vp9_inv_dct_dct_32x32_add_8_8_ssse3: 1038.5 vp9_inv_dct_dct_32x32_add_8_8_avx: 983.0 vp9_inv_dct_dct_32x32_add_8_8_avx2: 729.4 vp9_inv_dct_dct_32x32_add_8_16_c: 14614.7 vp9_inv_dct_dct_32x32_add_8_16_sse2: 2701.7 vp9_inv_dct_dct_32x32_add_8_16_ssse3: 1334.4 vp9_inv_dct_dct_32x32_add_8_16_avx: 1276.7 vp9_inv_dct_dct_32x32_add_8_16_avx2: 719.5 vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6 vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6 vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9 vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6 vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0	9 years ago
James Almer	7a15cf42ee	x86/diracdsp: make ff_put_signed_rect_clamped_10_sse4 work on x86_32 Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Diego Biurrun	d06dfaa5cb	x86: huffyuv: Use EXTERNAL_SSSE3_FAST convenience macro where appropriate	9 years ago
Diego Biurrun	4efab89332	x86: Use _FAST/_SLOW CPU feature detection macros where appropriate	9 years ago
Diego Biurrun	0a39c9ac0b	x86: hpeldsp: Don't check for bitexact flag when initializing VP3-specific code That code is only ever initialized with that flag set.	9 years ago
Diego Biurrun	95c1df929b	x86: hpeldsp: Drop unused function parameters	9 years ago
Diego Biurrun	c3e83ad3b7	x86: hpeldsp: Use EXTERNAL_SSE2_FAST where appropriate	9 years ago
Diego Biurrun	1dfc3cf89d	x86: hpeldsp: Split off VP3-specific bits into a separate file	9 years ago
James Almer	fca3c3b619	hevc: Add AVX2 DC IDCT Originally written by Pierre Edouard Lepere <pierre-edouard.lepere@insa-rennes.fr>. Integrated to Libav by Josh de Kock <josh@itanimul.li>. Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>	9 years ago
Rostislav Pehlivanov	df1dc52195	diracdsp_init: add missing ARCH_X86_64 check That SIMD is still x86_64 only for now. Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	9 years ago
Rostislav Pehlivanov	bd61f3c6bf	diracdsp: add SIMD for the 10 bit version of put_signed_rect_clamped Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>	9 years ago
Rostislav Pehlivanov	80721cc1ff	diracdsp: add dequantization SIMD Currently unused, to be used in the following commits. Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>	9 years ago
Ronald S. Bultje	f0a2b6249b	vp9: add 16x16 idct avx2 (8-bit). checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows that it's about 1.65x as fast as the AVX version for the full IDCT, and similar speedups for the sub-IDCTs: nop: 24.6 vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8 vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6 vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4 vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2 vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5 vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7 vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9 vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2 vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9 vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3 vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7 vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4 vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1 vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1 vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0 vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4 vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6 vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7 vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9 vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2 vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6 vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5 vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0 vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9 vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4	9 years ago
James Almer	645489cf90	x86/dcadsp: optimize lfe_fir0_float_fma3 on x86_32 About 10% faster. Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	293484fa5e	avcodec: add missing xmm/neon clobber test wrappers for the new decode API Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Matthieu Bouron	9eb3da2f99	asm: FF_-prefix internal macros used in inline assembly See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.	9 years ago
Clément Bœsch	4a081f224e	libavcodec: fix constness in clobber test avcodec_open2() wrappers Signed-off-by: Martin Storsjö <martin@martin.st>	9 years ago
Anton Khirnov	9df889a5f1	h264: rename h264.[ch] to h264dec.[ch] This is more consistent with the naming of other decoders.	9 years ago
Martin Storsjö	f1a9eee41c	x86: Add missing movsxd for the int stride parameter Signed-off-by: Martin Storsjö <martin@martin.st>	9 years ago
James Almer	ede4ec1f8f	x86/aacpsdsp: optimize add_squares loop Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
James Almer	82dbfccaf0	x86/aacdec: use HADDPS macro Signed-off-by: James Almer <jamrial@gmail.com>	9 years ago
Diego Biurrun	1e9c5bf4c1	asm: FF_-prefix internal macros used in inline assembly These warnings conflict with system macros on Solaris, producing truckloads of warnings about macro redefinition.	9 years ago
Diego Biurrun	dc40a70c57	Drop unnecessary libavutil/x86/asm.h #includes	9 years ago
Diego Biurrun	a6a750c7ef	tests: Move all test programs to a subdirectory	9 years ago
Christophe Gisquet	9630b3fc06	x86: lossless audio: SSE4 madd 32bits The unique user so far is wmalossless 24bits. The few samples tested show an order of 8, so more unrolling or an avx2 version do not make sense. Timings: 68 -> 49 cycles Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	9 years ago
Vittorio Giovara	41ed7ab45f	cosmetics: Fix spelling mistakes Signed-off-by: Diego Biurrun <diego@biurrun.de>	9 years ago
Diego Biurrun	01621202aa	build: miscellaneous cosmetics Restore alphabetical order in lists, break overly long lines, do some prettyprinting, add some explanatory section comments, group parts together that belong together logically.	9 years ago

2428 Commits (09f0429b9961ea77a60b07afb62082e5565decd4)