FFmpeg

Mirror of https://git.ffmpeg.org/ffmpeg.git https://ffmpeg.org/

Martin Storsjö 9c8bc74c2b arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32

This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with the same runtime:

                                      Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:    3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:   18531.7  16582.3  14207.6  12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass, we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:      274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:     2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:     2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:     2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:    2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:    3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:      756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:    10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:    11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:   12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:   14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:   15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:   16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:   17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:   18575.5  17157.0  14249.3  12015.1

I.e. in general a very minor overhead for the full subpartition case due to the additional loads and cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data.

In common VP9 content in a few inspected clips, 70-90% of the non-dc-only 16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left 8x8 or 16x16 subpartitions respectively.

Signed-off-by: Martin Storsjö <martin@martin.st>		8 years ago
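
The slice-skipping idea in the commit message can be illustrated with a minimal C sketch. This is not the NEON code from vp9itxfm_neon.S. The names `toy_transform_1d`, `inverse_2d` and the `nz_cols` parameter are hypothetical, and the 1-D transform is an arbitrary linear stand-in rather than the VP9 DCT. The sketch also assumes the caller already knows how many leading columns hold nonzero coefficients; the real decoder derives this from the eob value, which is not modelled here. Because any linear transform maps an all-zero input to an all-zero output, the first pass can stop at the first empty 4-column slice and zero-fill the rest of the intermediate buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N     16  /* block size (16x16) */
#define SLICE 4   /* columns handled per first-pass iteration */

/* Stand-in 1-D linear transform (NOT the VP9 DCT). Linearity is the only
 * property the sketch relies on: zero input yields zero output. */
static void toy_transform_1d(const int32_t *in, int in_stride,
                             int32_t *out, int out_stride)
{
    for (int k = 0; k < N; k++) {
        int32_t acc = 0;
        for (int j = 0; j < N; j++)
            acc += (int32_t)((k * j) % 7 - 3) * in[j * in_stride];
        out[k * out_stride] = acc;
    }
}

/* Separable 2-D transform of a row-major N x N block.
 * nz_cols: hypothetical hint, a multiple of SLICE; every column with
 * index >= nz_cols is assumed to contain only zeros. */
void inverse_2d(const int32_t *coef, int32_t *dst, int nz_cols)
{
    int32_t tmp[N * N];

    assert(nz_cols >= 0 && nz_cols <= N && nz_cols % SLICE == 0);

    /* First pass: transform columns, one slice at a time, and store each
     * result transposed (column c becomes row c of tmp). */
    for (int c = 0; c < N; c += SLICE) {
        if (c >= nz_cols) {
            /* This slice and all later ones are empty: write zeros
             * instead of running the transform, then stop. */
            memset(&tmp[c * N], 0, sizeof(int32_t) * N * (N - c));
            break;
        }
        for (int i = 0; i < SLICE; i++)
            toy_transform_1d(&coef[c + i], N, &tmp[(c + i) * N], 1);
    }

    /* Second pass: always runs over the full intermediate buffer. */
    for (int k = 0; k < N; k++)
        toy_transform_1d(&tmp[k], N, &dst[k], N);
}
```

Calling `inverse_2d(coef, dst, 8)` on a block whose nonzero coefficients all lie in the upper left 8x8 should produce the same output as `inverse_2d(coef, dst, N)` while running only half of the first-pass transforms. Only the first pass is shortened in this sketch; the second pass still processes the whole block.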
Makefile	arm: vp9: Add NEON loop filters	8 years ago
aac.h	…
aacpsdsp_init_arm.c	…
aacpsdsp_neon.S	…
ac3dsp_arm.S	…
ac3dsp_armv6.S	…
ac3dsp_init_arm.c	…
ac3dsp_neon.S	…
apedsp_init_arm.c	…
apedsp_neon.S	…
asm-offsets.h	…
audiodsp_arm.h	…
audiodsp_init_arm.c	…
audiodsp_init_neon.c	audiodsp: reorder arguments for vector_clipf	8 years ago
audiodsp_neon.S	audiodsp: reorder arguments for vector_clipf	8 years ago
blockdsp_arm.h	blockdsp: drop the high_bit_depth parameter	8 years ago
blockdsp_init_arm.c	blockdsp: drop the high_bit_depth parameter	8 years ago
blockdsp_init_neon.c	blockdsp: drop the high_bit_depth parameter	8 years ago
blockdsp_neon.S	…
cabac.h	…
dca.h	…
dcadsp_init_arm.c	dca: remove unused decode_hf function and quant_d tables	9 years ago
dcadsp_neon.S	dca: remove unused decode_hf function and quant_d tables	9 years ago
dcadsp_vfp.S	…
fft_fixed_init_arm.c	fft: Split MDCT bits off from FFT	9 years ago
fft_fixed_neon.S	arm: Use .data.rel.ro for const data with relocations	10 years ago
fft_init_arm.c	fft: Split MDCT bits off from FFT	9 years ago
fft_neon.S	arm: Use .data.rel.ro for const data with relocations	10 years ago
fft_vfp.S	arm: Use .data.rel.ro for const data with relocations	10 years ago
flacdsp_arm.S	…
flacdsp_init_arm.c	…
fmtconvert_init_arm.c	arm: add ff_int32_to_float_fmul_array8_neon	9 years ago
fmtconvert_neon.S	arm: add ff_int32_to_float_fmul_array8_neon	9 years ago
fmtconvert_vfp.S	…
g722dsp_init_arm.c	g722: Add ARM NEON implementation for g722_apply_qmf()	10 years ago
g722dsp_neon.S	g722: Add ARM NEON implementation for g722_apply_qmf()	10 years ago
h264chroma_init_arm.c	h264chroma: Change type of stride parameters to ptrdiff_t	8 years ago
h264cmc_neon.S	h264chroma: Change type of stride parameters to ptrdiff_t	8 years ago
h264dsp_init_arm.c	h264: Move start code search functions into separate source files.	10 years ago
h264dsp_neon.S	…
h264idct_neon.S	…
h264pred_init_arm.c	h264: arm: use intra pred8x8 functions only for chroma_format_idc <= 1	9 years ago
h264pred_neon.S	…
h264qpel_init_arm.c	qpeldsp: Mark source pointer in qpel_mc_func function pointer const	10 years ago
h264qpel_neon.S	…
hpeldsp_arm.S	hpeldsp: arm: Update comments left behind in `25841dfe80`	8 years ago
hpeldsp_arm.h	…
hpeldsp_armv6.S	…
hpeldsp_init_arm.c	…
hpeldsp_init_armv6.c	…
hpeldsp_init_neon.c	…
hpeldsp_neon.S	…
idct.h	idct: Change type of array stride parameters to ptrdiff_t	8 years ago
idctdsp_arm.S	idct: Change type of array stride parameters to ptrdiff_t	8 years ago
idctdsp_arm.h	…
idctdsp_armv6.S	…
idctdsp_init_arm.c	idct: Change type of array stride parameters to ptrdiff_t	8 years ago
idctdsp_init_armv5te.c	idct: Move arm-specific declarations to a header in the arm directory	10 years ago
idctdsp_init_armv6.c	idct: Change type of array stride parameters to ptrdiff_t	8 years ago
idctdsp_init_neon.c	idct: Move arm-specific declarations to a header in the arm directory	10 years ago
idctdsp_neon.S	…
int_neon.S	…
jrevdct_arm.S	…
mathops.h	…
mdct_fixed_init_arm.c	fft: Split MDCT bits off from FFT	9 years ago
mdct_fixed_neon.S	…
mdct_init_arm.c	fft: Split MDCT bits off from FFT	9 years ago
mdct_neon.S	…
mdct_vfp.S	…
me_cmp_armv6.S	…
me_cmp_init_arm.c	motion_est: convert stride to ptrdiff_t	10 years ago
mlpdsp_armv5te.S	arm: mlpdsp: handle pic offset calculation in a macro	10 years ago
mlpdsp_armv6.S	cosmetics: Fix spelling mistakes	9 years ago
mlpdsp_init_arm.c	…
mpegaudiodsp_fixed_armv6.S	…
mpegaudiodsp_init_arm.c	…
mpegvideo_arm.c	mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes	10 years ago
mpegvideo_arm.h	mpegvideo: cosmetics: Lowercase ugly uppercase MPV_ function name prefixes	10 years ago
mpegvideo_armv5te.c	cosmetics: Fix spelling mistakes	9 years ago
mpegvideo_armv5te_s.S	…
mpegvideo_neon.S	…
mpegvideoencdsp_armv6.S	…
mpegvideoencdsp_init_arm.c	…
neon.S	…
neontest.c	lavc: add clobber tests for the new encoding/decoding API	8 years ago
pixblockdsp_armv6.S	…
pixblockdsp_init_arm.c	pixblockdsp: Change type of stride parameters to ptrdiff_t	8 years ago
rdft_init_arm.c	rdft: arm: Split RDFT initialization into a separate file	9 years ago
rdft_neon.S	…
rv34dsp_init_arm.c	…
rv34dsp_neon.S	…
rv40dsp_init_arm.c	qpeldsp: Mark source pointer in qpel_mc_func function pointer const	10 years ago
rv40dsp_neon.S	…
sbrdsp_init_arm.c	…
sbrdsp_neon.S	…
simple_idct_arm.S	cosmetics: Fix spelling mistakes	9 years ago
simple_idct_armv5te.S	simple_idct: arm: Drop disabled code variant	8 years ago
simple_idct_armv6.S	idct: Change type of array stride parameters to ptrdiff_t	8 years ago
simple_idct_neon.S	idct: Change type of array stride parameters to ptrdiff_t	8 years ago
startcode.h	h264: Move start code search functions into separate source files.	10 years ago
startcode_armv6.S	h264: Move start code search functions into separate source files.	10 years ago
synth_filter_neon.S	…
synth_filter_vfp.S	arm: cosmetics: Consistently use lowercase for shift operators	10 years ago
vc1dsp.h	…
vc1dsp_init_arm.c	vc-1: Add platform-specific start code search routine to VC1DSPContext.	10 years ago
vc1dsp_init_neon.c	h264chroma: Change type of stride parameters to ptrdiff_t	8 years ago
vc1dsp_neon.S	idct: Change type of array stride parameters to ptrdiff_t	8 years ago
videodsp_arm.h	…
videodsp_armv5te.S	arm: use a local label instead of the function symbol in ff_prefetch_arm	9 years ago
videodsp_init_arm.c	…
videodsp_init_armv5te.c	…
vorbisdsp_init_arm.c	…
vorbisdsp_neon.S	…
vp3dsp_init_arm.c	vp3: Change type of stride parameters to ptrdiff_t	8 years ago
vp3dsp_neon.S	…
vp6dsp_init_arm.c	vp56: Separate VP5 and VP6 dsp initialization	8 years ago
vp6dsp_neon.S	…
vp8.h	…
vp8_armv6.S	…
vp8dsp.h	…
vp8dsp_armv6.S	vp8: Update some assembly comments left unchanged in `bd66f073fe`	8 years ago
vp8dsp_init_arm.c	…
vp8dsp_init_armv6.c	…
vp8dsp_init_neon.c	…
vp8dsp_neon.S	arm: Fix a typo in a comment	8 years ago
vp9dsp_init_arm.c	arm: vp9: Add NEON loop filters	8 years ago
vp9itxfm_neon.S	arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32	8 years ago
vp9lpf_neon.S	arm: vp9: Add NEON loop filters	8 years ago
vp9mc_neon.S	arm: vp9mc: Use a different helper register for PIC loads	8 years ago
vp56_arith.h	…