FFmpeg

Mirror of https://git.ffmpeg.org/ffmpeg.git https://ffmpeg.org/

Martin Storsjö cad42fadcd aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32

This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with the same runtime:

vp9_inv_dct_dct_16x16_sub16_add_neon: 1373.2
vp9_inv_dct_dct_32x32_sub32_add_neon: 8089.0

By skipping individual 8x16 or 8x32 pixel slices in the first pass, we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon: 235.3
vp9_inv_dct_dct_16x16_sub2_add_neon: 1036.7
vp9_inv_dct_dct_16x16_sub4_add_neon: 1036.7
vp9_inv_dct_dct_16x16_sub8_add_neon: 1036.7
vp9_inv_dct_dct_16x16_sub12_add_neon: 1372.1
vp9_inv_dct_dct_16x16_sub16_add_neon: 1372.1
vp9_inv_dct_dct_32x32_sub1_add_neon: 555.1
vp9_inv_dct_dct_32x32_sub2_add_neon: 5190.2
vp9_inv_dct_dct_32x32_sub4_add_neon: 5180.0
vp9_inv_dct_dct_32x32_sub8_add_neon: 5183.1
vp9_inv_dct_dct_32x32_sub12_add_neon: 6161.5
vp9_inv_dct_dct_32x32_sub16_add_neon: 6155.5
vp9_inv_dct_dct_32x32_sub20_add_neon: 7136.3
vp9_inv_dct_dct_32x32_sub24_add_neon: 7128.4
vp9_inv_dct_dct_32x32_sub28_add_neon: 8098.9
vp9_inv_dct_dct_32x32_sub32_add_neon: 8098.8

I.e. in general a very minor overhead for the full subpartition case due to the additional cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data.

Signed-off-by: Martin Storsjö <martin@martin.st>

8 years ago
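
The slice-skipping idea in the commit message above can be illustrated with a small C sketch. It is not the NEON code from vp9itxfm_neon.S. The names `transform_1d`, `inverse_2d` and `nonzero_slices` are hypothetical, and the 1-D transform is a running-sum stand-in rather than the VP9 IDCT. The sketch also assumes the caller already knows how many leading 8-column slices hold nonzero coefficients; per the commit message, the assembly decides this with a few extra compares, whose thresholds are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N     16  /* block is N x N coefficients */
#define SLICE 8   /* columns handled per first-pass slice */

/* Stand-in 1-D linear transform (a running sum, NOT the VP9 IDCT).
 * Like any linear transform it maps all-zero input to all-zero
 * output, which is the property that makes skipping a slice valid. */
static void transform_1d(const int32_t *in, int stride_in,
                         int32_t *out, int stride_out)
{
    int32_t acc = 0;
    for (int i = 0; i < N; i++) {
        acc += in[i * stride_in];
        out[i * stride_out] = acc;
    }
}

/* Two-pass separable transform. The first pass walks the columns in
 * slices of SLICE columns; slices at index >= nonzero_slices are
 * assumed to contain only zeros (hypothetical caller contract), so
 * their intermediate output is zero-filled instead of computed. */
static void inverse_2d(const int32_t *coef, int32_t *dst, int nonzero_slices)
{
    int32_t tmp[N * N];

    assert(nonzero_slices >= 0 && nonzero_slices <= N / SLICE);

    for (int s = 0; s < N / SLICE; s++) {
        for (int c = s * SLICE; c < (s + 1) * SLICE; c++) {
            if (s < nonzero_slices) {
                /* First pass proper: transform column c. */
                transform_1d(coef + c, N, tmp + c, N);
            } else {
                /* Skipped slice: the result is known to be zero. */
                for (int r = 0; r < N; r++)
                    tmp[r * N + c] = 0;
            }
        }
    }

    /* The second pass always runs over every row of the intermediate. */
    for (int r = 0; r < N; r++)
        transform_1d(tmp + r * N, 1, dst + r * N, 1);
}
```

Calling `inverse_2d(coef, dst, 1)` on a block whose nonzero coefficients all lie in the first 8 columns gives the same output as `inverse_2d(coef, dst, N / SLICE)` while doing half of the first-pass work. This mirrors the shape of the benchmark numbers: the second pass is never skipped, so the saving is bounded by the cost of the first pass.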
..
Makefile	aarch64: vp9: Implement NEON loop filters	8 years ago
asm-offsets.h	arm64: port synth_filter_float_neon from arm	9 years ago
cabac.h	aarch64: get_cabac inline asm	11 years ago
dcadsp_init.c	dca: remove unused decode_hf function and quant_d tables	9 years ago
dcadsp_neon.S	dca: remove unused decode_hf function and quant_d tables	9 years ago
fft_init_aarch64.c	fft: Split MDCT bits off from FFT	9 years ago
fft_neon.S	aarch64: Use .data.rel.ro for const data with relocations	10 years ago
fmtconvert_init.c	arm64: int32_to_float_fmul neon asm	9 years ago
fmtconvert_neon.S	arm64: int32_to_float_fmul neon asm	9 years ago
h264chroma_init_aarch64.c	h264chroma: Change type of stride parameters to ptrdiff_t	8 years ago
h264cmc_neon.S	h264chroma: Change type of stride parameters to ptrdiff_t	8 years ago
h264dsp_init_aarch64.c	aarch64: h264 (bi)weight NEON optimizations	11 years ago
h264dsp_neon.S	aarch64: h264 (bi)weight NEON optimizations	11 years ago
h264idct_neon.S	aarch64: h264idct: Use the offset parameter to movrel	8 years ago
h264pred_init.c	h264: aarch64: intra prediction optimisations	9 years ago
h264pred_neon.S	h264: aarch64: intra prediction optimisations	9 years ago
h264qpel_init_aarch64.c	arm64: constify src in h264qpel dsp function definitions	10 years ago
h264qpel_neon.S	aarch64: h264 qpel NEON optimizations	11 years ago
hpeldsp_init_aarch64.c	aarch64: hpeldsp NEON optimizations	11 years ago
hpeldsp_neon.S	aarch64: hpeldsp NEON optimizations	11 years ago
imdct15_init.c	opus: Factor out imdct15 into a standalone component	10 years ago
imdct15_neon.S	opus: Factor out imdct15 into a standalone component	10 years ago
mdct_init.c	fft: Split MDCT bits off from FFT	9 years ago
mdct_neon.S	aarch64: NEON float (i)MDCT	11 years ago
mpegaudiodsp_init.c	mpegaudiodsp: aarch64: Adjust function prototype after `2caa93b813`	8 years ago
mpegaudiodsp_neon.S	mpegaudiodsp: Change type of array stride parameters to ptrdiff_t	8 years ago
neon.S	aarch64: Make transpose_4x4H do a regular transpose	9 years ago
neontest.c	lavc: add clobber tests for the new encoding/decoding API	8 years ago
rv40dsp_init_aarch64.c	h264chroma: Change type of stride parameters to ptrdiff_t	8 years ago
synth_filter_neon.S	arm64: port synth_filter_float_neon from arm	9 years ago
vc1dsp_init_aarch64.c	h264chroma: Change type of stride parameters to ptrdiff_t	8 years ago
videodsp.S	aarch64: implement videodsp.prefetch	11 years ago
videodsp_init.c	aarch64: implement videodsp.prefetch	11 years ago
vorbisdsp_init.c	aarch64: NEON vorbis_inverse_coupling	11 years ago
vorbisdsp_neon.S	aarch64: NEON vorbis_inverse_coupling	11 years ago
vp9dsp_init_aarch64.c	aarch64: vp9: Implement NEON loop filters	8 years ago
vp9itxfm_neon.S	aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32	8 years ago
vp9lpf_neon.S	aarch64: vp9: loop filter: replace 'orr; cbn?z' with 'adds; b.{eq,ne};	8 years ago
vp9mc_neon.S	aarch64: vp9: Add NEON optimizations of VP9 MC functions	8 years ago