FFmpeg

Mirror of https://git.ffmpeg.org/ffmpeg.git https://ffmpeg.org/

Martin Storsjö 1e5d87eec3 arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter	8 years ago

This work is sponsored by, and copyright, Google.

This is pretty much similar to the 8 bpp version, but in some senses simpler. All input pixels are 16 bits, and all intermediates also fit in 16 bits, so there's no lengthening/narrowing in the filter at all.

For the full 16 pixel wide filter, we can only process 4 pixels at a time (using an implementation very much similar to the one for 8 bpp), but we can do 8 pixels at a time for the 4 and 8 pixel wide filters with a different implementation of the core filter.

Examples of relative speedup compared to the C version, from checkasm:

                                  Cortex   A7     A8     A9    A53
vp9_loop_filter_h_4_8_10bpp_neon:        1.83   2.16   1.40   2.09
vp9_loop_filter_h_8_8_10bpp_neon:        1.39   1.67   1.24   1.70
vp9_loop_filter_h_16_8_10bpp_neon:       1.56   1.47   1.10   1.81
vp9_loop_filter_h_16_16_10bpp_neon:      1.94   1.69   1.33   2.24
vp9_loop_filter_mix2_h_44_16_10bpp_neon: 2.01   2.27   1.67   2.39
vp9_loop_filter_mix2_h_48_16_10bpp_neon: 1.84   2.06   1.45   2.19
vp9_loop_filter_mix2_h_84_16_10bpp_neon: 1.89   2.20   1.47   2.29
vp9_loop_filter_mix2_h_88_16_10bpp_neon: 1.69   2.12   1.47   2.08
vp9_loop_filter_mix2_v_44_16_10bpp_neon: 3.16   3.98   2.50   4.05
vp9_loop_filter_mix2_v_48_16_10bpp_neon: 2.84   3.64   2.25   3.77
vp9_loop_filter_mix2_v_84_16_10bpp_neon: 2.65   3.45   2.16   3.54
vp9_loop_filter_mix2_v_88_16_10bpp_neon: 2.55   3.30   2.16   3.55
vp9_loop_filter_v_4_8_10bpp_neon:        2.85   3.97   2.24   3.68
vp9_loop_filter_v_8_8_10bpp_neon:        2.27   3.19   1.96   3.08
vp9_loop_filter_v_16_8_10bpp_neon:       3.42   2.74   2.26   4.40
vp9_loop_filter_v_16_16_10bpp_neon:      2.86   2.44   1.93   3.88

The speedup vs C code measured in checkasm is around 1.1-4x. These numbers are quite inconclusive though, since the checkasm test runs multiple filterings on top of each other, so later rounds might end up with different codepaths (different decisions on which filter to apply, based on input pixel differences).

Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 2-4x.

Signed-off-by: Martin Storsjö <martin@martin.st>
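
The claim that all intermediates fit in 16 bits can be checked with a little arithmetic. The sketch below is not taken from the FFmpeg sources; the helper names (max_pixel, max_filter_sum, fits_in_u16) are hypothetical. It assumes that the 8 and 16 pixel wide smoothing filters are rounded weighted averages whose tap weights sum to 8 and 16 respectively, and it only bounds the unshifted sum of such an average. It does not cover the signed arithmetic of the 4 pixel wide filter.

```c
#include <stdint.h>

/* Largest pixel value at a given bit depth, e.g. 4095 for 12 bpp. */
static inline uint32_t max_pixel(int bit_depth)
{
    return (1u << bit_depth) - 1;
}

/* Worst-case sum before the final shift for a rounded weighted average
 * whose tap weights add up to weight_sum (assumed: 8 or 16 here).
 * The rounding constant is weight_sum / 2. */
static inline uint32_t max_filter_sum(int bit_depth, uint32_t weight_sum)
{
    return max_pixel(bit_depth) * weight_sum + weight_sum / 2;
}

/* Returns 1 if the worst-case sum fits in an unsigned 16-bit lane,
 * i.e. no widening to 32 bits would be needed for this step. */
static inline int fits_in_u16(int bit_depth, uint32_t weight_sum)
{
    return max_filter_sum(bit_depth, weight_sum) <= UINT16_MAX;
}
```

Under these assumptions, the 12 bpp case with weights summing to 16 gives 4095 * 16 + 8 = 65528, which is just below 65536, so an unsigned 16-bit lane is sufficient. One extra bit of depth would overflow it.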
..
Makefile	arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter	8 years ago
aac.h	…
aacpsdsp_init_arm.c	…
aacpsdsp_neon.S	…
ac3dsp_arm.S	…
ac3dsp_armv6.S	…
ac3dsp_init_arm.c	…
ac3dsp_neon.S	…
asm-offsets.h	…
audiodsp_arm.h	…
audiodsp_init_arm.c	…
audiodsp_init_neon.c	…
audiodsp_neon.S	…
blockdsp_arm.h	blockdsp: remove high bitdepth parameter	9 years ago
blockdsp_init_arm.c	blockdsp: remove high bitdepth parameter	9 years ago
blockdsp_init_neon.c	blockdsp: reindent after parameter removal	9 years ago
blockdsp_neon.S	…
cabac.h	…
dca.h	avcodec/dca: remove old decoder	9 years ago
fft_fixed_init_arm.c	Merge commit '97aec6e75ef36ed0402653519daa8e1fc8ddb555'	9 years ago
fft_fixed_neon.S	…
fft_init_arm.c	Merge commit '4c297249ac0f513a610a62691ce96d6b62f65b94'	9 years ago
fft_neon.S	…
fft_vfp.S	…
flacdsp_arm.S	…
flacdsp_init_arm.c	lavc/flac: Fix encoding and decoding with high lpc.	10 years ago
fmtconvert_init_arm.c	Merge commit '90b1b9350c0a97c4065ae9054b83e57f48a0de1f'	9 years ago
fmtconvert_neon.S	Merge commit '90b1b9350c0a97c4065ae9054b83e57f48a0de1f'	9 years ago
fmtconvert_vfp.S	…
g722dsp_init_arm.c	Merge commit '702458538d4e52809bcef460d39baabf061b16b5'	10 years ago
g722dsp_neon.S	Merge commit '702458538d4e52809bcef460d39baabf061b16b5'	10 years ago
h264chroma_init_arm.c	…
h264cmc_neon.S	avcodec: fix vc1dsp dependencies	8 years ago
h264dsp_init_arm.c	…
h264dsp_neon.S	…
h264idct_neon.S	…
h264pred_init_arm.c	Merge commit '256ef19844892c6cf8e0386e3287bae970ec6320'	9 years ago
h264pred_neon.S	…
h264qpel_init_arm.c	…
h264qpel_neon.S	…
hevcdsp_arm.h	hevcdsp: fix compilation for arm and aarch64	10 years ago
hevcdsp_deblock_neon.S	hevcdsp: HEVC deblocking ARM NEON register clobber fix	10 years ago
hevcdsp_idct_neon.S	avcodec/arm/hevcdsp_idct_neon: drop ".code 32"	10 years ago
hevcdsp_init_arm.c	hevcdsp: fix compilation for arm and aarch64	10 years ago
hevcdsp_init_neon.c	hevcdsp: fix compilation for arm and aarch64	10 years ago
hevcdsp_qpel_neon.S	avcodec/hevcdsp: ARM NEON optimized qpel functions	10 years ago
hpeldsp_arm.S	…
hpeldsp_arm.h	…
hpeldsp_armv6.S	…
hpeldsp_init_arm.c	…
hpeldsp_init_armv6.c	…
hpeldsp_init_neon.c	…
hpeldsp_neon.S	…
idct.h	…
idctdsp_arm.S	…
idctdsp_arm.h	…
idctdsp_armv6.S	…
idctdsp_init_arm.c	Merge commit '7c6eb0a1b7bf1aac7f033a7ec6d8cacc3b5c2615'	9 years ago
idctdsp_init_armv5te.c	…
idctdsp_init_armv6.c	Merge commit '7c6eb0a1b7bf1aac7f033a7ec6d8cacc3b5c2615'	9 years ago
idctdsp_init_neon.c	…
idctdsp_neon.S	…
int_neon.S	…
jrevdct_arm.S	…
lossless_audiodsp_init_arm.c	…
lossless_audiodsp_neon.S	…
mathops.h	…
mdct_fixed_neon.S	…
mdct_neon.S	…
mdct_vfp.S	…
me_cmp_armv6.S	…
me_cmp_init_arm.c	…
mlpdsp_armv5te.S	…
mlpdsp_armv6.S	Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb'	9 years ago
mlpdsp_init_arm.c	…
mpegaudiodsp_fixed_armv6.S	…
mpegaudiodsp_init_arm.c	…
mpegvideo_arm.c	…
mpegvideo_arm.h	…
mpegvideo_armv5te.c	Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb'	9 years ago
mpegvideo_armv5te_s.S	…
mpegvideo_neon.S	…
mpegvideoencdsp_armv6.S	…
mpegvideoencdsp_init_arm.c	…
neon.S	…
neontest.c	avcodec: fix arguments on xmm/neon clobber test wrappers	8 years ago
pixblockdsp_armv6.S	…
pixblockdsp_init_arm.c	…
rdft_init_arm.c	arm/rdft_init: fix license header	9 years ago
rdft_neon.S	…
rv34dsp_init_arm.c	…
rv34dsp_neon.S	…
rv40dsp_init_arm.c	…
rv40dsp_neon.S	…
sbrdsp_init_arm.c	…
sbrdsp_neon.S	…
simple_idct_arm.S	Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb'	9 years ago
simple_idct_armv5te.S	…
simple_idct_armv6.S	…
simple_idct_neon.S	…
startcode.h	…
startcode_armv6.S	…
synth_filter_init_arm.c	avcodec/synth_filter: split off remaining code from dcadec files	9 years ago
synth_filter_neon.S	…
synth_filter_vfp.S	…
vc1dsp.h	…
vc1dsp_init_arm.c	…
vc1dsp_init_neon.c	…
vc1dsp_neon.S	…
videodsp_arm.h	…
videodsp_armv5te.S	arm: use a local label instead of the function symbol in ff_prefetch_arm	9 years ago
videodsp_init_arm.c	…
videodsp_init_armv5te.c	…
vorbisdsp_init_arm.c	…
vorbisdsp_neon.S	…
vp3dsp_init_arm.c	…
vp3dsp_neon.S	…
vp6dsp_init_arm.c	…
vp6dsp_neon.S	…
vp8.h	…
vp8_armv6.S	…
vp8dsp.h	…
vp8dsp_armv6.S	Merge commit '5f74bd31a9bd1ac7655103b11743c12d38e0419f'	8 years ago
vp8dsp_init_arm.c	…
vp8dsp_init_armv6.c	…
vp8dsp_init_neon.c	…
vp8dsp_neon.S	Merge commit 'e8b96a77010dd62624c3c65c357d7ae3b397ceaa'	8 years ago
vp9dsp_init.h	arm: Add NEON optimizations for 10 and 12 bit vp9 MC	8 years ago
vp9dsp_init_10bpp_arm.c	arm: Add NEON optimizations for 10 and 12 bit vp9 MC	8 years ago
vp9dsp_init_12bpp_arm.c	arm: Add NEON optimizations for 10 and 12 bit vp9 MC	8 years ago
vp9dsp_init_16bpp_arm_template.c	arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter	8 years ago
vp9dsp_init_arm.c	arm: Add NEON optimizations for 10 and 12 bit vp9 MC	8 years ago
vp9itxfm_16bpp_neon.S	arm: Add NEON optimizations for 10 and 12 bit vp9 itxfm	8 years ago
vp9itxfm_neon.S	arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32	8 years ago
vp9lpf_16bpp_neon.S	arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter	8 years ago
vp9lpf_neon.S	arm: vp9: Add NEON loop filters	8 years ago
vp9mc_16bpp_neon.S	arm: Add NEON optimizations for 10 and 12 bit vp9 MC	8 years ago
vp9mc_neon.S	arm: vp9mc: Fix vertical alignment of operands	8 years ago
vp56_arith.h	…