FFmpeg

Mirror of https://git.ffmpeg.org/ffmpeg.git https://ffmpeg.org/

Martin Storsjö 9f10cff610 aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filter		8 years ago

This work is sponsored by, and copyright, Google.

This is similar to the arm version, but due to the larger registers on aarch64, we can do 8 pixels at a time for all filter sizes.

Examples of runtimes vs the 32 bit version, on a Cortex A53:

                                                ARM  AArch64
vp9_loop_filter_h_4_8_10bpp_neon:             213.2    172.6
vp9_loop_filter_h_8_8_10bpp_neon:             281.2    244.2
vp9_loop_filter_h_16_8_10bpp_neon:            657.0    444.5
vp9_loop_filter_h_16_16_10bpp_neon:          1280.4    877.7
vp9_loop_filter_mix2_h_44_16_10bpp_neon:      397.7    358.0
vp9_loop_filter_mix2_h_48_16_10bpp_neon:      465.7    429.0
vp9_loop_filter_mix2_h_84_16_10bpp_neon:      465.7    428.0
vp9_loop_filter_mix2_h_88_16_10bpp_neon:      533.7    499.0
vp9_loop_filter_mix2_v_44_16_10bpp_neon:      271.5    244.0
vp9_loop_filter_mix2_v_48_16_10bpp_neon:      330.0    305.0
vp9_loop_filter_mix2_v_84_16_10bpp_neon:      329.0    306.0
vp9_loop_filter_mix2_v_88_16_10bpp_neon:      386.0    365.0
vp9_loop_filter_v_4_8_10bpp_neon:             150.0    115.2
vp9_loop_filter_v_8_8_10bpp_neon:             209.0    175.5
vp9_loop_filter_v_16_8_10bpp_neon:            492.7    345.2
vp9_loop_filter_v_16_16_10bpp_neon:           951.0    682.7

This is significantly faster than the ARM version in almost all cases except for the mix2 functions.

Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 2-3x.

Signed-off-by: Martin Storsjö <martin@martin.st>
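
To make the ARM vs AArch64 comparison in the commit message above easier to read, the timings can be turned into ratios. The following is a minimal sketch, not part of FFmpeg: the names `TIMINGS`, `speedup`, and `split_by_mix2` are hypothetical, and the numbers are copied from the commit message as reported (the message does not state their unit).

```python
# Timings copied from the commit message: name -> (ARM, AArch64), Cortex A53.
# All helper names here are illustrative and do not exist in FFmpeg.
TIMINGS = {
    "vp9_loop_filter_h_4_8_10bpp_neon": (213.2, 172.6),
    "vp9_loop_filter_h_8_8_10bpp_neon": (281.2, 244.2),
    "vp9_loop_filter_h_16_8_10bpp_neon": (657.0, 444.5),
    "vp9_loop_filter_h_16_16_10bpp_neon": (1280.4, 877.7),
    "vp9_loop_filter_mix2_h_44_16_10bpp_neon": (397.7, 358.0),
    "vp9_loop_filter_mix2_h_48_16_10bpp_neon": (465.7, 429.0),
    "vp9_loop_filter_mix2_h_84_16_10bpp_neon": (465.7, 428.0),
    "vp9_loop_filter_mix2_h_88_16_10bpp_neon": (533.7, 499.0),
    "vp9_loop_filter_mix2_v_44_16_10bpp_neon": (271.5, 244.0),
    "vp9_loop_filter_mix2_v_48_16_10bpp_neon": (330.0, 305.0),
    "vp9_loop_filter_mix2_v_84_16_10bpp_neon": (329.0, 306.0),
    "vp9_loop_filter_mix2_v_88_16_10bpp_neon": (386.0, 365.0),
    "vp9_loop_filter_v_4_8_10bpp_neon": (150.0, 115.2),
    "vp9_loop_filter_v_8_8_10bpp_neon": (209.0, 175.5),
    "vp9_loop_filter_v_16_8_10bpp_neon": (492.7, 345.2),
    "vp9_loop_filter_v_16_16_10bpp_neon": (951.0, 682.7),
}


def speedup(arm: float, aarch64: float) -> float:
    """Ratio of ARM time to AArch64 time; values above 1 mean AArch64 is faster."""
    return arm / aarch64


def split_by_mix2(timings):
    """Return (mix2_ratios, other_ratios) as two dicts of name -> ratio."""
    mix2, other = {}, {}
    for name, (arm, a64) in timings.items():
        target = mix2 if "_mix2_" in name else other
        target[name] = speedup(arm, a64)
    return mix2, other


if __name__ == "__main__":
    mix2, other = split_by_mix2(TIMINGS)
    for label, group in (("mix2", mix2), ("other", other)):
        for name, ratio in sorted(group.items(), key=lambda kv: kv[1]):
            print(f"{label:5s} {name:42s} {ratio:.2f}x")
```

Sorting by ratio within each group shows how the mix2 functions compare with the other variants, which is the distinction the commit message draws.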
Makefile	aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filter	8 years ago
asm-offsets.h	Merge commit '705f5e5e155f6f280a360af220fc5b30cfcee702'	9 years ago
cabac.h	Merge commit 'dfe224f377be3e45758c69d881ca7874b82d647a'	11 years ago
fft_init_aarch64.c	Merge commit '97aec6e75ef36ed0402653519daa8e1fc8ddb555'	9 years ago
fft_neon.S	Merge commit '780cd20b00a69e26bbfffbb8eec16fbe999ea793'	10 years ago
fmtconvert_init.c	Merge commit 'a0fc780a2093784e8664f88205ee1b215e109cee'	9 years ago
fmtconvert_neon.S	Merge commit 'a0fc780a2093784e8664f88205ee1b215e109cee'	9 years ago
h264chroma_init_aarch64.c	…
h264cmc_neon.S	avcodec: fix vc1dsp dependencies	9 years ago
h264dsp_init_aarch64.c	lavc/aarch64: Do not use the neon horizontal chroma loop filter for H.264 4:2:2.	10 years ago
h264dsp_neon.S	…
h264idct_neon.S	aarch64: h264idct: Use the offset parameter to movrel	8 years ago
h264pred_init.c	Merge commit 'f56d8d8dd72b1ab52aa814c5a0fccabf8040ef68'	10 years ago
h264pred_neon.S	Merge commit 'f56d8d8dd72b1ab52aa814c5a0fccabf8040ef68'	10 years ago
h264qpel_init_aarch64.c	arm64: constify src in h264qpel dsp function definitions	10 years ago
h264qpel_neon.S	…
hpeldsp_init_aarch64.c	…
hpeldsp_neon.S	…
mdct_neon.S	Merge commit 'ee2bc5974fe64fd214f52574400ae01c85f4b855'	11 years ago
mpegaudiodsp_init.c	Merge commit '8f9fe6ae3461ce270bce6b7083fda5ec314cdad4'	11 years ago
mpegaudiodsp_neon.S	Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb'	9 years ago
neon.S	Merge commit 'cdb1665f70def544ddab3e3ed3763ef99c8b3873'	9 years ago
neontest.c	avcodec: fix arguments on xmm/neon clobber test wrappers	9 years ago
rv40dsp_init_aarch64.c	…
synth_filter_init.c	avcodec/synth_filter: split off remaining code from dcadec files	9 years ago
synth_filter_neon.S	Merge commit '705f5e5e155f6f280a360af220fc5b30cfcee702'	9 years ago
vc1dsp_init_aarch64.c	…
videodsp.S	Merge commit 'd3789eeeed3423bd1ca9dc40030a2f7a21ea5332'	11 years ago
videodsp_init.c	Merge commit 'd3789eeeed3423bd1ca9dc40030a2f7a21ea5332'	11 years ago
vorbisdsp_init.c	Merge commit '3956a5e0ea46ed7e27ca888fe11c47986ad99261'	11 years ago
vorbisdsp_neon.S	Merge commit '3956a5e0ea46ed7e27ca888fe11c47986ad99261'	11 years ago
vp9dsp_init.h	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	8 years ago
vp9dsp_init_10bpp_aarch64.c	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	8 years ago
vp9dsp_init_12bpp_aarch64.c	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	8 years ago
vp9dsp_init_16bpp_aarch64_template.c	aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filter	8 years ago
vp9dsp_init_aarch64.c	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	8 years ago
vp9itxfm_16bpp_neon.S	aarch64: Add NEON optimizations for 10 and 12 bit vp9 itxfm	8 years ago
vp9itxfm_neon.S	aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32	8 years ago
vp9lpf_16bpp_neon.S	aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filter	8 years ago
vp9lpf_neon.S	aarch64: vp9: loop filter: replace 'orr; cbn?z' with 'adds; b.{eq,ne};	8 years ago
vp9mc_16bpp_neon.S	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	8 years ago
vp9mc_neon.S	aarch64: vp9mc: Fix a comment to refer to a register with the right name	8 years ago