FFmpeg

Commit Graph

Author	SHA1	Message	Date
Hubert Mazur	2537fdc510	sw_scale: Add specializations for hscale 16 to 19. Provide arm64 NEON-optimized implementations for hscale16To19 with filter sizes 4, 8 and X4. The tests and benchmarks were run on AWS Graviton 2 instances. The checkasm results are: hscale_16_to_19__fs_4_dstW_512_c: 6216.0; hscale_16_to_19__fs_4_dstW_512_neon: 2257.0; hscale_16_to_19__fs_8_dstW_512_c: 10417.7; hscale_16_to_19__fs_8_dstW_512_neon: 3112.5; hscale_16_to_19__fs_12_dstW_512_c: 14890.5; hscale_16_to_19__fs_12_dstW_512_neon: 3899.0; hscale_16_to_19__fs_16_dstW_512_c: 19006.5; hscale_16_to_19__fs_16_dstW_512_neon: 5341.2; hscale_16_to_19__fs_32_dstW_512_c: 36629.5; hscale_16_to_19__fs_32_dstW_512_neon: 9502.7; hscale_16_to_19__fs_40_dstW_512_c: 45477.5; hscale_16_to_19__fs_40_dstW_512_neon: 11552.0. (Note: the checkasm tests for these functions haven't been merged, since they fail on x86.) Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Hubert Mazur	9ccf8c5bfc	sw_scale: Add specializations for hscale 16 to 15. Add arm64 NEON implementations for hscale 16 to 15 with filter sizes 4, 8 and X4. The tests and benchmarks were run on AWS Graviton 2 instances. The checkasm results are: hscale_16_to_15__fs_4_dstW_512_c: 6703.5; hscale_16_to_15__fs_4_dstW_512_neon: 2298.0; hscale_16_to_15__fs_8_dstW_512_c: 10983.0; hscale_16_to_15__fs_8_dstW_512_neon: 3216.5; hscale_16_to_15__fs_12_dstW_512_c: 15526.0; hscale_16_to_15__fs_12_dstW_512_neon: 3993.0; hscale_16_to_15__fs_16_dstW_512_c: 20183.5; hscale_16_to_15__fs_16_dstW_512_neon: 5369.7; hscale_16_to_15__fs_32_dstW_512_c: 39315.2; hscale_16_to_15__fs_32_dstW_512_neon: 9511.2; hscale_16_to_15__fs_40_dstW_512_c: 48995.7; hscale_16_to_15__fs_40_dstW_512_neon: 11570.0. (Note: the checkasm tests for these functions haven't been merged, since they fail on x86.) Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Hubert Mazur	1e9cfa5bb0	sw_scale: Add specializations for hscale 8 to 19. Add arm64 NEON implementations for hscale 8 to 19 with filter sizes 4, 4X and 8. Both implementations are based on very similar ones dedicated to hscale 8 to 15. The main change concerns storing the data: the result is written as int32_t instead of int16_t. These functions are heavily inspired by patches provided by J. Swinney and M. Storsjö for hscale8to15, slightly adapted for hscale8to19. The tests and benchmarks were run on AWS Graviton 2 instances. The checkasm results are: hscale_8_to_19__fs_4_dstW_512_c: 5663.2; hscale_8_to_19__fs_4_dstW_512_neon: 1259.7; hscale_8_to_19__fs_8_dstW_512_c: 9306.0; hscale_8_to_19__fs_8_dstW_512_neon: 2020.2; hscale_8_to_19__fs_12_dstW_512_c: 12932.7; hscale_8_to_19__fs_12_dstW_512_neon: 2462.5; hscale_8_to_19__fs_16_dstW_512_c: 16844.2; hscale_8_to_19__fs_16_dstW_512_neon: 4671.2; hscale_8_to_19__fs_32_dstW_512_c: 32803.7; hscale_8_to_19__fs_32_dstW_512_neon: 5474.2; hscale_8_to_19__fs_40_dstW_512_c: 40948.0; hscale_8_to_19__fs_40_dstW_512_neon: 6669.7. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Swinney, Jonathan	0d7caa5b09	swscale/aarch64: add vscale specializations. This commit adds new code paths for vscale when filterSize is 2, 4, or 8. Using specialized code, unrolled to match the filterSize, improves performance. On AWS c7g (Graviton 3, Neoverse V1) instances: yuv2yuvX_2_0_512_accurate_neon: 558.8 before, 268.9 after; yuv2yuvX_4_0_512_accurate_neon: 637.5 before, 434.9 after; yuv2yuvX_8_0_512_accurate_neon: 1144.8 before, 806.2 after; yuv2yuvX_16_0_512_accurate_neon: 2080.5 before, 1853.7 after. Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Swinney, Jonathan	75ffca7eef	libswscale/aarch64: add another hscale specialization. This specialization handles the case where filterSize is 4 mod 8, e.g. 12, 20, etc. AArch64 previously fell back to the C function for this case; this implementation speeds it up significantly: hscale_8_to_15__fs_12_dstW_512_c: 6234.1; hscale_8_to_15__fs_12_dstW_512_neon: 1505.6. Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Swinney, Jonathan	0ea61725b1	swscale/aarch64: add hscale specializations. This patch adds code to support specializations of the hscale function and adds a specialization for filterSize == 4. ff_hscale8to15_4_neon is a complete rewrite. Since the main bottleneck here is loading the data from src, this data is loaded a whole block ahead and stored back to the stack to be loaded again with ld4. This arranges the data for the most efficient use of the vector instructions and removes the need for completion adds at the end. The number of iterations of the C loop per iteration of the assembly is increased from 4 to 8, but because of the prefetching, there must be a special section without prefetching when dstW < 16. This dramatically improves speed on Graviton 2 (Neoverse N1) in the case where fs=8 would previously have been required. Before: hscale_8_to_15__fs_8_dstW_512_neon: 1962.8; after: hscale_8_to_15__fs_4_dstW_512_neon: 1220.9. Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st>	3 years ago
Andreas Rheinhardt	f3c197b129	Include attributes.h directly. Some files currently rely on libavutil/cpu.h to include it for them; yet said file won't include it any more after the currently deprecated functions are removed, so include attributes.h directly. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	4 years ago
Josh de Kock	718c8f9aa5	swscale: fix NEON hscale init. The NEON hscale function only supports X8 filter sizes and should only be selected when these are being used. At the moment filterAlign is set to 8, but in the future, when extra NEON assembly for specific sizes is added, those sizes will need checks here too. The immediate use case for this change is making the hscale checkasm test simpler and free of NEON-specific edge cases (x86 already has these guards). Signed-off-by: Josh de Kock <josh@itanimul.li>	5 years ago
Clément Bœsch	c921f4f687	sws/aarch64: add ff_yuv2planeX_8_neon	9 years ago
Clément Bœsch	040598218f	sws/aarch64: restore ff_hscale_8_to_15_neon(). Fix final scaling and required filter alignment. Pass FATE.	9 years ago
Clément Bœsch	eadaef2a63	sws/aarch64: disable ff_hscale_8_to_15_neon temporarily. Looks broken.	9 years ago
Clément Bœsch	263eb76bdf	sws/aarch64: add ff_hscale_8_to_15_neon. Benchmark: `./ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -`; before: t:0.489726 avg:0.489883 max:0.491852 min:0.489482; after: t:0.256515 avg:0.256458 max:0.256999 min:0.253755	9 years ago
Janne Grunau	3956a5e0ea	aarch64: NEON vorbis_inverse_coupling. Ported from the ARMv7 NEON version. 16 times faster than the C version; overall, Vorbis decoding is more than 12% faster on Apple's A7.	11 years ago
Diego Biurrun	c9f933b5b6	Add av_cold attributes to arch-specific init functions	12 years ago
Ronald S. Bultje	1768e43ceb	vorbisdsp: change block_size type from int to intptr_t. This saves one instruction in the x86-64 assembly.	12 years ago
Ronald S. Bultje	fef906c77c	Move vorbis_inverse_coupling from dsputil to vorbisdspcontext. Conveniently (together with Justin's earlier patches), this makes our vorbis decoder entirely independent of dsputil.	12 years ago
Mans Rullgard	d526c5338d	ARM: allow runtime masking of CPU features. This allows masking CPU features with the -cpuflags avconv option, which is useful for testing different optimisations without rebuilding. Signed-off-by: Mans Rullgard <mans@mansr.com>	13 years ago
Diego Biurrun	3dde147ff9	cosmetics: Consistently place static, inline and av_cold attributes/keywords.	13 years ago
Mans Rullgard	2912e87a6c	Replace FFmpeg with Libav in licence headers Signed-off-by: Mans Rullgard <mans@mansr.com>	14 years ago
Justin Ruggles	a8ae4e0e7b	Remove unneeded add bias from 3 functions: DSPContext.vector_fmul_window(), DCADSPContext.lfe_fir(), SynthFilterContext.synth_filter_float(). Signed-off-by: Mans Rullgard <mans@mansr.com> (cherry picked from commit `80ba1ddb58`)	14 years ago
Justin Ruggles	80ba1ddb58	Remove unneeded add bias from 3 functions: DSPContext.vector_fmul_window(), DCADSPContext.lfe_fir(), SynthFilterContext.synth_filter_float(). Signed-off-by: Mans Rullgard <mans@mansr.com>	14 years ago
Måns Rullgård	08255107cf	DCA: ARM/NEON optimised lfe_fir Originally committed as revision 22863 to svn://svn.ffmpeg.org/ffmpeg/trunk	15 years ago
Måns Rullgård	2ed6f39944	Replace many includes of libavutil/common.h with what is actually needed. This reduces the number of false dependencies on header files and speeds up compilation. Originally committed as revision 22407 to svn://svn.ffmpeg.org/ffmpeg/trunk	15 years ago
Måns Rullgård	75fb5c24ed	Move FASTDIV macro to intmath.h Originally committed as revision 21335 to svn://svn.ffmpeg.org/ffmpeg/trunk	15 years ago
Måns Rullgård	544f5a922f	Optimise av_log2 with clz when available. 10% faster FLAC decoding on x86 and ARM. Originally committed as revision 21217 to svn://svn.ffmpeg.org/ffmpeg/trunk	15 years ago
Stefano Sabatini	987903826b	Globally rename the header inclusion guard names. Consistently apply this rule: the guard name is obtained from the filename by stripping the leading "lib", converting '/' and '.' to '_' and uppercasing the resulting name. Guard names in the root directory have to be prefixed by "FFMPEG_". Originally committed as revision 15120 to svn://svn.ffmpeg.org/ffmpeg/trunk	16 years ago
Måns Rullgård	3540b950ec	add missing #include "common.h" to libavutil headers Originally committed as revision 12502 to svn://svn.ffmpeg.org/ffmpeg/trunk	17 years ago
Zuxy Meng	85074d3c93	Reapply r12489: Add pure, const and malloc attributes to proper functions in libavutil. Fix a compilation failure in r12489. Originally committed as revision 12498 to svn://svn.ffmpeg.org/ffmpeg/trunk	17 years ago
Benoit Fouet	2119bb8f51	revert r12489. Originally committed as revision 12490 to svn://svn.ffmpeg.org/ffmpeg/trunk	17 years ago
Zuxy Meng	6544f48f03	Pure, const and malloc attributes to libavutil. Patch by Zuxy Meng: zuxy meng gmail com Original thread: [FFmpeg-devel] [PATCH] Pure, const and malloc attributes to libavutil Date: 03/18/2008 6:09 AM Originally committed as revision 12489 to svn://svn.ffmpeg.org/ffmpeg/trunk	17 years ago
Diego Biurrun	5b21bdabe4	Add FFMPEG_ prefix to all multiple inclusion guards. Originally committed as revision 10765 to svn://svn.ffmpeg.org/ffmpeg/trunk	17 years ago
Måns Rullgård	99545457bf	include all prerequisites in header files Originally committed as revision 9344 to svn://svn.ffmpeg.org/ffmpeg/trunk	18 years ago
Diego Biurrun	b78e7197a8	Change license headers to say 'FFmpeg' instead of 'this program/this library' and fix GPL/LGPL version mismatches. Originally committed as revision 6577 to svn://svn.ffmpeg.org/ffmpeg/trunk	18 years ago
Diego Biurrun	04d7f60143	Add official LGPL license headers to the files that were missing them. Originally committed as revision 6219 to svn://svn.ffmpeg.org/ffmpeg/trunk	18 years ago
Måns Rullgård	b9a73d8d2f	move adler32 to libavutil Originally committed as revision 5731 to svn://svn.ffmpeg.org/ffmpeg/trunk	19 years ago

12 Commits (9613ba95c11c242d0a52a98f00495627bc1cee42)