FFmpeg

Commit Graph

Author	SHA1	Message	Date
Martin Storsjö	7f905f3672	aarch64: Make the indentation more consistent Some functions have slightly different indentation styles; try to match the surrounding code. libavcodec/aarch64/vc1dsp_neon.S is skipped here, as it intentionally uses a layered indentation style to visually show how different unrolled/interleaved phases fit together. Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Casey Smalley	b98ee1a355	aarch64/hevc: Replace br return with ret This patch changes the return instruction in the tr_32x4 macro from BR to RET. Function returns should always use the RET instruction instead of BR, to avoid interfering with branch prediction. On devices that support BTI, this is observable as a landing pad is required when branching with BR. The change fixes fate-hevc-hdr-vivid-metadata when on hardware with BTI support. Signed-off-by: Casey Smalley <casey.smalley@arm.com> Signed-off-by: Martin Storsjö <martin@martin.st>	1 year ago
Reimar Döffinger	dcff15692d	hevcdsp_idct_neon.S: Avoid unnecessary mov. ret can be given an argument instead. This is also consistent with how other assembler code in FFmpeg does it. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	1 year ago
xufuji456	bd2f00f665	codec/aarch64/hevc: add transform_luma_neon got 56% speed up (run_count=1000, CPU=Cortex A53) transform_4x4_luma_neon: 45 transform_4x4_luma_c: 103 Signed-off-by: xufuji456 <839789740@qq.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
xufuji456	00a062b8d5	codec/aarch64/hevc:add idct_32x32_neon got 73% speed up (run_count=1000, CPU=Cortex A53) idct_32x32_neon: 4826 idct_32x32_c: 18236 idct_32x32_neon: 4824 idct_32x32_c: 18149 idct_32x32_neon: 4937 idct_32x32_c: 18333 Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
J. Dekker	37cde570bc	lavc/aarch64: add clip N macro Signed-off-by: J. Dekker <jdek@itanimul.li>	2 years ago
xufuji456	4b4de07721	libavcodec/hevc: add hevc idct4x4 neon of aarch64 Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
Martin Storsjö	ec7fa13eb0	aarch64: hevcdsp_idct: Reuse preexisting macros for transposes Signed-off-by: Martin Storsjö <martin@martin.st>	2 years ago
J. Dekker	ce2f47318b	lavc/aarch64: hevc_add_res add 12bit variants hevc_add_res_4x4_12_c: 46.0 hevc_add_res_4x4_12_neon: 18.7 hevc_add_res_8x8_12_c: 194.7 hevc_add_res_8x8_12_neon: 25.2 hevc_add_res_16x16_12_c: 716.0 hevc_add_res_16x16_12_neon: 69.7 hevc_add_res_32x32_12_c: 3820.7 hevc_add_res_32x32_12_neon: 261.0 Signed-off-by: J. Dekker <jdek@itanimul.li>	2 years ago
J. Dekker	aa9eabb7a5	lavc/aarch64: reformat add_res funcs Signed-off-by: J. Dekker <jdek@itanimul.li>	2 years ago
Martin Storsjö	f27e3ccf06	aarch64: hevc_idct: Fix overflows in idct_dc This is marginally slower, but correct for all input values. The previous implementation failed with certain input seeds, e.g. "checkasm --test=hevc_idct 98". Signed-off-by: Martin Storsjö <martin@martin.st>	4 years ago
Josh Dekker	75c2ddfa61	lavc/aarch64: add HEVC idct_dc NEON Signed-off-by: Josh Dekker <josh@itanimul.li>	4 years ago
Reimar Döffinger	00c916ef61	lavc/aarch64: port HEVC add_residual NEON Speedup is fairly small, around 1.5%, but these are fairly simple. Signed-off-by: Josh Dekker <josh@itanimul.li>	4 years ago
Reimar Döffinger	30f80d855b	lavc/aarch64: port HEVC SIMD idct NEON Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth available on aarch64. For a UHD HDR (10 bit) sample video these were consuming the most time and this optimization reduced overall decode time from 19.4s to 16.4s, approximately 15% speedup. Test sample was the first 300 frames of "LG 4K HDR Demo - New York.ts", running on Apple M1. Signed-off-by: Josh Dekker <josh@itanimul.li>	4 years ago

14 Commits (0e5f71230a1668cfdbc6e5d9d2b3bbee613cdfcc)
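
Several commit messages above quote a percentage speed-up next to raw benchmark numbers without saying how the percentage was derived. The sketch below assumes it is the relative reduction in measured time, (baseline - optimized) / baseline, which is consistent with the quoted figures but is not stated in the commits. The function names are hypothetical; the input numbers are copied from the messages for bd2f00f665, 00a062b8d5 and 30f80d855b.

```python
def reduction_percent(baseline: float, optimized: float) -> float:
    """Relative time reduction in percent: (baseline - optimized) / baseline."""
    return (baseline - optimized) / baseline * 100.0


def speedup_factor(baseline: float, optimized: float) -> float:
    """How many times faster the optimized version is than the baseline."""
    return baseline / optimized


# (label, baseline measurement, optimized measurement), as quoted in the log.
measurements = [
    ("transform_4x4_luma (bd2f00f665)", 103, 45),
    ("idct_32x32 (00a062b8d5)", 18236, 4826),
    ("UHD HDR decode, seconds (30f80d855b)", 19.4, 16.4),
]

for label, baseline, optimized in measurements:
    print(
        f"{label}: {reduction_percent(baseline, optimized):.1f}% less time, "
        f"{speedup_factor(baseline, optimized):.2f}x faster"
    )
```

Under this reading, a "73% speed up" means the NEON version takes roughly a quarter of the C version's time, which is more than 3x faster. Stating both numbers avoids that ambiguity.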