xufuji456
bd2f00f665
codec/aarch64/hevc: add transform_luma_neon
...
Run time drops by about 56% (run_count=1000, CPU=Cortex-A53):
transform_4x4_luma_neon: 45 transform_4x4_luma_c: 103
Signed-off-by: xufuji456 <839789740@qq.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago
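The percentage quoted in the commit above can be checked from the two timings it reports. The sketch below is only a worked calculation; the function names are hypothetical and not part of the repository. It separates the two common readings of "speed up": the share of run time saved, and the ratio of old to new run time.

```c
/* Share of run time saved by the NEON version, in percent:
 * (c_time - neon_time) / c_time * 100. */
double time_reduction_percent(double c_time, double neon_time)
{
    return (c_time - neon_time) / c_time * 100.0;
}

/* How many times faster the NEON version runs: c_time / neon_time. */
double speedup_ratio(double c_time, double neon_time)
{
    return c_time / neon_time;
}
```

With the reported timings (C: 103, NEON: 45), `time_reduction_percent(103, 45)` is roughly 56.3 and `speedup_ratio(103, 45)` is roughly 2.29, so the "56%" figure matches the run time saved.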
xufuji456
00a062b8d5
codec/aarch64/hevc: add idct_32x32_neon
...
Run time drops by about 73% (run_count=1000, CPU=Cortex-A53), measured over three runs:
idct_32x32_neon: 4826 idct_32x32_c: 18236
idct_32x32_neon: 4824 idct_32x32_c: 18149
idct_32x32_neon: 4937 idct_32x32_c: 18333
Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago
J. Dekker
37cde570bc
lavc/aarch64: add clip N macro
...
Signed-off-by: J. Dekker <jdek@itanimul.li>
2 years ago
xufuji456
4b4de07721
libavcodec/hevc: add HEVC idct4x4 NEON for aarch64
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago
Martin Storsjö
ec7fa13eb0
aarch64: hevcdsp_idct: Reuse preexisting macros for transposes
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2 years ago
J. Dekker
ce2f47318b
lavc/aarch64: hevc_add_res add 12bit variants
...
hevc_add_res_4x4_12_c: 46.0
hevc_add_res_4x4_12_neon: 18.7
hevc_add_res_8x8_12_c: 194.7
hevc_add_res_8x8_12_neon: 25.2
hevc_add_res_16x16_12_c: 716.0
hevc_add_res_16x16_12_neon: 69.7
hevc_add_res_32x32_12_c: 3820.7
hevc_add_res_32x32_12_neon: 261.0
Signed-off-by: J. Dekker <jdek@itanimul.li>
2 years ago
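For readers unfamiliar with what the `add_res` functions benchmarked above compute, here is a minimal scalar sketch of a residual add for high bit depths. It is not the NEON code from the commit. The function name, the parameter list, and the choice to express `stride` in pixels rather than bytes are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp v to the valid pixel range [0, 2^bit_depth - 1]. */
static int clip_pixel(int v, int bit_depth)
{
    int max = (1 << bit_depth) - 1;
    return v < 0 ? 0 : (v > max ? max : v);
}

/* Hypothetical scalar reference: add a size x size block of residuals
 * to the destination pixels and clamp each sum to the pixel range.
 * stride is the distance between destination rows, in pixels (assumption). */
void add_residual_sketch(uint16_t *dst, const int16_t *res,
                         ptrdiff_t stride, int size, int bit_depth)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = (uint16_t)clip_pixel(dst[x] + res[x], bit_depth);
        dst += stride;  /* next destination row */
        res += size;    /* residuals are stored contiguously */
    }
}
```

For a 12-bit block the clamp limit is 4095, so a pixel of 4090 plus a residual of 100 becomes 4095, and a pixel of 5 plus a residual of -20 becomes 0.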
J. Dekker
aa9eabb7a5
lavc/aarch64: reformat add_res funcs
...
Signed-off-by: J. Dekker <jdek@itanimul.li>
2 years ago
Martin Storsjö
f27e3ccf06
aarch64: hevc_idct: Fix overflows in idct_dc
...
This is marginally slower, but correct for all input values.
The previous implementation failed with certain input seeds, e.g.
"checkasm --test=hevc_idct 98".
Signed-off-by: Martin Storsjö <martin@martin.st>
4 years ago
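The commit above does not say where the overflow occurred. The sketch below only illustrates, under stated assumptions, how a DC-only inverse transform can go wrong when an intermediate is kept in 16 bits. It assumes the usual scalar form `(((dc + 1) >> 1) + add) >> shift` with `shift = 14 - bit_depth`; both function names and the `wrap16` helper are hypothetical, and the narrow variant is not a transcription of the earlier NEON code.

```c
#include <stdint.h>

/* Wrap an int to the int16_t range, emulating a 16-bit register. */
static int16_t wrap16(int v)
{
    uint16_t u = (uint16_t)v;
    return (int16_t)(u >= 0x8000 ? (int)u - 0x10000 : (int)u);
}

/* Reference: all intermediates are computed in int (at least 32 bits here).
 * Relies on arithmetic right shift of negative values, as most C code
 * of this kind does. */
int idct_dc_wide(int16_t dc, int bit_depth)
{
    int shift = 14 - bit_depth;
    int add   = 1 << (shift - 1);
    return (((dc + 1) >> 1) + add) >> shift;
}

/* Hypothetical narrow variant: the result of dc + 1 is forced back
 * into 16 bits, so dc = 32767 wraps to -32768 before the shifts. */
int idct_dc_narrow(int16_t dc, int bit_depth)
{
    int shift = 14 - bit_depth;
    int add   = 1 << (shift - 1);
    int16_t t = wrap16(dc + 1);
    t = wrap16(t >> 1);
    t = wrap16(t + add);
    return t >> shift;
}
```

For typical inputs the two agree, but for `dc = 32767` at 8-bit depth the wide version yields 256 while the narrow one yields -256. Inputs at the edge of the range are the kind of case a randomized test seed can hit.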
Josh Dekker
75c2ddfa61
lavc/aarch64: add HEVC idct_dc NEON
...
Signed-off-by: Josh Dekker <josh@itanimul.li>
4 years ago
Reimar Döffinger
00c916ef61
lavc/aarch64: port HEVC add_residual NEON
...
The speedup is small, around 1.5%, but these functions are fairly simple.
Signed-off-by: Josh Dekker <josh@itanimul.li>
4 years ago
Reimar Döffinger
30f80d855b
lavc/aarch64: port HEVC SIMD idct NEON
...
Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth
available on aarch64.
For a UHD HDR (10 bit) sample video these were consuming the most time
and this optimization reduced overall decode time from 19.4s to 16.4s,
approximately 15% speedup.
Test sample was the first 300 frames of "LG 4K HDR Demo - New York.ts",
running on Apple M1.
Signed-off-by: Josh Dekker <josh@itanimul.li>
4 years ago