Martin Storsjö
7f905f3672
aarch64: Make the indentation more consistent
Some functions have slightly different indentation styles; try
to match the surrounding code.
libavcodec/aarch64/vc1dsp_neon.S is skipped here, as it intentionally
uses a layered indentation style to visually show how different
unrolled/interleaved phases fit together.
Signed-off-by: Martin Storsjö <martin@martin.st>
1 year ago
Martin Storsjö
184103b310
aarch64: Consistently use lowercase for vector element specifiers
Signed-off-by: Martin Storsjö <martin@martin.st>
1 year ago
Martin Storsjö
402784ba9f
aarch64: h264dsp: Fix incorrectly indented code
Signed-off-by: Martin Storsjö <martin@martin.st>
3 years ago
Mikhail Nitenko
43ca887bc2
lavc/aarch64: h264, add chroma loop filters for 10bit
Benchmarks:                                              A53     A72
h264_h_loop_filter_chroma422_10bpp_c:                  282.7   114.2
h264_h_loop_filter_chroma422_10bpp_neon:               109.5    78.5
h264_h_loop_filter_chroma_10bpp_c:                     165.0    81.5
h264_h_loop_filter_chroma_10bpp_neon:                  120.0    76.7
h264_h_loop_filter_chroma_intra422_10bpp_c:            323.7   124.2
h264_h_loop_filter_chroma_intra422_10bpp_neon:         155.0   102.7
h264_h_loop_filter_chroma_intra_10bpp_c:               121.0    49.5
h264_h_loop_filter_chroma_intra_10bpp_neon:             79.7    53.7
h264_h_loop_filter_chroma_mbaff422_10bpp_c:            188.5    75.0
h264_h_loop_filter_chroma_mbaff422_10bpp_neon:         120.0    75.5
h264_h_loop_filter_chroma_mbaff_intra422_10bpp_c:      116.7    46.0
h264_h_loop_filter_chroma_mbaff_intra422_10bpp_neon:    79.7    53.7
h264_h_loop_filter_chroma_mbaff_intra_10bpp_c:          63.0    27.2
h264_h_loop_filter_chroma_mbaff_intra_10bpp_neon:       48.5    34.0
h264_v_loop_filter_chroma_10bpp_c:                     258.7   135.5
h264_v_loop_filter_chroma_10bpp_neon:                   71.2    51.0
h264_v_loop_filter_chroma_intra_10bpp_c:               158.0    70.7
h264_v_loop_filter_chroma_intra_10bpp_neon:             48.7    31.5
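The cycle counts above are easiest to compare as ratios. The following is a minimal sketch for reading them, not part of the commit; the helper name `speedup` is hypothetical and it simply divides the C cycle count by the NEON cycle count.

```c
#include <assert.h>

/* Hypothetical helper (not from FFmpeg): ratio of C cycles to NEON cycles.
 * A value above 1.0 means the NEON version is faster, below 1.0 slower. */
static double speedup(double c_cycles, double neon_cycles)
{
    assert(neon_cycles > 0.0);
    return c_cycles / neon_cycles;
}
```

For example, `speedup(258.7, 71.2)` for h264_v_loop_filter_chroma_10bpp on the A53 comes out at roughly 3.6, while `speedup(49.5, 53.7)` for h264_h_loop_filter_chroma_intra_10bpp on the A72 is slightly below 1, meaning the NEON version is marginally slower than C in that one case.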
Signed-off-by: Mikhail Nitenko <mnitenko@gmail.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
3 years ago
Martin Storsjö
c60b76d0c8
aarch64: h264dsp: Fix indentation of some functions to match the rest
Signed-off-by: Martin Storsjö <martin@martin.st>
3 years ago
Martin Storsjö
e86ec831b0
aarch64: h264dsp: Remove unnecessary sign extensions
These became unnecessary when the stride arguments were changed from
int to ptrdiff_t in bc26fe8927 (0576ef466d) and d5d699ab6e (aa844dc46f).
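As background for why the sign extensions could be dropped, here is a minimal C sketch, not FFmpeg code. Under the AArch64 procedure call standard, a 32-bit int argument arrives in the low half of a 64-bit register and the upper 32 bits are unspecified, so assembly that uses the value in 64-bit address arithmetic must widen it first (an `sxtw` in the assembly). A ptrdiff_t argument is already pointer-sized on the usual 64-bit ABIs. Both function names below are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical old-style signature: the stride is a 32-bit int.
 * The explicit cast models the sign extension that the assembly had to
 * perform before using the stride as a 64-bit offset. */
static const uint8_t *row_ptr_int(const uint8_t *pix, int stride, int row)
{
    assert(pix != NULL);
    ptrdiff_t wide = (ptrdiff_t)stride; /* int -> pointer-sized, sign-extended */
    return pix + wide * row;
}

/* Hypothetical new-style signature: the stride is already pointer-sized,
 * so no widening step is needed before the address computation. */
static const uint8_t *row_ptr_ptrdiff(const uint8_t *pix, ptrdiff_t stride, int row)
{
    assert(pix != NULL);
    return pix + stride * row;
}
```

Both helpers return the same address for the same inputs, including negative strides; the difference is only where the widening happens. With a ptrdiff_t parameter the caller passes a full-width value, so the callee has nothing left to extend.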
Signed-off-by: Martin Storsjö <martin@martin.st>
3 years ago
Janne Grunau
186bd30aa3
h264/arm64: implement missing 4:2:2 chroma loop filter neon functions
6 years ago
Janne Grunau
28a8b5413b
h264/aarch64: add intra loop filter neon asm
Add my neon asm from x264 relicensed under the LGPL 2.1 or later. Ported
(x264 uses nv12 chroma) and optimized.
Cycle count for checkasm --bench on a Snapdragon 820e:
h264_h_loop_filter_luma_intra_8bpp_c: 60.0
h264_h_loop_filter_luma_intra_8bpp_neon: 54.2
h264_v_loop_filter_luma_intra_8bpp_c: 148.3
h264_v_loop_filter_luma_intra_8bpp_neon: 73.8
h264_h_loop_filter_chroma_intra_8bpp_c: 27.8
h264_h_loop_filter_chroma_intra_8bpp_neon: 21.4
h264_h_loop_filter_chroma_mbaff_intra_8bpp_c: 15.8
h264_h_loop_filter_chroma_mbaff_intra_8bpp_neon: 15.7
h264_v_loop_filter_chroma_intra_8bpp_c: 45.8
h264_v_loop_filter_chroma_intra_8bpp_neon: 17.3
6 years ago
Janne Grunau
846c3d6aca
h264/aarch64: optimize neon loop filter
Exit as soon as possible if no filtering will be done.
Improves the checkasm --bench cycle count on a Snapdragon 820e:
h264_h_loop_filter_luma_8bpp_c: 72.4 -> 72.5
h264_h_loop_filter_luma_8bpp_neon: 97.1 -> 56.3
h264_v_loop_filter_luma_8bpp_c: 174.0 -> 173.5
h264_v_loop_filter_luma_8bpp_neon: 62.9 -> 60.9
h264_h_loop_filter_chroma_8bpp_c: 30.2 -> 30.3
h264_h_loop_filter_chroma_8bpp_neon: 51.6 -> 25.7
h264_v_loop_filter_chroma_8bpp_c: 57.3 -> 57.3
h264_v_loop_filter_chroma_8bpp_neon: 28.0 -> 24.0
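The early-exit idea can be sketched in scalar C. This is an illustration under stated assumptions, not the commit's implementation: the NEON code works on vector masks, and the helper name `edge_needs_filtering` is hypothetical. It relies on the H.264 rule that a position across an edge is filtered only when |p0 - q0| < alpha, |p1 - p0| < beta and |q1 - q0| < beta; if no position passes, the filter can return without touching any pixel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from FFmpeg): returns 1 if at least one of the
 * n positions along a horizontal edge passes the alpha/beta thresholds.
 * pix points at q0 of the first position; stride is the row distance,
 * so p0 and p1 are one and two rows above, q1 is one row below. */
static int edge_needs_filtering(const uint8_t *pix, ptrdiff_t stride,
                                int n, int alpha, int beta)
{
    assert(pix != NULL);
    for (int i = 0; i < n; i++) {
        int p1 = pix[i - 2 * stride];
        int p0 = pix[i - stride];
        int q0 = pix[i];
        int q1 = pix[i + stride];
        if (abs(p0 - q0) < alpha &&
            abs(p1 - p0) < beta &&
            abs(q1 - q0) < beta)
            return 1; /* at least one position will be filtered */
    }
    return 0; /* nothing to do: the caller can exit early */
}
```

A filter built this way would call the check first and return immediately when it yields 0. The neon rows above, where the cycle count drops while the C rows stay flat, are consistent with skipping work on edges that need no filtering.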
6 years ago
Janne Grunau
bb515e3a73
h264/aarch64: sign extend int stride in loop filter asm
6 years ago
Janne Grunau
f896bca03f
aarch64: h264 (bi)weight NEON optimizations
Ported from ARMv7 NEON.
11 years ago
Janne Grunau
36e3b1f2fd
aarch64: h264 loop filter NEON optimizations
Ported from ARMv7 NEON.
11 years ago