mirror of https://github.com/FFmpeg/FFmpeg.git
Tag:
Branch:
Tree:
d7abb7d143
master
oldabi
release/0.10
release/0.11
release/0.5
release/0.6
release/0.7
release/0.8
release/0.9
release/1.0
release/1.1
release/1.2
release/2.0
release/2.1
release/2.2
release/2.3
release/2.4
release/2.5
release/2.6
release/2.7
release/2.8
release/3.0
release/3.1
release/3.2
release/3.3
release/3.4
release/4.0
release/4.1
release/4.2
release/4.3
release/4.4
release/5.0
release/5.1
release/6.0
release/6.1
release/7.0
release/7.1
N
ffmpeg-0.6.3
n0.10
n0.10.1
n0.10.10
n0.10.11
n0.10.12
n0.10.13
n0.10.14
n0.10.15
n0.10.16
n0.10.2
n0.10.3
n0.10.4
n0.10.5
n0.10.6
n0.10.7
n0.10.8
n0.10.9
n0.11
n0.11-dev
n0.11.1
n0.11.2
n0.11.3
n0.11.4
n0.11.5
n0.12-dev
n0.5.10
n0.5.11
n0.5.12
n0.5.13
n0.5.14
n0.5.15
n0.5.5
n0.5.6
n0.5.7
n0.5.8
n0.5.9
n0.6.4
n0.6.5
n0.6.6
n0.6.7
n0.7.1
n0.7.10
n0.7.11
n0.7.12
n0.7.13
n0.7.14
n0.7.15
n0.7.16
n0.7.17
n0.7.2
n0.7.3
n0.7.4
n0.7.5
n0.7.6
n0.7.7
n0.7.8
n0.7.9
n0.8
n0.8.1
n0.8.10
n0.8.11
n0.8.12
n0.8.13
n0.8.14
n0.8.15
n0.8.2
n0.8.3
n0.8.4
n0.8.5
n0.8.6
n0.8.7
n0.8.8
n0.8.9
n0.9
n0.9.1
n0.9.2
n0.9.3
n0.9.4
n1.0
n1.0.1
n1.0.10
n1.0.2
n1.0.3
n1.0.4
n1.0.5
n1.0.6
n1.0.7
n1.0.8
n1.0.9
n1.1
n1.1-dev
n1.1.1
n1.1.10
n1.1.11
n1.1.12
n1.1.13
n1.1.14
n1.1.15
n1.1.16
n1.1.2
n1.1.3
n1.1.4
n1.1.5
n1.1.6
n1.1.7
n1.1.8
n1.1.9
n1.2
n1.2-dev
n1.2.1
n1.2.10
n1.2.11
n1.2.12
n1.2.2
n1.2.3
n1.2.4
n1.2.5
n1.2.6
n1.2.7
n1.2.8
n1.2.9
n1.3-dev
n2.0
n2.0.1
n2.0.2
n2.0.3
n2.0.4
n2.0.5
n2.0.6
n2.0.7
n2.1
n2.1-dev
n2.1.1
n2.1.2
n2.1.3
n2.1.4
n2.1.5
n2.1.6
n2.1.7
n2.1.8
n2.2
n2.2-dev
n2.2-rc1
n2.2-rc2
n2.2.1
n2.2.10
n2.2.11
n2.2.12
n2.2.13
n2.2.14
n2.2.15
n2.2.16
n2.2.2
n2.2.3
n2.2.4
n2.2.5
n2.2.6
n2.2.7
n2.2.8
n2.2.9
n2.3
n2.3-dev
n2.3.1
n2.3.2
n2.3.3
n2.3.4
n2.3.5
n2.3.6
n2.4
n2.4-dev
n2.4.1
n2.4.10
n2.4.11
n2.4.12
n2.4.13
n2.4.14
n2.4.2
n2.4.3
n2.4.4
n2.4.5
n2.4.6
n2.4.7
n2.4.8
n2.4.9
n2.5
n2.5-dev
n2.5.1
n2.5.10
n2.5.11
n2.5.2
n2.5.3
n2.5.4
n2.5.5
n2.5.6
n2.5.7
n2.5.8
n2.5.9
n2.6
n2.6-dev
n2.6.1
n2.6.2
n2.6.3
n2.6.4
n2.6.5
n2.6.6
n2.6.7
n2.6.8
n2.6.9
n2.7
n2.7-dev
n2.7.1
n2.7.2
n2.7.3
n2.7.4
n2.7.5
n2.7.6
n2.7.7
n2.8
n2.8-dev
n2.8.1
n2.8.10
n2.8.11
n2.8.12
n2.8.13
n2.8.14
n2.8.15
n2.8.16
n2.8.17
n2.8.18
n2.8.19
n2.8.2
n2.8.20
n2.8.21
n2.8.22
n2.8.3
n2.8.4
n2.8.5
n2.8.6
n2.8.7
n2.8.8
n2.8.9
n2.9-dev
n3.0
n3.0.1
n3.0.10
n3.0.11
n3.0.12
n3.0.2
n3.0.3
n3.0.4
n3.0.5
n3.0.6
n3.0.7
n3.0.8
n3.0.9
n3.1
n3.1-dev
n3.1.1
n3.1.10
n3.1.11
n3.1.2
n3.1.3
n3.1.4
n3.1.5
n3.1.6
n3.1.7
n3.1.8
n3.1.9
n3.2
n3.2-dev
n3.2.1
n3.2.10
n3.2.11
n3.2.12
n3.2.13
n3.2.14
n3.2.15
n3.2.16
n3.2.17
n3.2.18
n3.2.19
n3.2.2
n3.2.3
n3.2.4
n3.2.5
n3.2.6
n3.2.7
n3.2.8
n3.2.9
n3.3
n3.3-dev
n3.3.1
n3.3.2
n3.3.3
n3.3.4
n3.3.5
n3.3.6
n3.3.7
n3.3.8
n3.3.9
n3.4
n3.4-dev
n3.4.1
n3.4.10
n3.4.11
n3.4.12
n3.4.13
n3.4.2
n3.4.3
n3.4.4
n3.4.5
n3.4.6
n3.4.7
n3.4.8
n3.4.9
n3.5-dev
n4.0
n4.0.1
n4.0.2
n4.0.3
n4.0.4
n4.0.5
n4.0.6
n4.1
n4.1-dev
n4.1.1
n4.1.10
n4.1.11
n4.1.2
n4.1.3
n4.1.4
n4.1.5
n4.1.6
n4.1.7
n4.1.8
n4.1.9
n4.2
n4.2-dev
n4.2.1
n4.2.10
n4.2.2
n4.2.3
n4.2.4
n4.2.5
n4.2.6
n4.2.7
n4.2.8
n4.2.9
n4.3
n4.3-dev
n4.3.1
n4.3.2
n4.3.3
n4.3.4
n4.3.5
n4.3.6
n4.3.7
n4.3.8
n4.4
n4.4-dev
n4.4.1
n4.4.2
n4.4.3
n4.4.4
n4.4.5
n4.5-dev
n5.0
n5.0.1
n5.0.2
n5.0.3
n5.1
n5.1-dev
n5.1.1
n5.1.2
n5.1.3
n5.1.4
n5.1.5
n5.1.6
n5.2-dev
n6.0
n6.0.1
n6.1
n6.1-dev
n6.1.1
n6.1.2
n6.2-dev
n7.0
n7.0.1
n7.0.2
n7.1
n7.1-dev
n7.2-dev
v0.5
v0.5.1
v0.5.2
v0.5.3
v0.6
v0.6.1
${ noResults }
8 Commits (d7abb7d143fd1fbacb0084a8936bc4029afe5111)
Author | SHA1 | Message | Date |
---|---|---|---|
Hubert Mazur | d7abb7d143 |
lavc/aarch64: Add neon implementation for sse4
Provide neon implementation for sse4 function. Performance comparison tests are shown below. - sse_2_c: 80.7 - sse_2_neon: 31.0 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st> |
2 years ago |
Hubert Mazur | ad251fd262 |
lavc/aarch64: Add neon implementation for sse16
Provide neon implementation for sse16 function. Performance comparison tests are shown below. - sse_0_c: 268.2 - sse_0_neon: 43.5 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st> |
2 years ago |
Martin Storsjö | 4136405c86 |
aarch64: me_cmp: Don't do uaddlv once per iteration
The max height is currently documented as 16; the max difference per pixel is 255, and a .8h element can easily contain 16*255, thus keep accumulating in two .8h vectors, and just do the final accumulationat the end. This should work for heights up to 256. This requires a minor register renumbering in ff_pix_abs16_xy2_neon. Before: Cortex A53 A72 A73 Graviton 3 pix_abs_0_0_neon: 97.7 47.0 37.5 22.7 pix_abs_0_1_neon: 154.0 59.0 52.0 25.0 pix_abs_0_3_neon: 179.7 96.7 87.5 41.2 After: pix_abs_0_0_neon: 96.0 39.2 31.2 22.0 pix_abs_0_1_neon: 150.7 59.7 46.2 23.7 pix_abs_0_3_neon: 175.7 83.7 81.7 38.2 Signed-off-by: Martin Storsjö <martin@martin.st> |
2 years ago |
Martin Storsjö | 68a03f6424 |
aarch64: me_cmp: Switch from uabd to uabal in ff_pix_abs16_xy2_neon
Using absolute-difference-accumulate does use twice the amount of absolute-difference instructions, but avoids the need for the uaddl and add instructions, reducing the total number of instructions by 3. These can be interleaved in the rest of the calculation, to avoid tight dependencies at the end. Unfortunately, this is marginally slower on Cortex A53, but faster on A72 and A73. Before: Cortex A53 A72 A73 Graviton 3 pix_abs_0_3_neon: 175.7 109.2 92.0 41.2 After: pix_abs_0_3_neon: 179.7 96.7 87.5 41.2 Signed-off-by: Martin Storsjö <martin@martin.st> |
2 years ago |
Martin Storsjö | b46de9aba4 |
aarch64: me_cmp: Interleave some of the loads in ff_pix_abs16_xy2_neon
Before: Cortex A53 A72 A73 Graviton 3 pix_abs_0_3_neon: 183.7 112.7 97.5 41.2 After: pix_abs_0_3_neon: 175.7 109.2 92.0 41.2 Signed-off-by: Martin Storsjö <martin@martin.st> |
2 years ago |
Martin Storsjö | 02e7853fd9 |
libavcodec: aarch64: Don't clobber v8 in the h%4 case in ff_pix_abs16_xy2_neon
Checkasm doesn't currently test this codepath. Signed-off-by: Martin Storsjö <martin@martin.st> |
2 years ago |
Hubert Mazur | 01e190dc99 |
lavc/aarch64: Add pix_abs16_x2 neon implementation
Provide neon implementation for pix_abs16_x2 function. Performance tests of implementation are below. - pix_abs_0_1_c: 283.5 - pix_abs_0_1_neon: 39.0 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur <hum@semihalf.com> Signed-off-by: Martin Storsjö <martin@martin.st> |
2 years ago |
Swinney, Jonathan | c471cc7474 |
lavc/aarch64: motion estimation functions in neon
- ff_pix_abs16_neon - ff_pix_abs16_xy2_neon In direct micro benchmarks of these ff functions verses their C implementations, these functions performed as follows on AWS Graviton 3. ff_pix_abs16_neon: pix_abs_0_0_c: 141.1 pix_abs_0_0_neon: 19.6 ff_pix_abs16_xy2_neon: pix_abs_0_3_c: 269.1 pix_abs_0_3_neon: 39.3 Tested with: ./tests/checkasm/checkasm --test=motion --bench --disable-linux-perf Signed-off-by: Jonathan Swinney <jswinney@amazon.com> Signed-off-by: Martin Storsjö <martin@martin.st> |
2 years ago |