mirror of https://github.com/FFmpeg/FFmpeg.git
The previous version was a pretty exact translation of the arm version. This version does do some unnecessary arithemetic (it does more operations on vectors that are only half filled; it does 4 uaddw and 4 sqxtun instead of 2 of each), but it reduces the overhead of packing data together (which could be done for free in the arm version). This gives a decent speedup on Cortex A53, a minor speedup on A72 and a very minor slowdown on Cortex A73. Before: Cortex A53 A72 A73 vp8_idct_add_neon: 79.7 67.5 65.0 After: vp8_idct_add_neon: 67.7 64.8 66.7 Signed-off-by: Martin Storsjö <martin@martin.st>pull/310/head
parent
49f9c4272c
commit
7e42d5f0ab
1 changed files with 25 additions and 24 deletions
Loading…
Reference in new issue