Unlike x86, fmin/fmax are single instructions, not function calls. They are much much faster than doing a comparison, then branching based on its results. With this, audiodsp.vector_clipf gets almost twice as fast, and a properly unrollled version of it gets 4-5x faster, on SiFive-U74. This is only the low-hanging fruit: FFMIN and FFMAX are presumably affected as well. This likely applies to other instruction sets with native IEEE floats, especially those lacking a conditional select instruction.release/7.1
parent
b0b3bea10b
commit
39ced529b0
1 changed files with 19 additions and 0 deletions
Loading…
Reference in new issue