vector_fmul and vector_fmac_scalar are guaranteed that they can process
elements in batches of 16, but their SSE versions only do 8 at a time.
Therefore, unroll them a bit.
299 to 261 cycles for 256 elements in vector_fmac_scalar on Arrandale/Win64.
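As a rough illustration of the idea only (this is not the SSE assembly
from the patch), the unrolled loop can be modelled in plain C. The name
fmac_scalar_unrolled is hypothetical, and the sketch assumes len is a
multiple of 16, which is the guarantee mentioned above:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of vector_fmac_scalar: dst[i] += src[i] * mul.
 * Assumption: len is a multiple of 16, so no tail loop is required. */
static void fmac_scalar_unrolled(float *dst, const float *src,
                                 float mul, size_t len)
{
    assert(len % 16 == 0);

    for (size_t i = 0; i < len; i += 16) {
        /* One outer iteration covers 16 floats. In SSE terms this
         * corresponds to four 128-bit registers of 4 floats each,
         * instead of two registers (8 floats) per iteration. */
        for (size_t j = 0; j < 16; j++)
            dst[i + j] += src[i + j] * mul;
    }
}
```

Handling 16 elements per pass halves the number of loop-counter updates
and branches compared with 8 per pass, which is the kind of overhead
unrolling is meant to reduce.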
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Now nellymoserenc and aacenc no longer depend on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there as well.
The attribution was removed by libav while moving the code to libavutil.
The original code is from:
commit eb4825b5d4
Author: Loren Merritt <lorenm@u.washington.edu>
Date: Thu Aug 10 19:06:25 2006 +0000
sse and 3dnow implementations of float->int conversion and mdct windowing.
15% faster vorbis.
and
commit 069720565c
Author: Loren Merritt <lorenm@u.washington.edu>
Date: Fri Aug 11 18:19:37 2006 +0000
vorbis simd tweaks
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>