When the joint table does not contain a valid entry, the decoding restarts
from scratch. By implementing the trick of jumping to the 2nd level of the
individual table (and inlining the whole), a speed improvement of 5-10%
is possible.
On a 1000-frames YUV4:2:0 video, before:
362851 decicycles in 422, 262094 runs, 50 skips
182488 decicycles in gray, 262087 runs, 57 skips
Object size: 23584
Overall time: 8.377
After:
346800 decicycles in 422, 262079 runs, 65 skips
168197 decicycles in gray, 262077 runs, 67 skips
Object size: 23188
Overall time: 7.878
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
It was instead using the highest available asm functions, which completely
kills the point of being a reference C context.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Move the GNU as check before the arch specific asm checks since the .dn
check requires gas compatible assembler.
Disable the VC-1 motion compensation NEON asm which is the only part
using that directive. The integrated assembler in the upcoming clang 3.5
does not support .dn/.qn without plans to change that. Too much effort
to implement it while it is rarely used.
http://llvm.org/bugs/show_bug.cgi?id=18199.
Blackfin is a painful platform to work with, no test machines are available
and the range of multimedia applications is dubious. Thus it only represents
a maintenance burden.
It is obviously nonsense since it produces wrong results
or even crashes (crashes should be encode-only though).
Fixes trac issue #1092.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Some encoders (e.g. flac) need to send side data when there is no more
data to be output. This enables them to output a packet with no data in
it, only side data.
I do not know on which side to place the padding to encode with 16x16 MBs
If someone knows or has a known to be correct sample, please contact me
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The check is meant for k8 CPUs. sad16_sse2 is ~20% faster than sad16_mmxext on k10.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
the mpeg encoder would try to use them if vsad or vsse were selected for frame_skip_cmp
and frame_skip_threshold or frame_skip_factor were set to values != 0
example: "ffmpeg -i INPUT -c:v mpeg2video -skipcmp vsad -skip_threshold 1 -f null -"
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>