Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Currently, any samples in the final frame are not decoded because they are
only represented by one frame instead of two. So we encode two final frames to
cover both the analysis delay and the MDCT delay.
I have no idea what the idea was behind the original code,
but the new code is equivalent to it.
In that loop that places the new node nodes[j] contains
always the data of the new node (since the steps are always
in order: FFSWAP copies node[j] to node[j-1], j is decremented).
Thus nodes[j].no == i and nodes[j].sym == HNODE.
make fate still passes and contains VP6 samples which use
FF_HUFFMAN_FLAG_HNODE_FIRST.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
While pshufb allows emulating bswap on XMM registers for SSSE3, more
shuffling is needed for SSE2. Alignment is critical, so specific codepaths
are provided for this case.
For the huffyuv sequence "angels_480-huffyuvcompress.avi":
C (using bswap instruction): ~ 55k cycles
SSE2: ~ 40k cycles
SSSE3 using unaligned loads: ~ 35k cycles
SSSE3 using aligned loads: ~ 30k cycles
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Offsets are relative to the end of the header, not the
start of the buffer, thus the buffer size needs to be subtracted.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Codec is too simple to gain much from it at lower resolutions,
but should help at very high resolutions, particularly for
v3 and v5 where a not too optimized pseudo-YUV to RGB
is done in the codec.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
This reverts e6e7bfc1 and 365e1ec2.
The code may be incorrect both before and after the revert, but we
do not have any samples that were fixed by the original commits.
Fixes ticket #871.
With gcc 4.6 this part of the code is ca. 4x faster, resulting
in an overall speedup of around 5% for fate-fraps-v5 sample.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
On x86-64, it indeed uses all 16 registers (and on x86-32, this gets
clipped to 8). Not marking it properly causes callers of this function
to fail randomly because of XMM register clobbering.
Note: This fixes the following GCC warning :-
libavcodec/sunrast.c:94: warning: comparison of unsigned expression < 0 is always false.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Codec has only I- and skip-frames, so there is no
need for reget_buffer, change it so it works with
get_buffer.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>