The blockdsp split did not cover Alpha optimizations
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The Intel i263 codec has special 8-byte dummy frames that should not be decoded,
so do not even attempt to decode them; skip them instead.
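As a rough illustration of the idea (the context struct and function name below are hypothetical, not the actual H.263 decoder code), the decoder can simply consume such 8-byte packets without producing a picture:

    #include <stdint.h>

    /* Hypothetical decoder context, for illustration only. */
    typedef struct DecodeContext DecodeContext;

    static int i263_decode_frame(DecodeContext *ctx, const uint8_t *buf,
                                 int buf_size)
    {
        /* Intel i263 emits 8-byte dummy frames: consume them without
         * attempting to decode, and report that no picture was produced. */
        if (buf_size == 8)
            return buf_size;

        /* ... normal H.263 bitstream parsing would go here ... */
        (void)ctx; (void)buf;
        return buf_size;
    }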
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
Also replace the INLINE_<opt> checks that were wrongly changed by commit
2b05db4f81 with EXTERNAL_<opt>.
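For reference, a hedged sketch of the usual init idiom (the DSP context and function names below are hypothetical): EXTERNAL_<opt>(cpu_flags) from libavutil/x86/cpu.h is true only when standalone assembly was built, whereas INLINE_<opt>(cpu_flags) requires compiler inline-asm support.

    #include "libavutil/attributes.h"
    #include "libavutil/cpu.h"
    #include "libavutil/x86/cpu.h"

    /* ExampleDSPContext and ff_example_func_mmxext are hypothetical names. */
    av_cold void ff_exampledsp_init_x86(ExampleDSPContext *c)
    {
        int cpu_flags = av_get_cpu_flags();

        /* Guard a standalone-assembly version with EXTERNAL_*, not INLINE_*. */
        if (EXTERNAL_MMXEXT(cpu_flags))
            c->example_func = ff_example_func_mmxext;
    }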
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Normally a Laplace distribution is more typical of the encoded residuals,
but for noisy input it is both better and simpler to be safe and use a
1/d² distribution. The second hunk could use some renormalization, but in
practice it has little impact.
Output size of ffvhuff on three 4:2:0 sequences (one column per sequence):
context=0, 1/d:   851974  27226  1137281
context=0, 1/d²:  619081  25069  1051500
context=0, 1/d³:  501983  30454  1290561
context=0, lapl:  500650  31754  1304082
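A minimal sketch of the 1/d² seeding described above (names and scaling are illustrative, not the actual ffvhuff code):

    #include <stdint.h>

    /* Illustrative seeding: counts proportional to 1/d², where d is the
     * distance of the residual from zero; residuals are stored modulo 256,
     * so both small positive and small negative values count as "near zero". */
    static void seed_stats(uint32_t stats[256], uint32_t scale)
    {
        for (int j = 0; j < 256; j++) {
            int d = j < 128 ? j : 256 - j;
            stats[j] = scale / (uint32_t)(d * d + 1);
            if (!stats[j])
                stats[j] = 1;   /* keep every symbol codable */
        }
    }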
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This avoids the following libass warning when using the subtitles
filter: "Neither PlayResX nor PlayResY defined. Assuming 384x288"
The subtitles tests change because the output is ASS and the PlayRes[XY]
values end up in the output.
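A self-contained illustration of the kind of header that silences the warning (the fields emitted by the real code are abbreviated here):

    #include <stdio.h>

    /* Write a minimal, illustrative ASS [Script Info] section; including
     * PlayResX and PlayResY is what keeps libass from falling back to
     * its 384x288 assumption. */
    static void write_ass_script_info(FILE *out, int play_res_x, int play_res_y)
    {
        fprintf(out,
                "[Script Info]\r\n"
                "ScriptType: v4.00+\r\n"
                "PlayResX: %d\r\n"
                "PlayResY: %d\r\n"
                "\r\n",
                play_res_x, play_res_y);
    }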
POSIX guarantees >= 32 bits, so it all fits in a signed int.
Also, >= 32-bit ints are assumed throughout the codebase.
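That assumption can be made explicit with a compile-time check; a generic sketch (POSIX requires INT_MAX to be at least 2147483647, which is stricter than the ISO C minimum of 32767):

    #include <limits.h>

    /* Illustrative check: fails to compile wherever int is narrower than 32 bits. */
    _Static_assert(INT_MAX >= 2147483647,
                   "this code assumes int is at least 32 bits");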
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The reader reads in chunks of 11 bits at most, and at most 3 times. The unsafe
reader may therefore read 6 chunks instead of 1 in the worst case, i.e. 8 bytes,
which is within the padding tolerance.
The reader ends up being ~10% faster. The cumulative effect of unsafe reading
and code block swapping on 3 sequences, with 1 thread, is that decoding time
goes from 23.3s to 19.0s.
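A hypothetical, self-contained sketch of the safe-vs-unsafe trade-off (this is not the FFmpeg get_bits reader): the unsafe variant simply omits the end-of-buffer check and relies on the packet being padded, so the bounded overread argued above stays inside allocated memory.

    #include <stdint.h>

    /* Hypothetical reader, for illustration only. */
    typedef struct {
        const uint8_t *buf;   /* payload followed by padding bytes */
        int pos;              /* current bit position */
    } BitReader;

    /* Read 1..11 bits without checking against the payload end; the 24-bit
     * window below may touch up to 2 bytes past the current byte position,
     * which the padding absorbs. */
    static unsigned get_bits_unsafe(BitReader *br, int n)
    {
        const uint8_t *p = br->buf + (br->pos >> 3);
        unsigned window  = (p[0] << 16) | (p[1] << 8) | p[2];
        unsigned v       = window >> (24 - (br->pos & 7) - n);

        br->pos += n;    /* a safe reader would check the buffer end here */
        return v & ((1u << n) - 1);
    }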
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The old code was reserving the 0xFFFF entry to represent a nonexistent
entry/codeword. These entries are now detected through their length
being <= 0. As this entry is often used for the residuals (-1,-1), which
should be among the most frequent, it is particularly important not to
reserve it.
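In spirit, the change looks like the following hedged sketch (struct and function names are illustrative): validity is carried by the length field, so every 16-bit symbol value, including 0xFFFF, remains usable.

    #include <stdint.h>

    /* Illustrative table entry, not the actual decoder's layout. */
    typedef struct {
        uint16_t sym;   /* decoded symbol; any 16-bit value is allowed */
        int8_t   len;   /* codeword length in bits; <= 0 means "no entry" */
    } VLCEntry;

    /* Returns the number of bits consumed, or -1 to fall back to a slower
     * decoding path when the table has no entry at this index. */
    static int vlc_lookup(const VLCEntry *table, unsigned index, uint16_t *sym)
    {
        if (table[index].len <= 0)
            return -1;
        *sym = table[index].sym;
        return table[index].len;
    }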
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The effect is not really deterministic: on x86_64 it seems to be a
combination of fewer registers used and different jump offsets, and,
for all archs, of likely branches.
Speedup is around 15%.
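For context, the generic GCC/Clang branch-hint idiom referred to as "likely branches" (an illustration, not necessarily the exact macros used here):

    /* Illustration: __builtin_expect tells the compiler which way a branch
     * usually goes, so it can keep the hot path on the fall-through code. */
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    int process(int ret)
    {
        if (unlikely(ret < 0))
            return ret;     /* cold error path */
        return ret + 1;     /* hot path */
    }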
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Those macros take a byte count as the shift argument, as this argument
differs between MMX and SSE2 instructions.
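A hedged illustration with intrinsics (the real macros live in assembly, so these names are hypothetical): SSE2's psrldq takes a byte count, while MMX's psrlq takes a bit count, hence the shared byte-count argument.

    #include <mmintrin.h>   /* MMX  */
    #include <emmintrin.h>  /* SSE2 */

    /* Hypothetical macro names; nbytes must be a compile-time constant.
     * The MMX variant converts the byte count to the bit count its
     * instruction expects. */
    #define RIGHT_SHIFT_BYTES_SSE2(v, nbytes) _mm_srli_si128((v), (nbytes))
    #define RIGHT_SHIFT_BYTES_MMX(v, nbytes)  _mm_srli_si64((v), (nbytes) * 8)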
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>