This way, the special IDCT permutations are no longer needed. Bfin code
is disabled until someone updates it. This is similar to how H264 does
it, and removes the dsputil dependency imposed by the scantable code.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().
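A minimal sketch of the edge-emulation idea behind emulated_edge_mc(), assuming a single 8-bit plane. When a motion vector points partly outside the frame, the reference block is copied to a temporary buffer, and any out-of-frame coordinate is clamped to the nearest edge sample. The helper name and signature below are hypothetical and are not the videodsp API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Hypothetical helper: copy a block_w x block_h block whose top-left corner
 * (src_x, src_y) may lie partly outside a w x h plane, replicating the
 * nearest edge sample for every out-of-frame position. */
static void emulate_edge_copy(uint8_t *dst, ptrdiff_t dst_stride,
                              const uint8_t *src, ptrdiff_t src_stride,
                              int block_w, int block_h,
                              int src_x, int src_y, int w, int h)
{
    assert(w > 0 && h > 0);
    for (int y = 0; y < block_h; y++) {
        int sy = clamp_int(src_y + y, 0, h - 1);
        for (int x = 0; x < block_w; x++) {
            int sx = clamp_int(src_x + x, 0, w - 1);
            dst[y * dst_stride + x] = src[sy * src_stride + sx];
        }
    }
}
```

Motion compensation can then read from dst as if the whole block were inside the frame. A production version would copy interior runs with memcpy() rather than clamping every sample.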
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The YUV channels of VP6 are encoded in a highly linear fashion, with no
slice-like structure that could be threaded. The alpha channel of VP6A is
fairly independent of the YUV and comprises 40% of the work. This patch
uses the THREAD_SLICE capability to split the YUV and A decodes into
separate threads.
Two bugs are fixed by splitting the YUV and alpha state:
- the qscale_table from the VP6A decode held the alpha channel's values instead of YUV's
- alpha channel filtering settings were overwritten by the YUV header parse
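A rough sketch of the state split, using assumed names. Each plane gets its own state struct and each job writes only to its own copy, so the alpha job cannot clobber the YUV settings (or vice versa) even when both run concurrently. The sequential run_jobs() is a hypothetical stand-in for a threaded executor. PlaneState, its fields, and the pre-split header layout are illustrative only.

```c
#include <assert.h>

/* Hypothetical per-plane decoder state. */
typedef struct PlaneState {
    int qscale;       /* quantizer parsed from this plane's header */
    int filter_mode;  /* filtering setting parsed from this plane's header */
} PlaneState;

typedef struct Decoder {
    PlaneState plane[2];   /* [0] = YUV, [1] = alpha */
    const int *header[2];  /* hypothetical pre-split headers: {qscale, filter_mode} */
} Decoder;

/* One job per plane: job 0 decodes YUV, job 1 decodes alpha.
 * Each job writes only to its own PlaneState. */
static int decode_plane_job(Decoder *d, int jobnr)
{
    PlaneState *s = &d->plane[jobnr];
    s->qscale      = d->header[jobnr][0];
    s->filter_mode = d->header[jobnr][1];
    /* ... decode this plane's macroblocks using only *s ... */
    return 0;
}

/* Sequential stand-in for a thread pool; a threaded executor could run
 * both jobs at once because they share no mutable state. */
static void run_jobs(Decoder *d, int (*job)(Decoder *, int), int *ret, int count)
{
    assert(count >= 0 && count <= 2);
    for (int i = 0; i < count; i++)
        ret[i] = job(d, i);
}
```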
Signed-off-by: Ben Jackson <ben@ben.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Makes golden_frame more like other frame data and paves the way for
threading the alpha channel decode.
Signed-off-by: Ben Jackson <ben@ben.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Instead, use it on the first member: by definition, if any member is
aligned, the whole struct must be aligned as well in order to maintain
that alignment.
Fixes compilation with some finicky compilers, such as a libclang/MSVC mix.
Idea for fix from Måns Rullgård.
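The rule relied on here can be checked with a small C11 sketch. A struct's alignment is at least that of its most-aligned member, and the first member always sits at offset 0. Aligning only the first member therefore aligns the struct as a whole, and every element of an array of it. The sketch uses standard alignas as a stand-in for the project's alignment macro. Block is a hypothetical type.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct: only the first member carries the alignment. */
typedef struct Block {
    alignas(16) int16_t coeffs[64];
    int16_t dc;
    uint8_t flags;
} Block;

/* The struct inherits the 16-byte alignment of its first member... */
static_assert(alignof(Block) >= 16, "struct is at least as aligned as coeffs");
/* ...the first member always starts at offset 0... */
static_assert(offsetof(Block, coeffs) == 0, "first member is at offset 0");
/* ...and sizeof is padded to a multiple of the alignment, so arrays stay aligned. */
static_assert(sizeof(Block) % 16 == 0, "size is a multiple of the alignment");
```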
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Instead, use it on the first member: by definition, if any member is
aligned, the whole struct must be aligned as well in order to maintain
that alignment.
Fixes compilation with some finicky compilers.
Idea for fix from Måns Rullgård.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
This moves all VP3-specific function pointers from dsputil to a
new vp3dsp context. There is no reason to ever use the VP3 IDCT
where an MPEG2 IDCT is expected or vice versa.
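The pattern can be sketched with hypothetical names. A small codec-specific context holds only that codec's function pointers, and a C init function fills them in. Architecture-specific init code (not shown) could then override individual pointers. The 4x4 add-residual routine is a placeholder and is not VP3's IDCT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical codec-specific DSP context. */
typedef struct ToyDSPContext {
    void (*add_residual4x4)(uint8_t *dst, ptrdiff_t stride, const int16_t *block);
} ToyDSPContext;

static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Portable C reference: add a 4x4 residual block to the prediction. */
static void add_residual4x4_c(uint8_t *dst, ptrdiff_t stride, const int16_t *block)
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] = clip_uint8(dst[y * stride + x] + block[y * 4 + x]);
}

static void toy_dsp_init(ToyDSPContext *c)
{
    assert(c != NULL);
    c->add_residual4x4 = add_residual4x4_c;
    /* An arch-specific init (e.g. for SIMD) would override pointers here. */
}
```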
Signed-off-by: Mans Rullgard <mans@mansr.com>
Grab from the bitstream in 16-bit chunks instead of 8-bit chunks.
TODO: grab in 32-bit chunks on 64-bit systems.
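A minimal sketch of chunked refilling, using a plain MSB-first bit reader rather than the actual range coder. Bytes are loaded in big-endian 16-bit chunks, so the refill check and load run roughly half as often as with byte-at-a-time refills. BitReader16 and its functions are hypothetical names. A 64-bit cache would allow the 32-bit chunks mentioned in the TODO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader refilled 16 bits at a time. */
typedef struct BitReader16 {
    const uint8_t *buf;
    const uint8_t *end;
    uint32_t cache;  /* valid bits are left-aligned (MSB first) */
    int count;       /* number of valid bits in cache */
} BitReader16;

static void br_init(BitReader16 *br, const uint8_t *buf, size_t size)
{
    br->buf = buf;
    br->end = buf + size;
    br->cache = 0;
    br->count = 0;
}

/* Append one big-endian 16-bit chunk when at most 16 bits remain. */
static void br_refill(BitReader16 *br)
{
    if (br->count <= 16) {
        uint32_t chunk;
        if (br->end - br->buf >= 2) {
            chunk = (uint32_t)br->buf[0] << 8 | br->buf[1];
            br->buf += 2;
        } else if (br->buf < br->end) {
            chunk = (uint32_t)br->buf[0] << 8;  /* odd trailing byte, zero-padded */
            br->buf += 1;
        } else {
            chunk = 0;                          /* past the end: feed zeros */
        }
        br->cache |= chunk << (16 - br->count);
        br->count += 16;
    }
}

/* Read n bits, 1 <= n <= 16; after br_refill() at least 16 bits are valid. */
static unsigned br_read(BitReader16 *br, int n)
{
    assert(n >= 1 && n <= 16);
    br_refill(br);
    unsigned v = br->cache >> (32 - n);
    br->cache <<= n;
    br->count -= n;
    return v;
}
```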
Originally committed as revision 24783 to svn://svn.ffmpeg.org/ffmpeg/trunk
Create a custom table for VP5/6/8's renorm to avoid depending on H.264's.
Saves one instruction in the arithmetic decoder as well.
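One way to build such a renormalization table, as a sketch. For each possible 8-bit range value, the table stores the number of left shifts needed to bring it back to at least 128. Renormalization then costs one lookup and one shift. The table name and the value stored for 0 (a range of 0 never occurs in practice) are assumptions; this is not a copy of the actual table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical renorm table: shifts needed so that (high << shift) >= 128. */
static uint8_t norm_shift[256];

static void init_norm_shift(void)
{
    norm_shift[0] = 8;  /* convention for an impossible range */
    for (int i = 1; i < 256; i++) {
        int shift = 0;
        while ((i << shift) < 128)
            shift++;
        norm_shift[i] = (uint8_t)shift;
    }
}

/* Renormalize an 8-bit range value with one lookup and one shift. */
static unsigned renorm_high(unsigned high)
{
    assert(high >= 1 && high <= 255);
    return high << norm_shift[high];
}
```

In a real decoder the code word would be shifted by the same amount, with fresh input bits loaded as needed.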
Originally committed as revision 24701 to svn://svn.ffmpeg.org/ffmpeg/trunk
Always inline the arithmetic coder, except for header parsing, where it is
not inlined at all in order to save code size.
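The inlining split can be sketched with compiler attributes, assuming GCC/Clang or MSVC. The hot decode function is force-inlined into the per-block loops, while header parsing calls a single out-of-line copy. The macro names and the one-bit toy reader standing in for the arithmetic coder are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portability macros for forced and suppressed inlining. */
#if defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#define NOINLINE      __attribute__((noinline))
#elif defined(_MSC_VER)
#define ALWAYS_INLINE __forceinline
#define NOINLINE      __declspec(noinline)
#else
#define ALWAYS_INLINE inline
#define NOINLINE
#endif

/* Toy MSB-first bit reader standing in for the arithmetic coder. */
typedef struct ToyCoder {
    const uint8_t *buf;
    size_t size;  /* in bytes */
    size_t pos;   /* in bits */
} ToyCoder;

static void toy_init(ToyCoder *c, const uint8_t *buf, size_t size)
{
    assert(buf != NULL || size == 0);
    c->buf = buf;
    c->size = size;
    c->pos = 0;
}

/* Hot path: forced inline into every per-block decoding loop. */
static ALWAYS_INLINE int toy_get_bit(ToyCoder *c)
{
    size_t byte = c->pos >> 3;
    int bit = byte < c->size ? (c->buf[byte] >> (7 - (c->pos & 7))) & 1 : 0;
    c->pos++;
    return bit;
}

/* Cold path: one out-of-line copy shared by all header-parsing code. */
static NOINLINE int toy_get_bit_noinline(ToyCoder *c)
{
    return toy_get_bit(c);
}

static int parse_header_flag(ToyCoder *c)
{
    return toy_get_bit_noinline(c);
}
```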
Originally committed as revision 24677 to svn://svn.ffmpeg.org/ffmpeg/trunk
This is a much more reliable way to get cmov than trying to trick gcc into
generating it, which matters since it is 2% faster overall.
Patch by Eli Friedman <eli.friedman at gmail>
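A minimal sketch of the inline-asm approach, assuming GCC-style extended asm with the default AT&T syntax on x86. The select is written as an explicit cmov instead of relying on the optimizer, with a plain C fallback on other targets. select_u() is a hypothetical helper, not the code from the patch.

```c
#include <assert.h>

/* Returns cond ? a : b; on x86 this is done with cmov instead of a branch. */
static inline unsigned select_u(int cond, unsigned a, unsigned b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned r = b;
    __asm__ ("test   %1, %1\n\t"  /* ZF = (cond == 0) */
             "cmovnz %2, %0"      /* if cond != 0, r = a */
             : "+r"(r)
             : "r"(cond), "rm"(a)
             : "cc");
    return r;
#else
    return cond ? a : b;          /* portable fallback; may or may not become a cmov */
#endif
}
```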
Originally committed as revision 24471 to svn://svn.ffmpeg.org/ffmpeg/trunk
on the Huffman tree, instead of traversing the tree in a while loop.
Based on the similar optimization in libvpx's detokenize.c
10% faster at normal bitrates, and 30% faster for high-bitrate intra-only
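The difference can be sketched on a small four-symbol tree stored in the libvpx style. Positive entries index the next node pair, and non-positive entries are negated leaf values. For simplicity, a hypothetical raw bit source stands in for the arithmetic decoder. A real decoder would also pass a per-node probability (indexed by node / 2) to each read. The unrolled version hard-codes this particular tree's shape, which removes the loop and the table lookups.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical raw bit source standing in for the arithmetic decoder. */
typedef struct BitSource {
    const int *bits;
    size_t n, pos;
} BitSource;

static int read_bit(BitSource *s)
{
    assert(s->pos < s->n);
    return s->bits[s->pos++];
}

/* Four-symbol tree: "0" -> 0, "10" -> 1, "110" -> 2, "111" -> 3. */
static const int toy_tree[6] = { -0, 2, -1, 4, -2, -3 };

/* Generic traversal: walk the table until a leaf (<= 0) is reached. */
static int tree_read_loop(BitSource *s, const int *tree)
{
    int i = 0;
    while ((i = tree[i + read_bit(s)]) > 0)
        ;
    return -i;
}

/* Unrolled traversal of the same tree: one branch per node, no table. */
static int tree_read_unrolled(BitSource *s)
{
    if (!read_bit(s))
        return 0;
    if (!read_bit(s))
        return 1;
    return read_bit(s) ? 3 : 2;
}
```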
Originally committed as revision 24468 to svn://svn.ffmpeg.org/ffmpeg/trunk
No difference at the moment, but allows a future branchy variant
of vp56_rac_get_prob to be significantly faster.
Originally committed as revision 24467 to svn://svn.ffmpeg.org/ffmpeg/trunk
Saves nothing except a bit of memory/cache now, but will allow future
optimizations.
Originally committed as revision 24411 to svn://svn.ffmpeg.org/ffmpeg/trunk
Necessary because of this GCC bug:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44474
To do this, convert some, but not all (!) of the variables in VP56RangeCoder
into local variables.
If we convert c->high into a local variable, gcc gets the stupids and refuses
to use a conditional move for the unpredictable main branch.
TODO: dispense with this bullshit and write an asm version.
Originally committed as revision 23924 to svn://svn.ffmpeg.org/ffmpeg/trunk
This incantation causes gcc 4.3 to generate cmov on x86, a vastly better option
than a completely unpredictable branch.
Hopefully this carries over to newer versions and to other CPUs with conditional instructions.
~5 cycles saved per call on a Core i7.
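The general shape of such an incantation can be sketched on a minimal VP8-style boolean decoder. The sketch follows the decoding procedure in RFC 6386, not FFmpeg's vp56 code. The branchy form updates the state inside if/else. The select form computes the decision once and feeds it to two ternaries on the same condition, which gives the compiler an easy cmov pattern. Whether cmov is actually emitted depends on the compiler and target. BoolDecoder and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal RFC 6386-style boolean decoder (hypothetical names). */
typedef struct BoolDecoder {
    const uint8_t *input, *end;
    uint32_t range;  /* in [128, 255] after normalization */
    uint32_t value;  /* two-byte window of the coded value */
    int bit_count;
} BoolDecoder;

static uint32_t next_byte(BoolDecoder *d)
{
    return d->input < d->end ? *d->input++ : 0;  /* zeros past the end */
}

static void bool_init(BoolDecoder *d, const uint8_t *buf, size_t size)
{
    d->input = buf;
    d->end = buf + size;
    d->value = next_byte(d) << 8;
    d->value |= next_byte(d);
    d->range = 255;
    d->bit_count = 0;
}

static void bool_normalize(BoolDecoder *d)
{
    while (d->range < 128) {
        d->value <<= 1;
        d->range <<= 1;
        if (++d->bit_count == 8) {
            d->bit_count = 0;
            d->value |= next_byte(d);
        }
    }
}

/* Branchy form: state updates live inside an unpredictable if/else. */
static int bool_decode_branchy(BoolDecoder *d, int prob)
{
    assert(prob >= 1 && prob <= 255);
    uint32_t split = 1 + (((d->range - 1) * (uint32_t)prob) >> 8);
    uint32_t bigsplit = split << 8;
    int bit;
    if (d->value >= bigsplit) {
        bit = 1;
        d->range -= split;
        d->value -= bigsplit;
    } else {
        bit = 0;
        d->range = split;
    }
    bool_normalize(d);
    return bit;
}

/* Select form: one comparison, then two selects on the same condition. */
static int bool_decode_select(BoolDecoder *d, int prob)
{
    assert(prob >= 1 && prob <= 255);
    uint32_t split = 1 + (((d->range - 1) * (uint32_t)prob) >> 8);
    uint32_t bigsplit = split << 8;
    int bit = d->value >= bigsplit;
    d->range = bit ? d->range - split : split;
    d->value = bit ? d->value - bigsplit : d->value;
    bool_normalize(d);
    return bit;
}
```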
Originally committed as revision 23921 to svn://svn.ffmpeg.org/ffmpeg/trunk
Using macro templates allows the vp[56]_adjust functions to be
inlined instead of called through function pointers. The new
function pointers enable optimised implementations of the filters.
4% faster VP6 decoding on Cortex-A8.
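A minimal sketch of the macro-template technique, with hypothetical names. The template expands into one filter function per adjust routine, so each adjust call is a direct call that the compiler can inline. The only indirection left is a single per-codec function pointer to the whole filter. The two adjust functions are placeholders, not the actual VP5/VP6 formulas.

```c
#include <assert.h>

/* Placeholder adjust routines (not the real VP5/VP6 ones). */
static inline int clamp_adjust(int v, int t) { return v < -t ? -t : v > t ? t : v; }
static inline int zero_adjust(int v, int t)  { return (v < -t || v > t) ? 0 : v; }

/* Template: each expansion gets its own loop with ADJUST called directly. */
#define DEFINE_FILTER(name, ADJUST)                              \
    static void name(int *dst, const int *src, int n, int t)     \
    {                                                            \
        assert(n >= 0);                                          \
        for (int i = 0; i < n; i++)                              \
            dst[i] = ADJUST(src[i], t);                          \
    }

DEFINE_FILTER(filter_clamp, clamp_adjust)
DEFINE_FILTER(filter_zero,  zero_adjust)

/* One indirect call per block instead of one per sample. */
typedef void (*filter_fn)(int *dst, const int *src, int n, int t);

static filter_fn pick_filter(int codec)  /* hypothetical codec selector */
{
    return codec ? filter_zero : filter_clamp;
}
```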
Originally committed as revision 22992 to svn://svn.ffmpeg.org/ffmpeg/trunk
Passing an explicit filename to this command is only necessary if the
documentation in the @file block refers to a file different from the
one the block resides in.
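For illustration, here is a file-level Doxygen block in the form the message describes. The @file command has no filename argument, so Doxygen applies the block to the file that contains it. The file contents are hypothetical.

```c
/**
 * @file
 * Hypothetical clipping helpers.
 *
 * No filename follows @file, so this block documents the file it is in.
 */

#include <assert.h>
#include <stdint.h>

/**
 * Clip a value to the 0..255 range of an 8-bit sample.
 */
static uint8_t clip_sample(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}
```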
Originally committed as revision 22921 to svn://svn.ffmpeg.org/ffmpeg/trunk
These macros are redundant. All uses are replaced with the generic
DECLARE_ALIGNED macro instead.
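A sketch of a single generic alignment macro in the spirit of DECLARE_ALIGNED. The alignment is passed as an argument instead of being built into the macro name. MY_DECLARE_ALIGNED is a hypothetical name, and its per-compiler definitions are assumptions rather than FFmpeg's exact definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generic macro: alignment n, type t, variable v. */
#if defined(__GNUC__)
#define MY_DECLARE_ALIGNED(n, t, v) t __attribute__((aligned(n))) v
#elif defined(_MSC_VER)
#define MY_DECLARE_ALIGNED(n, t, v) __declspec(align(n)) t v
#else
#include <stdalign.h>
#define MY_DECLARE_ALIGNED(n, t, v) alignas(n) t v
#endif

/* One macro covers every alignment; array dimensions follow the macro. */
static MY_DECLARE_ALIGNED(16, int16_t, coeffs)[64];
static MY_DECLARE_ALIGNED(32, uint8_t, pixels)[64];
```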
Originally committed as revision 22233 to svn://svn.ffmpeg.org/ffmpeg/trunk