This adds a hand-optimized assembly version for get_cabac much like the
existing one, but it works if the table offsets are RIP-relative.
Compared to the non-RIP-relative version this adds 2 lea instructions
and it needs one extra register.
In get_cabac alone there is a surprisingly large performance improvement
over the C version (more than the generated assembly would suggest): I
measured get_cabac to be roughly 40% faster on a K8. Overall, however, the
difference is not that big: I measured roughly 5% on a test clip on both
a K8 and a Core2.
Hopefully it still compiles on 32-bit x86...
Now that only one table is used, there is some chance that even the Darwin
assembler (as) accepts this (apparently the label arithmetic used
previously does not work if it involves symbols defined in a different
file; thanks to Ronald S. Bultje for helping me with this).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The reason is that this is easier for PIC code (in particular on Darwin).
The old names are kept as pointers (static in cabac_functions.h, so gcc
knows these are just immediate offsets), so the C code can stay the same
(alternatively, the offsets could be used directly in the functions that
need the tables). This should produce the same code as before for non-PIC
builds and better code (confirmed) for PIC builds.
The assembly uses the new table but still does not work in the PIC case.
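A minimal C sketch of the table-merging idea, under stated assumptions:
the names (cabac_tables, NORM_SHIFT_OFFSET, table_lookup and so on) are
hypothetical and the table contents are placeholders, not the real CABAC
tables. All sub-tables live in one array, and the old names survive as
static const pointers at fixed offsets, so existing indexing code is
unchanged while PIC code only has to materialize one base address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical offsets of each sub-table inside the merged array. */
#define NORM_SHIFT_OFFSET 0
#define LPS_RANGE_OFFSET  4
#define MLPS_STATE_OFFSET 8
#define TABLES_SIZE       12

/* One merged table (placeholder contents, not real CABAC data).
 * A single symbol means PIC code needs only one address computation,
 * e.g. one RIP-relative lea, to reach every sub-table. */
static const uint8_t cabac_tables[TABLES_SIZE] = {
    10, 11, 12, 13, /* norm_shift */
    20, 21, 22, 23, /* lps_range  */
    30, 31, 32, 33, /* mlps_state */
};

/* The old names kept as static const pointers into the merged table:
 * the compiler sees them as base + constant offset, and C code that
 * indexes norm_shift[i] etc. does not need to change. */
static const uint8_t *const norm_shift = cabac_tables + NORM_SHIFT_OFFSET;
static const uint8_t *const lps_range  = cabac_tables + LPS_RANGE_OFFSET;
static const uint8_t *const mlps_state = cabac_tables + MLPS_STATE_OFFSET;

/* The alternative mentioned above: use the offsets directly, the way
 * hand-written assembly would (base + offset + index). */
static uint8_t table_lookup(size_t offset, size_t index)
{
    assert(offset + index < TABLES_SIZE);
    return cabac_tables[offset + index];
}
```

With this layout, lps_range[2] and table_lookup(LPS_RANGE_OFFSET, 2) read
the same byte; only the base address of cabac_tables has to be resolved.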
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
It is not used outside the CABAC test functions (which probably means the
test is flawed if it does not use the same tables as the real functions?)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This adds a hand-optimized assembly version for get_cabac much like the
existing one, but it works if the table offsets are RIP-relative.
Compared to the non-RIP-relative version this adds 2 lea instructions
and it needs one extra register. get_cabac() gets about 40% faster, for
an overall speedup of about 5%.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The reason is that this is easier for PIC code (in particular on Darwin).
The old names are kept as pointers (static in cabac_functions.h, so gcc
knows these are just immediate offsets), so the C code can stay the same
(alternatively, the offsets could be used directly in the functions that
need the tables). This should produce the same code as before for non-PIC
builds and better code (confirmed) for PIC builds.
The assembly uses the new table but still does not work in the PIC case.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The assembler may fail to place literal pools close enough to
instructions referencing them. An explicit .ltorg directive
fixes this.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This simplifies handling by removing a special case.
It is also needed to make the next change possible.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Note: 1.3 is not finalized and its bitstream will still change, so do not
use it yet. This option only exists to make experimenting with it easier;
otherwise one would have to edit the source.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The valid range of quant_mats depends on the block size.
This fixes a global array overread.
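A minimal C sketch of a block-size-dependent range check. The names
(num_quant_mats, quant_mat_index_valid) and the per-size counts are
hypothetical: the message does not say which decoder or which limits are
involved. The point illustrated is only that an index must be checked
against the limit for the current block size, not a single global limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical block sizes: log2 of the block width, 4x4 up to 32x32. */
enum { MIN_LOG2_SIZE = 2, MAX_LOG2_SIZE = 5 };

/* Hypothetical number of quantization matrices per block size; the
 * largest size has fewer, so one global limit would allow an overread. */
static const int num_quant_mats[MAX_LOG2_SIZE - MIN_LOG2_SIZE + 1] = { 6, 6, 6, 2 };

static int quant_mat_count(int log2_size)
{
    /* Callers must have range-checked log2_size already. */
    assert(log2_size >= MIN_LOG2_SIZE && log2_size <= MAX_LOG2_SIZE);
    return num_quant_mats[log2_size - MIN_LOG2_SIZE];
}

/* Validate an index read from the bitstream against the limit for this
 * particular block size. */
static bool quant_mat_index_valid(int log2_size, int idx)
{
    if (log2_size < MIN_LOG2_SIZE || log2_size > MAX_LOG2_SIZE)
        return false;
    return idx >= 0 && idx < quant_mat_count(log2_size);
}
```

In this sketch index 5 is accepted for a 4x4 block but index 2 is already
rejected for a 32x32 block.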
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The new lowres support is limited to decoders where lowres decoding
is possible in high quality.
I was not able to measure any speed difference, but if one is found, the
2-3 lines that might affect speed can be made compile-time conditional.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This ended up corrupting data structures and could lead to a double
free.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>