This adds a hand-optimized assembly version for get_cabac much like the
existing one, but it works if the table offsets are RIP-relative.
Compared to the non-RIP-relative version this adds 2 lea instructions
and it needs one extra register. get_cabac() gets about 40% faster, for
an overall speedup of about 5%.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The reason is this is easier for PIC code (in particular on darwin...).
Keep the old names as pointers (static in cabac_functions.h so gcc
knows these are just immediate offsets) so the c code can nicely stay the same
(alternatively could use offsets directly in the functions needing the
tables). This should produce the same code as before with non-pic and better
code (confirmed) with pic.
The assembly uses the new table but still won't work for PIC case.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The assembler may fail to place literal pools close enough to
instructions referencing them. An explicit .ltorg directive
fixes this.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This allows masking CPU features with the -cpuflags avconv option
which is useful for testing different optimisations without rebuilding.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This removes all references to AVCodecContext.dsp_mask and marks
it for eviction at the next version bump. It has been superseded
by av_set_cpu_flag_mask() which, unlike this field, works everywhere.
Signed-off-by: Mans Rullgard <mans@mansr.com>
General cosmetics, such as keeping lines under 80 characters,
fixing a couple of typos (predition -> prediction) and a
general style fix that was pointed out by Derek when I was having
my sliced multithreading patch in review by him.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Do not pointlessly call ff_alloc_packet multiple times,
and fix an infinite loop by clamping the maximum
number of bits to target in the algorithm that does
not use lambda.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Save the old output configuration (if it has been used
successfully) when trying a new configuration. If the new configuration
fails to decode, restore the last successful configuration.
There is no point in storing the value in a variable, since it is not
used anywhere else in the decoder.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
This reworks a loop to get rid of an ugly pointer cast,
fixing errors seen with the PathScale ENZO compiler.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Recent register allocation changes (x86inc.asm update) changed the
register order and thus opcodes for the inner loops. One of them became
>128bytes, which confuses other parts of this function where it jumps
to fixed-offset positions to extend the edge by fixed amounts. A simple
register change fixes this.