It has already been demonstrated that the de Bruijn method has benefits
over the current implementation: commit 971d12b7f9.
That commit implemented it for long long, this extends it to the int version.
Tested with FATE.
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
changing the context state and restoring it is not safe if another
thread writes data into the fifo
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This reduces the memory & cache need from 256 to 64 bytes
the code also seems faster with this change
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This uses Stein's binary GCD algorithm:
https://en.wikipedia.org/wiki/Binary_GCD_algorithm
to get a roughly 4x speedup over Euclidean GCD on standard architectures
with a compiler intrinsic for ctzll, and a roughly 2x speedup otherwise.
At the moment, the compiler intrinsic is used on GCC and Clang due to
its easy availability.
Quick note regarding overflow: yes, subtractions on int64_t can, but the
llabs takes care of that. The llabs is also guaranteed to be safe, with
no annoying INT64_MIN business since INT64_MIN being a power of 2, is
shifted down before being sent to llabs.
The binary GCD needs ff_ctzll, an extension of ff_ctz for long long (int64_t). On
GCC, this is provided by a built-in. On Microsoft, there is a
BitScanForward64 analog of BitScanForward that should work; but I can't confirm.
Apparently it is not available on 32 bit builds; so this may or may not
work correctly. On Intel, per the documentation there is only an
intrinsic for _bit_scan_forward and people have posted on forums
regarding _bit_scan_forward64, but often their documentation is
woeful. Again, I don't have it, so I can't test.
As such, to be safe, for now only the GCC/Clang intrinsic is added, the rest
use a compiled version based on the De-Bruijn method of Leiserson et al:
http://supertech.csail.mit.edu/papers/debruijn.pdf.
Tested with FATE, sample benchmark (x86-64, GCC 5.2.0, Haswell)
with a START_TIMER and STOP_TIMER in libavutil/rationsl.c, followed by a
make fate.
aac-am00_88.err:
builtin:
714 decicycles in av_gcd, 4095 runs, 1 skips
de-bruijn:
1440 decicycles in av_gcd, 4096 runs, 0 skips
previous:
2889 decicycles in av_gcd, 4096 runs, 0 skips
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
GCC 3.4 introduced an attribute warn_unused_result to warn when a programmer
discards the return value. Applying this judiciously across the codebase can help
in fixing a lot of problems. At a high level, functions which return error codes
should always be checked. More concretely, consider the functions ff_add_format
and the like in avfilter/formats.h. A quick examination shows that a large portion
of libavfilter fails to handle the associated errors, usually AVERROR(ENOMEM).
The above example was where I observed the utility of this, but it should be
useful in many places across the code base.
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
They're short enough that inlining them actually reduces code size due to
all the overhead associated with making a function call.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
These field are difficult to interpret, and are provided by a single
encoder (mpegvideoenc). In general they do not belong to a structure
containing raw data only, so remove them from AVFrame.
Mpegvideoenc now uses a private field in Picture for its internal
computations.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
MIPS R6 supports unaligned memory access and does not have
the load/store-left/right family of instructions.
Signed-off-by: Vicente Olivert Riera <Vincent.Riera at imgtec.com>
Signed-off-by: Luca Barbato <lu_zero at gentoo.org>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Most functions are valid over a domain and range of [0.0-1.0] but
some are defined over greater. This patch does not deal with
AVColorRange and assumes AVCOL_RANGE_JPEG for the returned values.
Signed-off-by: Kevin Wheatley <kevin.j.wheatley@gmail.com>
Signed-off-by: Kevin Wheatley <kevin.j.wheatley@gmail.com>
Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Ensure that the components are ordered consistently, ie. always
RGB(A) and YUV(A). This allows to identify a specific plane on a given
pixel format without hard-coding knowledge of the plane order.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
There is no practical benefit in having this structure elements
bit packed given the size of the structure and its usage.
Change types from uint16_t (packed) to plain int in order to simplify
modifying the structure and accessing its fields.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
FATE refs changed to accomodate for the new default behavior of the function.
Numbers are now interpreted as a channel layout, instead of a number of channels.
This macro avoids the undefined corner case with the *_MIN values
Previous version Reviewed-by: Ganesh Ajjanagadde <gajjanag@mit.edu>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>