The code expects actual positive numbers and gives completely wrong
results if INT64_MIN is treated as positive
Instead clip it into the valid range that is add 1 and treat it as
negative
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This ensures that no undefined behavior is invoked, while retaining
identical return values in all cases and at no loss of performance
(identical asm on clang and gcc).
Essentially, this patch exchanges undefined behavior with implementation
defined behavior, a strict improvement.
Rationale:
1. The ideal solution is to have the return type a uint64_t. This
unfortunately requires an API change.
2. The only pathological behavior happens if both arguments are
INT64_MIN, to the best of my knowledge. In such a case, the
implementation defined behavior is invoked in the sense that UINT64_MAX
is interpreted as INT64_MIN, which any reasonable implementation will
do. In any case, any usage where both arguments are INT64_MIN is a
fuzzer anyway.
3. Alternatives of checking, etc require branching and lose performance
for no concrete gain - no client cares about av_gcd's actual value when
both args are INT64_MIN. Even if it did, on sane platforms (e.g all the
ones FFmpeg cares about), it produces a correct gcd, namely INT64_MIN.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This uses Stein's binary GCD algorithm:
https://en.wikipedia.org/wiki/Binary_GCD_algorithm
to get a roughly 4x speedup over Euclidean GCD on standard architectures
with a compiler intrinsic for ctzll, and a roughly 2x speedup otherwise.
At the moment, the compiler intrinsic is used on GCC and Clang due to
its easy availability.
Quick note regarding overflow: yes, subtractions on int64_t can, but the
llabs takes care of that. The llabs is also guaranteed to be safe, with
no annoying INT64_MIN business since INT64_MIN being a power of 2, is
shifted down before being sent to llabs.
The binary GCD needs ff_ctzll, an extension of ff_ctz for long long (int64_t). On
GCC, this is provided by a built-in. On Microsoft, there is a
BitScanForward64 analog of BitScanForward that should work; but I can't confirm.
Apparently it is not available on 32 bit builds; so this may or may not
work correctly. On Intel, per the documentation there is only an
intrinsic for _bit_scan_forward and people have posted on forums
regarding _bit_scan_forward64, but often their documentation is
woeful. Again, I don't have it, so I can't test.
As such, to be safe, for now only the GCC/Clang intrinsic is added, the rest
use a compiled version based on the De-Bruijn method of Leiserson et al:
http://supertech.csail.mit.edu/papers/debruijn.pdf.
Tested with FATE, sample benchmark (x86-64, GCC 5.2.0, Haswell)
with a START_TIMER and STOP_TIMER in libavutil/rationsl.c, followed by a
make fate.
aac-am00_88.err:
builtin:
714 decicycles in av_gcd, 4095 runs, 1 skips
de-bruijn:
1440 decicycles in av_gcd, 4096 runs, 0 skips
previous:
2889 decicycles in av_gcd, 4096 runs, 0 skips
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Passing an explicit filename to this command is only necessary if the
documentation in the @file block refers to a file different from the
one the block resides in.
Originally committed as revision 22921 to svn://svn.ffmpeg.org/ffmpeg/trunk
This reduces the number of false dependencies on header files and
speeds up compilation.
Originally committed as revision 22407 to svn://svn.ffmpeg.org/ffmpeg/trunk
Otherwise doxygen complains about ambiguous filenames when files exist
under the same name in different subdirectories.
Originally committed as revision 16912 to svn://svn.ffmpeg.org/ffmpeg/trunk