This uses the trigonometric double and triple angle formulae to avoid
repeated (expensive) evaluation of libc's cos().
Sample benchmark (x86-64, Haswell, GNU/Linux)
test: fate-swr-resample-dblp-44100-2626
old:
1104466600 decicycles in build_filter(loop 1000), 256 runs, 0 skips
1096765286 decicycles in build_filter(loop 1000), 512 runs, 0 skips
1070479590 decicycles in build_filter(loop 1000), 1024 runs, 0 skips
new:
588861423 decicycles in build_filter(loop 1000), 256 runs, 0 skips
591262754 decicycles in build_filter(loop 1000), 512 runs, 0 skips
577355145 decicycles in build_filter(loop 1000), 1024 runs, 0 skips
This results in small differences with the old expression:
difference (worst case on [0, 2*M_PI]), argmax 0.008:
max diff (relative): 0.000000000000157289807188
blackman_old(0.008): 0.000363951585488813192382
blackman_new(0.008): 0.000363951585488755946507
These are judged to be insignificant for the performance gain. PSNR to
reference file is unchanged up to second decimal point for instance.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Sebastian Dröge <sebastian@centricular.com>
Previous version reviewed-by: Kieran Kunhya <kierank@obe.tv>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes out of array access
Fixes: 54e488b9da4abbceaf405d6492515697/asan_heap-oob_32769b0_160_a8755eb08ee8f9579348501945a33955.TIF
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes out of array access
Fixes: 24d05e8b84676799c735c9e27d97895e/asan_heap-oob_1b70f6a_2955_7c3652a7f370f9f3ef40642bc2c99bb2.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes out of array read
Fixes: d92114d8c2a019b8a6e50cd2a7301b54/asan_heap-oob_26bf563_60_1d3420277533de9dbf8aba3f93af346f.avi
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This speeds up build_filter by ~ 50%. This gain should be pretty
consistent across all architectures and platforms.
Essentially, this relies on a observation that the filters have some
even/odd symmetry that may be exploited during the construction of the
polyphase filter bank. In particular, phases (scaled to [0, 1]) in [0.5, 1] are
easily derived from [0, 0.5] and expensive reevaluation of function
points are unnecessary. This requires some rather annoying even/odd
bookkeeping as can be seen from the patch.
I vaguely recall from signal processing theory more general symmetries allowing even greater
optimization of the construction. At a high level, "even functions"
correspond to 2, and one can imagine variations. Nevertheless, for the sake
of some generality and because of existing filters, this is all that is
being exploited.
Currently, this patch relies on phase_count being even or (trivially) 1,
though this is not an inherent limitation to the approach. This
assumption is safe as phase_count is 1 << phase_bits, and is hence a
power of two. There is no way for user API to set it to a nontrivial odd
number. This assumption has been placed as an assert in the code.
To repeat, this assumes even symmetry of the filters, which is the most common
way to get generalized linear phase anyway and is true of all currently
supported filters.
As a side note, accuracy should be identical or perhaps slightly better
due to this "forcing" filter symmetries leading to a better phase
characteristic. As before, I can't test this claim easily, though it may
be of interest.
Patch tested with FATE.
Sample benchmark (x86-64, Haswell, GNU/Linux):
test: swr-resample-dblp-44100-2626
new:
527376779 decicycles in build_filter(loop 1000), 256 runs, 0 skips
524361765 decicycles in build_filter(loop 1000), 512 runs, 0 skips
516552574 decicycles in build_filter(loop 1000), 1024 runs, 0 skips
old:
974178658 decicycles in build_filter(loop 1000), 256 runs, 0 skips
972794408 decicycles in build_filter(loop 1000), 512 runs, 0 skips
954350046 decicycles in build_filter(loop 1000), 1024 runs, 0 skips
Note that lower level optimizations are entirely possible, I focussed on
getting the high level semantics correct. In any case, this should
provide a good foundation.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Fixes: 04715144ba237443010554be0d05343f/asan_heap-oob_1eafc76_1737_c685b48041a563461839e4e7ab97abb8.jpg
Fixes out of array access
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
When sbr->reset is set in encode_frame, a bunch of qsort calls might get made.
Thus, there is the potential of calling qsort whenever the spectral
contents change.
AV_QSORT is substantially faster due to the inlining of the comparison callback.
Thus, the increase in performance should be worth the increase in binary size.
Tested with FATE.
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Fixes: libavcodec/rawenc.c:64:40: warning: passing argument 3 of av_image_copy_to_buffer from incompatible pointer type [enabled by default]
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: libavcodec/vdpau.c:320:5: warning: "CONFIG_H263_VDPAU_HWACCEL" is not defined [-Wundef]
It was removed in d15adeacf3
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes an issue where an int64_t ffurl_seek return-value was being stored
in an int (32-bit) "r" variable, leading to integer overflow when seeking
into a large file (>2GB), and ultimately a "Failed to perform internal
seek" error mesage.
To test, try running `ffprobe 'cache:http://<something>'` on a file that
is ~3GB large, whose moov atom is at the end of the file
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
this makes draw_bar faster
slightly different result with old version
check result (with ~3 minutes audio file):
old:
real 0m49.611s
user 0m49.260s
sys 0m0.073s
new:
real 0m47.606s
user 0m47.378s
sys 0m0.068s
PSNR between old and new:
yuv444p PSNR
y:109.519298 u:107.506485 v:104.746878
average:106.816074 min:99.167305 max:inf
yuv422p PSNR
y:109.519298 u:108.025801 v:104.489734
average:107.279817 min:98.007467 max:inf
yuv420p PSNR
y:109.519298 u:108.363875 v:105.290200
average:108.261511 min:97.461812 max:inf
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
FFDIFFSIGN was created explicitly for this purpose, since the common
return a - b idiom is unsafe regarding overflow on signed integers. It
optimizes to branchless code on common compilers.
FFDIFFSIGN also has the subjective benefit of being easier to read due
to lack of ternary operators.
Tested with FATE.
Things not covered by this are unsigned integers, for which overflows
are well defined, and also places where overflow is clearly impossible,
e.g an instance where the a - b was being done on 24 bit values.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Clément Bœsch <u@pkh.me>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This is of use for defining comparator callbacks. Common approaches like
return x-y are not safe due to the risks of overflow.
Furthermore, the (x > y) - (x < y) trick is optimized to branchless
code.
This also documents this macro accordingly.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
There seems to be some typos in the log messages that are fixed by this.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
I've got some m4a samples that had jpeg cover art marked as png. Since
these files were supposedly written by iTunes, and other software can
read it (e.g. clementine does), this should be worked around.
Since png has a very simple to detect header, while it's apparently a
real pain to detect jpeg in the general case, try to detect png and
assume jpeg otherwise. Not bothering with bmp, as I have no test case.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Avoids inheritance of file handles on Windows systems similar to the
O_CLOEXEC/FD_CLOEXEC flag on Linux.
Fixes file lock issues in Windows applications when a child process
is started with handle inheritance enabled (standard input/output
redirection) while a FFmpeg transcoding is running in the parent
process.
Links relevant to the subject:
https://msdn.microsoft.com/en-us/library/w7sa2b22.aspx
Describes the _wsopen() function and the O_NOINHERIT flag. File handles
opened by _wsopen() are inheritable by default.
https://msdn.microsoft.com/en-us/library/windows/desktop/ms682425%28v=vs.85%29.aspx
Describes handle inheritance when creating new processes. Handle
inheritance must be enabled (bInheritHandles = TRUE) e.g. when you want
to pass handles for stdin/stdout via lpStartupInfo.
Signed-off-by: Tobias Rapp <t.rapp@noa-audio.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This converts the usage of strtod to av_strtod in order to unify and
make number parsing more consistent. This also adds support for SI
postfixes.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Some codecs use the codec_tag to signal specific information and
picking the first one would lead to a broken file.
Bug-Id: 883
CC: libav-stable@libav.org
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* commit '407ac22322e5ce67996ec54ef619cafa4c9ceb78':
w32pthreads: Map MemoryBarrier to __sync_synchronize on mingw
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit 'a0562e531723923b632684c7b51a9dd584bf534f':
configure: Add a SONAME entry for the android target
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>