This is a trivial rewrite of the loops that results in better
prefetching and associated cache efficiency. Essentially, the problem is
that modern prefetching logic is based on finite-state Markov memory, a reasonable
assumption that is also used elsewhere in CPUs, for instance in branch
predictors.
The surrounding loops all iterate forward through the array, training the
prefetcher to fetch in the forward direction, but the
intermediate loop needlessly iterates backward.
The speedup is nontrivial. Benchmarks were obtained from 10^6 iterations within
solve_lls, using START/STOP_TIMER. The input file is tests/data/fate/flac-16-lpc-cholesky.err.
Hardware: x86-64, Haswell, GNU/Linux.
new:
17291 decicycles in solve_lls, 2096706 runs, 446 skips
17255 decicycles in solve_lls, 4193657 runs, 647 skips
17231 decicycles in solve_lls, 8384997 runs, 3611 skips
17189 decicycles in solve_lls,16771010 runs, 6206 skips
17132 decicycles in solve_lls,33544757 runs, 9675 skips
17092 decicycles in solve_lls,67092404 runs, 16460 skips
17058 decicycles in solve_lls,134188213 runs, 29515 skips
old:
18009 decicycles in solve_lls, 2096665 runs, 487 skips
17805 decicycles in solve_lls, 4193320 runs, 984 skips
17779 decicycles in solve_lls, 8386855 runs, 1753 skips
18289 decicycles in solve_lls,16774280 runs, 2936 skips
18158 decicycles in solve_lls,33548104 runs, 6328 skips
18420 decicycles in solve_lls,67091793 runs, 17071 skips
18310 decicycles in solve_lls,134187219 runs, 30509 skips
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Under some conditions, mainly when the first band was being zeroed,
the wrong global gain scalefactor would be written
to the stream, since it is always taken from the first band
regardless of whether that band has been marked as zero or not.
So, always make sure it contains something useful.
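The message does not show the encoder code, so the following is only a hypothetical sketch of one way to guarantee a useful value: if band 0 is zeroed, take the scalefactor from the first band that is not. `global_gain_sf`, `NUM_BANDS` and the array layout are assumptions, not the aacenc data structures.

```c
#include <assert.h>

#define NUM_BANDS 8 /* hypothetical number of bands in this sketch */

/* The global gain is read from band 0's scalefactor. If band 0 is
 * zeroed, its scalefactor is meaningless, so return the scalefactor of
 * the first non-zeroed band instead. If every band is zeroed, return
 * the caller-supplied `fallback`. */
static int global_gain_sf(const int zeroes[NUM_BANDS],
                          const int sf_idx[NUM_BANDS], int fallback)
{
    for (int b = 0; b < NUM_BANDS; b++)
        if (!zeroes[b])
            return sf_idx[b];
    return fallback;
}
```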
When both M/S coding and PNS are enabled, scalefactors
and codebooks would be mistakenly clobbered when setting
the M/S flag on PNS'd bands. The flag needs to be set to
signal the generation of correlated noise, but the scalefactors,
coefficients and codebooks need to be kept intact.
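A hypothetical sketch of the intended behavior, using a simplified per-band pair rather than the real aacenc structures (`struct band_pair`, `BT_NOISE` and `set_ms` are illustrative names only). The M/S flag is always set, but for a PNS band pair the function returns before anything else is modified; only regular bands get their coefficients converted to mid/side.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical band types (codebooks) for this sketch. */
enum band_type { BT_ZERO, BT_REGULAR, BT_NOISE };

/* One scalefactor band of a channel pair: index 0 is left, 1 is right. */
struct band_pair {
    enum band_type type[2]; /* codebook per channel */
    int sf_idx[2];          /* scalefactor index per channel */
    float coef[2];          /* stand-in for the band's coefficients */
    bool ms;                /* M/S flag shared by the pair */
};

static void set_ms(struct band_pair *p)
{
    p->ms = true; /* for PNS bands this signals correlated noise */
    if (p->type[0] == BT_NOISE && p->type[1] == BT_NOISE)
        return;   /* keep scalefactors, coefficients and codebooks intact */
    float l = p->coef[0], r = p->coef[1];
    p->coef[0] = 0.5f * (l + r); /* mid */
    p->coef[1] = 0.5f * (l - r); /* side */
}
```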
Commit 14ea4151d7 had a bug in that the
conversion of the uint64_t result to an int (the return type) would
lead to implementation-defined behavior, and in this case simply
returned 0 for NAN. A fix via logically AND'ing the result with 1 (&& 1) does the trick,
simply by ensuring a 0 or 1 return value.
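A self-contained sketch of such a bit-level fallback, modelled on the description above rather than copied from avutil/libm.h; `double_bits` stands in for av_double2int and `compat_isnan` is a hypothetical name. It shows why the unmasked return misbehaves: for a typical quiet NaN such as 0x7ff8000000000000 the only mantissa bit set is bit 51, which is lost when the value is converted to a 32-bit int on common implementations. A bitwise `& 1` would not help either, since bit 0 is clear in that NaN; the logical `&& 1` maps any nonzero mantissa to exactly 1.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a double as a uint64_t. */
static uint64_t double_bits(double x)
{
    uint64_t v;
    memcpy(&v, &x, sizeof(v));
    return v;
}

/* isnan fallback for IEEE 754 binary64: a NaN has an all-ones exponent
 * and a nonzero mantissa. Returning the masked mantissa directly would
 * convert a possibly > 32-bit value to int (implementation-defined);
 * "&& 1" yields 0 or 1 before any conversion happens. */
static int compat_isnan(double x)
{
    uint64_t v = double_bits(x);
    if ((v & 0x7ff0000000000000ULL) != 0x7ff0000000000000ULL)
        return 0;                               /* finite number */
    return (v & 0x000fffffffffffffULL) && 1;     /* NaN iff mantissa != 0 */
}
```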
Patch tested with FATE on x86-64, GNU/Linux by forcing the compatibility
code via an ifdef hack suggested by Michael.
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
lrintf is used anyway, suggesting we only care about float precision.
Furthermore, there is a compat hack in avutil/libm for this function,
and it is used in avcodec/aacps_tablegen.h.
This yields a non-negligible speedup. Sample benchmark:
x86-64, Haswell, GNU/Linux:
old (draw_mandelbrot):
274635709 decicycles in draw_mandelbrot, 256 runs, 0 skips
300287046 decicycles in draw_mandelbrot, 512 runs, 0 skips
371819935 decicycles in draw_mandelbrot, 1024 runs, 0 skips
336663765 decicycles in draw_mandelbrot, 2048 runs, 0 skips
581851016 decicycles in draw_mandelbrot, 4096 runs, 0 skips
new (draw_mandelbrot):
269882717 decicycles in draw_mandelbrot, 256 runs, 0 skips
296359285 decicycles in draw_mandelbrot, 512 runs, 0 skips
370076599 decicycles in draw_mandelbrot, 1024 runs, 0 skips
331478354 decicycles in draw_mandelbrot, 2048 runs, 0 skips
571904318 decicycles in draw_mandelbrot, 4096 runs, 0 skips
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This option can be used to select useful frames from an ffconcat file which is
using inpoints and outpoints but where the source files are not intra frame
only.
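The option is not named here, so the following is only a hypothetical sketch of the selection step it enables, assuming each decoded frame can be tagged with its segment's start time and duration on the output timeline, in the same time base as the frame's pts. Frames that were decoded only because decoding had to begin at an earlier keyframe fall before the segment start and are dropped; `frame_in_segment` and `NO_DURATION` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_DURATION INT64_MIN /* hypothetical "duration unknown" marker */

/* Keep a frame only if it lies in [seg_start, seg_start + seg_duration).
 * Without a known duration (no outpoint), everything from seg_start on
 * is kept. */
static bool frame_in_segment(int64_t pts, int64_t seg_start,
                             int64_t seg_duration)
{
    if (pts < seg_start)
        return false;                      /* before the inpoint */
    if (seg_duration == NO_DURATION)
        return true;                       /* no outpoint set */
    return pts - seg_start < seg_duration; /* before the outpoint */
}
```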
Reviewed-by: Stefano Sabatini <stefasab@gmail.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
If duration is still AV_NOPTS_VALUE when opening the next file, we can assume
that outpoint is not set.
Reviewed-by: Nicolas George <george@nsup.org>
Signed-off-by: Marton Balint <cus@passwd.hu>
Fixes out-of-bounds array access
Fixes: 1430e9c43fae47a24c179c7c54f94918/signal_sigsegv_421427_2049_f2192b6829ab6e0eefcb035329c03c60.264
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
There is no such thing as a slice structured mode in the original version 1 H.263;
that mode was added in H.263+ in 1998. Also, the headers for slice structured mode
are not part of the older version 1, so this would result in unplayable files.
An alternative to this patch would be to merge the H263 and H263P AVCodecs and use
other means to distinguish the older and newer versions.
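A hypothetical sketch of the kind of configuration check this implies; the enum and function names are illustrative and are not the AVCodec or MpegEncContext API. Slice structured mode (Annex K) is only accepted for the H.263+ variant, because a version 1 stream has no headers for it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical codec variants for this sketch. */
enum h263_variant { H263_V1, H263_PLUS };

/* Slice structured mode only exists from H.263+ (1998) on. */
static bool slice_structured_allowed(enum h263_variant v)
{
    return v == H263_PLUS;
}

/* Returns 0 if the configuration is valid, -1 if it would produce a
 * version 1 stream using slice structured mode. */
static int check_h263_config(enum h263_variant v, bool slice_structured)
{
    if (slice_structured && !slice_structured_allowed(v))
        return -1;
    return 0;
}
```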
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This actually fixes an incorrect float literal. Judging by their
precision, the literals appear to have all been precomputed as
floats, resulting in this needless loss of precision. There is no
benefit to keeping such reduced precision:
1. These constants are used for static array computation, hence
at compile time.
2. They will be treated as doubles anyway, since no f suffix was
present.
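A small illustration of point 2, using sqrt(2) as an example constant; the literal 1.4142135 is a hypothetical stand-in for a value rounded to float precision, not one taken from the patch. Without an f suffix the literal has type double, so the missing digits are a pure loss of accuracy, while M_SQRT2 carries full double precision.

```c
#include <assert.h>
#include <math.h>

#ifndef M_SQRT2
#define M_SQRT2 1.41421356237309504880 /* POSIX, not ISO C: define if missing */
#endif

/* Hypothetical literal with only about float precision (~7 digits),
 * written without an f suffix, so its type is still double. */
#define SQRT2_FLOAT_DIGITS 1.4142135

/* How far s*s is from 2; smaller means a more accurate sqrt(2). */
static double sqrt2_error(double s)
{
    return fabs(s * s - 2.0);
}
```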
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This uses M_SQRT1_2 and M_SQRT2 instead of hard-coded literals. This yields
greater precision in some places in avcodec/ac3, while fixed-point
values remain unchanged.
Reviewed-by: Clément Bœsch <u@pkh.me>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This uses M_SQRT1_2 and M_SQRT2 instead of hard-coded literals. Fixed-point
values remain unchanged.
Patch tested with FATE on x86.
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>