The x86 runs short on registers because numerous elements are not static.
In addition, splitting them allows more optimized code, at least for x86.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
It is currently declared as a macro who is set to inlinable functions,
among which a Neon and a default C implementations.
Add a DSP parameter to each inline function, unused except by the
default C implementation which calls a function from the DSP context.
On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes use of uninitialized memory
Fixes: 93728afd9aa074ba14a09bfd93a632fd-asan_static-oob_124a17d_1445_cov_1021181966_DBLK_D_VIXS_1.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The x86 runs short on registers because numerous elements are not static.
In addition, splitting them allows more optimized code, at least for x86.
Arm asm changes by Janne Grunau.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
For the callable function (as opposed to the inline one):
C SSE SSE2 SSE4
Win32: 47 42 29 26
Win64: 30 33 25 23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
It is currently declared as a macro who is set to inlinable functions,
among which a Neon and a default C implementations.
Add a DSP parameter to each inline function, unused except by the
default C implementation which calls a function from the DSP context.
On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
Fixes use of uninitialized memory
Fixes out of array read
Fixes assertion failure
Fixes part of cb307d24befbd109c6f054008d6777b5/asan_static-oob_124a175_1445_cov_2355279992_DBLK_D_VIXS_1.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes inconsistencies
Fixes use of uninitilaized memory
Fixes part of cb307d24befbd109c6f054008d6777b5/asan_static-oob_124a175_1445_cov_2355279992_DBLK_D_VIXS_1.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
benchmarked on sandybridge x86_64:
1358232 decicycles in flac_lpc_32_c
1244575 decicycles in flac_lpc_32_sse4, James Almer's patch
650045 decicycles in flac_lpc_32_sse4, this patch
I haven't tested the edgecases such as odd block lengths
odd block length tested-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Enable compilation on machines with an old libfdk-aac.
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Also adjust header #include order and some comments.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Prevents using GetBitContexts with data from previous calls.
Fixes access to freed memory.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC:libav-stable@libav.org
With cli usage the decoder might have not set the colorspace during
encoder init, manual colorspace override might be needed in such
cases.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This avoids them being cleared before the full initialization finished
Fixes out of array read
Fixes: asan_heap-oob_f0c5e6_7071_cov_1605985132_mov_h264_aac__Demo_FlagOfOurFathers.mov
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Such changes are forbidden in H.264 and lead to race conditions
Fixes out of array read
Fixes: signal_sigsegv_f9796a_1613_cov_3114610371_FM1_BT_B.h264
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes out of array read
Fixes: asan_static-oob_1efed25_1887_cov_2013541199_HeyYa_RA10_AAC_192K_30s.rm
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes out of array read
Fixes: signal_sigsegv_1326a09_1752_cov_245452111_GRTH301.HNS
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Even though the most common framerate for RoQ is 30fps,
the format supports other framerates too.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes use of uninitialized memory
Fixes out of array read
Fixes: asan_static-oob_123cee5_2630_cov_1869071233_PICSIZE_A_Bossen_1.bin
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
It is not necessarily an error when a chunk does not cover a whole block.
Messages did not reflect the actual situation either.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>