It is not allowed to change mid-stream like it does currently. Instead we need
to buffer the first 8 frames before returning them as a single packet, then
only return single frame packets after that.
Prevents crash when trying to copy from a non-existing plane in e.g.
a RGB32 reference image to a YUV420P target image
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Instead of clipping extrasize based on EXTRABYTES, clip based on the
amount of buffer actually left. Without this fix, there are warbles
and other distortions in the test case below.
http://kevincennis.com/mix/assets/sounds/1901_voxfx.mp3
This prevents crashes when trying to read beyond the end of the buffer
while decoding frame data.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Unrolling the main loop to process, instead of 4 elements:
- 8: minor gain of 2 cycles (not worth the extra object size)
- 2: loss of 8 cycles.
Assigning STEP to a register is a loss. Output address (Y) is almost always
unaligned.
Timings:
- C (32/64 bits): 117/109 cycles
- SSE: 57 cycles
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The 32bits targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square C /32bits: 82c (unrolled)/102c
C /64bits: 69c (unrolled)/82c
SSE/32bits: 42c
SSE/64bits: 31c
Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling to perform 8 operations in a loop yields 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Since we are clipping before we shift the values to
16 or 32 bits, we should not shift the min/max clip
values to compensate.
Fixes 8 and 24 bit lossy decoding.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
If the PNG filter is enabled, a PNG-style filter will run over the
input buffer, writing into the buffer. Therefore, if no zlib compression
was used, ensure that we copy into a temporary buffer, otherwise we
overwrite user-provided input data.