Improved DVB subtitles encoder to generate AVPacket.data in the same
format as generates MPEGTS demuxer + DVB subtitles parser. So now single
format of DVB subtitles data is used across all the components of FFmpeg:
only subtitles payload WITHOUT 0x20 0x00 bytes at the beginning and 0xFF
trailing byte.
Improved MPEGTS muxer to support format of DVB subtitles in
AVPacket.data described above: while muxing we add two bytes 0x20 0x00 to
the beginning of and 0xFF to the end of DVB subtitles payload.
The patch fixes DVB subtitle copy problems: tickets #2989 fully and #2024
partly.
Signed-off-by: Clément Bœsch <u@pkh.me>
The src buffer should only contain values in the interval
[0, (1 << BIT_DEPTH) - 1]. Since shift = (BIT_DEPTH - 5), src[x] >> shift
must be in the interval [0, 31], so no clip is needed.
This removes the code that was changed in 5856bca360
as the clip that was repositioned in that commit is removed
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes: asan_stack-oob_eae8e3_7333_WPP_B_ericsson_MAIN10_2.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
This is a more proper fix than 5856bca360
The reconstructed picture should always be clipped (see section 8.6.5),
previously we did not clip coding units where
cu_transquant_bypass_flag == 1
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Prevent an out of array bound read.
Fixes: asan_stack-oob_eae8e3_7333_WPP_B_ericsson_MAIN10_2.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes use of uninitialized memory and out of stack array read
Fixes: signal_sigsegv_ecc526_7846_WPP_C_ericsson_MAIN_2.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes use of uninitialized memory
Fixes: msan_uninit-mem_7f8b64436530_7895_quicktime_newcodec_applelosslessaudiocodec.m4a
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes use of uninitialized memory
partly fixes: msan_uninit-mem_7f7834b6a530_6473_luckynight-partial.wma
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes use of uninitialized memory
partly fixes: msan_uninit-mem_7f7834b6a530_6473_luckynight-partial.wma
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
It stored images wrong in the user provided buffers (that is you would
end up with a wrongly flipped image if you used direct rendering).
Also it used wrong dimensions as noticed by ubitux
Enable the old code unconditionally so flipping works correctly
again.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
They are not measurably faster on x86, they might be somewhat faster on
other platforms due to missing emu edge SIMD, but the gain is not large
enough to justify the added complexity.
They are not measurably faster on x86, they might be somewhat faster on
other platforms due to missing emu edge SIMD, but the gain is not large
enough to justify the added complexity.
Several decoders disable those anyway and they are not measurably faster
on x86. They might be somewhat faster on other platforms due to missing
emu edge SIMD, but the gain is not large enough (and those decoders
relevant enough) to justify the added complexity.
untested (noone tested within about a month) and the change is
quite trivial so should be ok. While the code before this change
is broken.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Runtime of the full 32x32 idct goes from 2446 to 2441 cycles (intra) or
from 1425 to 1306 cycles (inter). Overall runtime is not significantly
affected.
Runtime of all IDCTs together goes from 3327 to 2473 cycles (intra, i.e.
~35% faster) or from 2312 to 1448 cycles (inter, i.e. ~60% faster). Total
decode time of ped1080p.webm goes from 8.086sec to 7.974sec (1.4% faster).
Sub-IDCTs will follow later. ped1080.webm goes from 9.295s to 8.191s
(13.5% faster). The IDCT itself goes from 4372 (intra) or 4337 (inter)
to 403 (intra) or 329 (inter) cycles for the DC-only form, 23755 (intra)
or 23723 (inter) to 3497 (intra) or 3607 (inter) cycles for the no-DC
form, which averages from 23393 (intra) or 16612 (inter) to 3449 (intra)
or 2392 (inter) for all 32x32s together, i.e. about ~7x faster (all
tests done on ped1080p.webm).
The function macro always sets .align 2 before declaring the
function label (since 5c5e1ea3) and always sets the section to
.text (since 278caa6a).
The .align 5 before certain functions, added in fc252eba, were added
before .text and .align were added to the function macro and thus
became useless/unused when the function macro got them.
This restores the original intention, to align the loop entry
points.
Signed-off-by: Martin Storsjö <martin@martin.st>