This proved beneficial for performance: For the sample [1] the number
of decicycles in one decode call decreased from 155851561 to 108158037
for Clang 10 and from 168270467 to 128847479 for GCC 9.3. For x86-32
compiled with GCC 9.3 and run on an x64 Haswell the number increased
from 158405517 to 202215769, so the cached bitstream reader is only
enabled if HAVE_FAST_64BIT is set. These values are the average of 10
runs each looping five times over the input.
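
As a rough illustration of why a cached reader depends on fast 64-bit
arithmetic, here is a minimal sketch of an MSB-first reader that keeps
unread bits in a 64-bit cache. The names CachedReader, refill and
read_bits_cached are hypothetical and this is not FFmpeg's actual
implementation. One plausible reading of the x86-32 result is that every
shift of the 64-bit cache has to be emulated with 32-bit registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached MSB-first bit reader; not FFmpeg's implementation. */
typedef struct CachedReader {
    const uint8_t *buf;
    size_t size;        /* input size in bytes */
    size_t byte_pos;    /* next byte to load into the cache */
    uint64_t cache;     /* unread bits, left-aligned (MSB first) */
    unsigned cached;    /* number of valid bits in the cache */
} CachedReader;

/* Tops up the cache one byte at a time while there is room and input. */
static void refill(CachedReader *cr)
{
    while (cr->cached <= 56 && cr->byte_pos < cr->size) {
        cr->cache |= (uint64_t)cr->buf[cr->byte_pos++] << (56 - cr->cached);
        cr->cached += 8;
    }
}

/* Reads n bits (1..32); the caller guarantees that enough input remains. */
static uint32_t read_bits_cached(CachedReader *cr, unsigned n)
{
    assert(n >= 1 && n <= 32);
    if (cr->cached < n)
        refill(cr);
    uint32_t v = (uint32_t)(cr->cache >> (64 - n));
    cr->cache <<= n;    /* consumed bits fall off the top of the cache */
    cr->cached -= n;
    return v;
}
```

Most reads touch only the cache, so memory is accessed once per refill
rather than once per read.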
[1]: samples.ffmpeg.org/ffmpeg-bugs/trac/ticket2593/fraps_flv1_decoding_errors.avi
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The fraps decoder already checked for overreads manually (and errored
out in this scenario), yet it still enabled implicit checks, leading to
worse performance and larger code size.
This commit disables the implicit bitstream reader checks. For the
sample [1] this improves performance from 195105896 to 155851561
decicycles for Clang 10 and from 222801887 to 168270467 decicycles when
compiled with GCC 9.3. These values are the average of 10 runs each
looping ten times over the input.
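
The trade-off can be sketched as follows, assuming a hypothetical
minimal MSB-first bit reader (BitReader, bits_left, read_bits_unchecked
and decode_symbols are illustrative names, not FFmpeg's get_bits API):
the caller verifies once that enough bits remain, so the individual
reads need no bounds check of their own.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal MSB-first bit reader; not FFmpeg's GetBitContext. */
typedef struct BitReader {
    const uint8_t *buf;
    size_t size_in_bits;
    size_t pos;             /* current position in bits */
} BitReader;

static size_t bits_left(const BitReader *br)
{
    return br->size_in_bits - br->pos;
}

/* Reads n bits (1..32) with no bounds check; the caller must have checked. */
static uint32_t read_bits_unchecked(BitReader *br, unsigned n)
{
    uint32_t v = 0;
    while (n--) {
        unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
        v = (v << 1) | bit;
        br->pos++;
    }
    return v;
}

/* Decodes count 8-bit symbols; returns 0 on success, -1 on truncated input. */
static int decode_symbols(BitReader *br, uint8_t *dst, size_t count)
{
    /* One explicit check up front replaces a check inside every read. */
    if (bits_left(br) / 8 < count)
        return -1;
    for (size_t i = 0; i < count; i++)
        dst[i] = (uint8_t)read_bits_unchecked(br, 8);
    return 0;
}
```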
[1]: samples.ffmpeg.org/ffmpeg-bugs/trac/ticket2593/fraps_flv1_decoding_errors.avi
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Also break some long lines, remove codec function placeholder comments
and add spaces in sample/pixel format lists.
Signed-off-by: Martin Storsjö <martin@martin.st>
Prevents a crash when trying to copy from a nonexistent plane, e.g.
from an RGB32 reference image to a YUV420P target image.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Offsets are relative to the end of the header, not the start of the
buffer, thus the header size needs to be subtracted from the buffer size.
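
A minimal sketch of such a check, assuming a hypothetical helper
plane_in_bounds and illustrative parameter names (this is not the
decoder's actual code): because offsets count from the end of the
header, they are validated against the payload size, i.e. the buffer
size minus the header size.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a plane of plane_size bytes at offset (measured from the
 * end of the header) lies inside the packet, 0 otherwise.
 * Hypothetical helper for illustration only. */
static int plane_in_bounds(size_t buf_size, size_t header_size,
                           size_t offset, size_t plane_size)
{
    if (header_size > buf_size)
        return 0;
    /* Only the bytes after the header are addressable by offsets. */
    size_t payload_size = buf_size - header_size;
    if (offset > payload_size)
        return 0;
    /* Written as a subtraction so offset + plane_size cannot overflow. */
    return plane_size <= payload_size - offset;
}
```

With a 100-byte buffer and an 8-byte header, an offset of 96 with a
4-byte plane would pass a check against the raw buffer size but is
rejected here, since only 92 payload bytes exist.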
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
The codec is too simple to gain much from it at lower resolutions,
but it should help at very high resolutions, particularly for
v3 and v5, where a not particularly optimized pseudo-YUV to RGB
conversion is done in the codec.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
With gcc 4.6 this part of the code is ca. 4x faster, resulting
in an overall speedup of around 5% for the fate-fraps-v5 sample.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
The codec has only I- and skip-frames, so there is no
need for reget_buffer; change it so it works with
get_buffer.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>