This reverts commit 61669b7c40.
This commit broke building with MSVC due to its spec-incompliant handling
of ',' in __VA_ARGS__: These are not treated as argument separators for
further macros, so that in our case the init_vlc2() macro is treated as
having only one argument whenever the init_vlc() macro is used. See [1]
for further details.
[1]: https://reviews.llvm.org/D69626
Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Besides the obvious advantage of less code this also has a performance
impact: For GCC 9 the time spent on one call to smka_decode_frame() for
the sample from ticket #2425 decreased from 1693619 to 1498127
decicycles. For Clang 9, it decreased from 1369089 to 1366465
decicycles.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Up until now, the Smacker decoder has pretended that the prediction
values are signed in code like 'pred[0] += (unsigned)sign_extend(val, 16)'
(the cast has been added to this code later to fix undefined behaviour).
This has been even done in case the PCM format is u8.
Yet in case of 8/16 bit samples, only the lower 8/16 bit of the predicition
values are ever used, so one can just as well just use unsigned and
remove the sign extensions. This is what this commit does.
For GCC 9 the time for one call to smka_decode_frame() for the sample from
ticket #2425 decreased from 1709043 to 1693619 decicycles; for Clang 9
it went up from 1355273 to 1369089 decicycles.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
by using buffers on the stack instead. The fact that the effective
lifetime of most of the allocated buffers doesn't overlap enables one to
limit the stack space used to a fairly modest size (about 1.5 KiB).
That all the buffers used in HuffContexts have always the same number of
elements (namely 256) makes it possible to include the buffers directly
in the HuffContext. Doing so also makes the length field redundant; it has
therefore been removed.
This is beneficial for performance: For GCC 9 the time for one call to
smka_decode_frame() for the sample in ticket #2425 went down from
1794494 to 1709043 decicyles; for Clang 9 it decreased from 1449420 to
1355273 decicycles.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Up until now, the return value of get_vlc2() has been used as an index
in an array that contained the value one is really interested in. Yet
since b613bacca9 this is no longer
necessary, as one can store the value that is right now stored in the
array in the VLC internal table.
This also means that all the information from the eight bit Huffman trees
are now stored in the corresponding VLC table; this will enable us to
remove several allocations lateron.
This improved performance: For GCC 9 the time for one call of
smka_decode_frame() for the sample from ticket #2425 decreased from
1811706 to 1794494 decicycles; for Clang 9 the number went from 1471663
to 1449420 decicycles.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
This will mean that we will need less stack space lateron when these
arrays are no longer heap-allocated.
No discernible speed impact.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Smacker uses two types of Huffman trees: Those for eight bit values and
those for 16 bit values. Given that both return their values via arrays
and that both need to check not to overrun their array, the context for
parsing eight bit values (HuffContext) will necessarily exhibit certain
similarities with the context used for parsing 16 bit values (DBCtx).
These similarities led to using a HuffContext in addition a DBCtx for
parsing 16 bit trees.
This stands in the way of further developments for the HuffContext struct
(when parsing eight bit trees, the length of the arrays are always 256,
so that one can inline said value and move the currently heap-allocated
tables directly in the structure).
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Using explicit checks has the advantage that one can combine several
checks into one and does not have to check every time. E.g. reading a
16bit PCM sample involves two calls to get_vlc2(), each of which may
read up to three times up to SMKTREE_BITS (= 9) bits. But given that the
padding that the input packet is supposed to have is large enough, it is
no problem to only check once for each sample.
This turned out to be beneficial for performance: For GCC 9, the time for
one call of smka_decode_frame() for the sample from ticket #2425 went down
from 2055905 to 1804751 decicycles; for Clang 9 it went down from 1510538
to 1479680 decicycles.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The VLC codes in question originate from a Huffmann tree and so every
sequence of bits that is longer than the longest code contains an
initial sequence that is a valid code. Given that it has been checked
during reading said tree (and once again in ff_init_vlc_sparse()) that
the length of each code is <= 3 * the number of bits read at once when
reading codes, get_vlc2() will always find a matching entry.
These checks have been added in 71d3c25a7e
at a time when the length of the codes had not been checked when parsing
the tree.
For GCC 9 and the sample from ticket #2425 this led to a slight
performance regression: The time for one call to smka_decode_frame()
increased from 2053671 to 2064529 decicycles; for Clang 9, performance
improved from 1521288 to 1508459 decicycles.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
When length is zero for a leaf node (which happens iff the Huffman tree
consists of one leaf node only), prefix is also automatically zero.
Performance impact is negligible: For GCC 9 and the sample from #2425,
the time for one call to smka_decode_frame() decreased from 2053758 to
2053671 decicycles; for Clang 9 it went from 1523153 to 1521288.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
With the possible exception of the "last" values when decoding video,
only the part that is actually initialized with values derived from the
bitstream is used afterwards, so it is unnecessary to zero everything at
the beginning. This is also no problem for the "last" values at all,
because they are reset for every frame anyway.
While at it, use sizeof(variable) instead of sizeof(type).
Performance increased slightly: For GCC, from 2068389 decicycles per call
to smka_decode_frame() when decoding the sample from ticket #2425 to 2053758
decicycles; for Clang, from 1534188 to 1523153 decicycles.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Using the real number of read codes allows to leave a loop in
ff_init_vlc_sparse earlier; notice that all codes not explicitly
set by reading data have been set to zero earlier (i.e. they are
zero-length codes) and such codes are ignored by ff_init_vlc_sparse.
This improves performance: When compiled with GCC 9, the time spent on
one call to smka_decode_frame() for the sample from ticket #2425
decreased from 2195367 decicycles to 2068389 decicycles. For Clang 9,
it improved from 1602075 to 1534188 decicycles. These tests have been
performed 20 times and each times the input file has been looped
32 times to get a sufficient number of frames.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Given that the code currently accepts only 27 bits long Huffman codes,
the shift 1 << (length - 1) with length in 1..28 that is performed when
parsing the tree is safe. Yet if this limit were ever expanded to the
full 32 bits, this shift would be potentially undefined. So simply use
unsigned.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
smacker_decode_header_tree() uses different variables for return values
(res) and for errors (err) leading to code like
res = foo(bar);
if (res < 0) {
err = res;
goto error;
}
Given that no positive return value is ever used at all one can simplify
the above by removing the intermediate res.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The earlier version did not error out directly in case an error happens,
because it would lead to a leak: An allocated array is only reachable
via a local variable at that time; it is only attached to more permanent
storage at the end. While it would be possible to add custom code for
freeing on error (instead of reusing the ordinary code for doing so),
this commit takes the opposite approach and attaches the newly allocated
array to its permanent place immediately after its allocation.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The extradata for Smacker video contains Huffman trees as well as a
field containing the size (in bytes) of said Huffman tree when stored
as a table. Due to three special values the decoder allocates more than
the size field indicates; yet when it parses the table it only errors
out if the number of elements exceeds the number of allocated elements
and not the number of elements as indicated by the size field. As a
consequence, there might be less than three elements available at the
end, so that another check for this is necessary.
This commit changes this: It is always made sure that the three elements
reserved to (potentially) use them to store the special values are not
used to store ordinary tree entries. This allows to remove the extra
check at the end.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The Huffmann tables used by Smacker can consist of exactly one leaf only
in which case the length of the corresponding code is zero; there is
then exactly one value encoded. Our VLC can't handle this and therefore
this case needs to be treated separately; it has been implemented in
commit 48cbdaea15. Yet said commit also
made the decoder emit an error message (despite not erroring out) in this
case, although it seems that this is rather a limitation of our VLC API.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Fixes: signed integer overflow: -2147481503 + -32732 cannot be represented in type 'int'
Fixes: 17782/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SMACKAUD_fuzzer-5769672225456128
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 238 * 16843009 cannot be represented in type 'int'
Fixes: 16958/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SMACKER_fuzzer-5193905355620352
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Some decoders may not need a writable buffer in some specific cases, but only
a reference to the existing buffer with updated frame properties instead, for
the purpose of returning duplicate frames. For this, the
FF_REGET_BUFFER_FLAG_READONLY flag is added, which will prevent potential
allocations and buffer copies when they are not needed.
Signed-off-by: James Almer <jamrial@gmail.com>
If all tables are skipped it would be impossible to encode any
"non black" video.
Fixes: Timeout (78sec -> 1ms)
Fixes: 15821/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SMACKER_fuzzer-5652598838788096
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This fixes segmentation faults due to stack-overflow caused by too deep
recursion.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Sean McGovern <gseanmcg@gmail.com>
This fixes segmentation faults due to stack-overflow caused by too deep
recursion.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
This avoids the danger that get_bits.h might get indirectly #included before
BITSTREAM_READER_LE is defined.
Also sort headers into canonical order where appropriate.
Fixes out of array access
Fixes: ce19e41f0ef1e52a23edc488faecdb58/asan_heap-oob_2504e97_4202_ffa0df1baed14022b9bfd4f8ac23d0cb.smk
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes use of uninitialized memory
Fixes: msan_uninit-mem_7f6e83322950_9769_wetlogo.smk
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>