and remove FF_CODEC_CAP_INIT_THREADSAFE
All our native codecs are already init-threadsafe
(only wrappers for external libraries and hwaccels
are typically not marked as init-threadsafe yet),
so it is only natural for this to also be the default state.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is possible, because every given FFCodec has to implement
exactly one of these. Doing so decreases sizeof(FFCodec) and
therefore decreases the size of the binary.
Notice that in case of position-independent code the decrease
is in .data.rel.ro, so that this translates to decreased
memory consumption.
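A minimal sketch of the size effect follows. The struct, enum and
callback names are hypothetical and are not the actual FFCodec fields;
the exact savings depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types; the real FFCodec callbacks differ. */
typedef int (*decode_fn)(void *ctx, void *frame);
typedef int (*encode_fn)(void *ctx, void *pkt);
typedef int (*receive_fn)(void *ctx, void *frame);

/* One pointer slot per callback, although only one is ever set. */
struct codec_separate {
    const char *name;
    decode_fn   decode;
    encode_fn   encode;
    receive_fn  receive_frame;
};

/* Tag identifying which union member is valid. */
enum cb_type { CB_DECODE, CB_ENCODE, CB_RECEIVE_FRAME };

/* All callbacks share one slot; cb_type says which one to call. */
struct codec_union {
    const char *name;
    unsigned    cb_type;
    union {
        decode_fn  decode;
        encode_fn  encode;
        receive_fn receive_frame;
    } cb;
};

/* Bytes saved per codec entry by sharing the slot. */
static size_t bytes_saved(void)
{
    assert(sizeof(struct codec_union) <= sizeof(struct codec_separate));
    return sizeof(struct codec_separate) - sizeof(struct codec_union);
}
```

With three callbacks the union layout drops two pointer slots and adds
one tag, so the saving grows with the number of mutually exclusive
callbacks.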
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This increases type-safety by avoiding conversions from/through void*.
It also avoids the boilerplate "AVFrame *frame = data;" line
for non-subtitle decoders.
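The difference can be sketched as follows. Frame is a stand-in for
AVFrame and both function names are made up for illustration; the real
decode callbacks take more parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for AVFrame, only here to keep the sketch self-contained. */
typedef struct Frame {
    int width, height;
} Frame;

/* Old style: the output arrives as void* and is converted by hand. */
static int decode_untyped(void *data, int w, int h)
{
    Frame *frame = data;   /* boilerplate; the compiler cannot check it */

    assert(frame != NULL);
    frame->width  = w;
    frame->height = h;
    return 0;
}

/* New style: the parameter type documents and enforces what is passed. */
static int decode_typed(Frame *frame, int w, int h)
{
    assert(frame != NULL);
    frame->width  = w;
    frame->height = h;
    return 0;
}
```

Passing a pointer to an unrelated struct to decode_typed() is diagnosed
by the compiler, whereas decode_untyped() accepts it silently.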
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now, codec.h contained both public and private parts
of AVCodec. This exposes the internals of AVCodec to users
and leads them into the temptation of actually using them
and forces us to forward-declare structures and types that
users can't use at all.
This commit changes this by adding a new structure FFCodec to
codec_internal.h that extends AVCodec, i.e. contains the public
AVCodec as first member; the private fields of AVCodec are moved
to this structure, leaving codec.h clean.
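The extension pattern can be sketched like this. PublicCodec,
PrivateCodec and to_private() are hypothetical names chosen for the
example; the fields are not those of AVCodec or FFCodec.

```c
#include <assert.h>
#include <stddef.h>

/* Public part, as a user-visible header would declare it. */
typedef struct PublicCodec {
    const char *name;
    int         id;
} PublicCodec;

/* Private extension: the public struct comes first, so a pointer to
 * the extension and a pointer to its public part are the same address. */
typedef struct PrivateCodec {
    PublicCodec p;              /* must stay the first member */
    int         priv_data_size;
    int       (*init)(void *ctx);
} PrivateCodec;

/* Recover the private view from a public pointer. This is only valid
 * if the pointer really points into a PrivateCodec. */
static const PrivateCodec *to_private(const PublicCodec *codec)
{
    assert(offsetof(PrivateCodec, p) == 0);
    return (const PrivateCodec *)codec;
}
```

Users only ever see PublicCodec pointers, while internal code converts
them back whenever it needs the private fields.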
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also move FF_CODEC_TAGS_END as well as struct AVCodecDefault.
This reduces the number of files that have to include internal.h
(which comes with quite a lot of indirect inclusions), as e.g.
most encoders don't need it. It is furthermore in preparation
for moving the private part of AVCodec out of the public codec.h.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The majority of frame-threaded decoders (mainly the intra-only)
need exactly one part of ThreadFrame: The AVFrame. They don't
need the owners nor the progress, yet they had to use it because
ff_thread_(get|release)_buffer() requires it.
This commit changes this and makes these functions work with ordinary
AVFrames; the decoders that need the extra fields for progress
use ff_thread_(get|release)_ext_buffer() which work exactly
as ff_thread_(get|release)_buffer() used to do.
This also avoids some unnecessary allocations of progress AVBuffers,
namely for H.264 and HEVC film grain frames: These frames are not
used for synchronization and therefore don't need a ThreadFrame.
Also move the ThreadFrame structure as well as ff_thread_ref_frame()
to threadframe.h, the header for frame-threaded decoders with
inter-frame dependencies.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Given that the AVCodec.next pointer has now been removed, most of the
AVCodecs are not modified at all any more and can therefore be made
const (as this patch does); the only exceptions are the very few codecs
for external libraries that have an init_static_data callback.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
MagicYUV transmits its Huffman trees by providing the length of the code
corresponding to each symbol; then the decoder has to assemble the table
in such a way that (i) longer codes are to the left of the tree and (ii)
for codes of the same length the symbols are ascending from left to right.
Up until now the decoder did this as follows: It counted the number of
codes of each length and derived the first code of a given length via
(ii). Then the array of lengths is traversed a second time to create
the codes; there is one running counter for each length to do so. This
process creates a default symbol table (that is omitted).
This commit changes this as follows: Everything is indexed by the
position in the tree (with codes to the left first); given (i), we can
calculate the ranges occupied by the codes of each length; and with (ii)
we can derive the actual symbols of each code; the running counters for
each length are now used for the symbols and not for the codes.
Doing so allows us to switch to ff_init_vlc_from_lengths(); this has the
advantage that the codes table needs only be traversed once and that the
codes need not be sorted any more (right now, the codes that are so long
that they will be put into subtables need to be sorted so that codes
that end up in the same subtable are contiguous).
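The position-based indexing can be sketched as a counting sort. The
function and parameter names are hypothetical, and the sketch stops at
producing the length and symbol arrays in tree order; it does not call
ff_init_vlc_from_lengths() itself.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LEN 32

/*
 * Fill sym_at[pos] and len_at[pos] for every tree position pos,
 * leftmost first: longer codes come first and symbols ascend within
 * one length. len[i] is the code length of symbol i, 1..MAX_LEN.
 */
static void order_by_position(const uint8_t *len, unsigned nb_elems,
                              uint16_t *sym_at, uint8_t *len_at)
{
    unsigned next[MAX_LEN + 1] = { 0 };
    unsigned pos = 0;

    /* Count the codes of each length. */
    for (unsigned i = 0; i < nb_elems; i++) {
        assert(len[i] >= 1 && len[i] <= MAX_LEN);
        next[len[i]]++;
    }

    /* Turn counts into start positions: the longest length starts at
     * 0, each shorter length starts where the longer ones end. */
    for (int l = MAX_LEN; l >= 1; l--) {
        unsigned count = next[l];
        next[l] = pos;
        pos    += count;
    }

    /* One pass over the lengths; the per-length counter now yields
     * the position of the symbol instead of its code. */
    for (unsigned i = 0; i < nb_elems; i++) {
        unsigned p = next[len[i]]++;
        sym_at[p] = (uint16_t)i;
        len_at[p] = len[i];
    }
}
```

For the lengths 2, 1, 3, 3 of the symbols 0..3 this yields the symbol
order 2, 3, 0, 1 with lengths 3, 3, 2, 1.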
For a sample produced by our encoder (natural content, 4000 frames,
YUV420p, ten iterations, GCC 9.3) this decreased the amount of
decicycles for each call to build_huffman() from 1336049 to 1309401.
Notice that our encoder restricts the code lengths to 12 and our decoder
only uses subtables when the code is longer than 12 bits, so the sorting
that can be avoided does not happen at the moment. If one reduces the
decoder's tables to nine bits, the performance improvement becomes more
apparent: The amount of decicycles for build_huffman() decreased from
1165210 to 654055.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Now that the HuffEntries are no longer sorted by the MagicYUV decoder,
their symbols are trivial: The symbol of the element with index i is i.
They can therefore be removed. Furthermore, despite the length of the
codes being in the range 1..32 bits, the actual value of the codes is
<= 4096 (for 12 bit content). The reason for this is that the longer
codes are on the left side of the tree, so that the higher bits of
these codes are simply zero. By using a uint16_t for the codes and
removing the symbols entry, the size of each HuffEntry is decreased from
eight to four bytes, saving 16KB of stack space.
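The arithmetic can be checked with a small sketch. The field order of
both structs is an assumption, as is the table size of 4096 entries
(one per 12-bit symbol); the sizes hold on common ABIs but are not
guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout with a symbol field and 32-bit codes (assumed field order). */
typedef struct EntryOld {
    uint16_t sym;
    uint8_t  len;
    uint32_t code;
} EntryOld;

/* Layout without the symbol and with 16-bit codes. */
typedef struct EntryNew {
    uint8_t  len;
    uint16_t code;
} EntryNew;

/* Stack bytes saved for a table of nb_entries entries. */
static size_t table_savings(size_t nb_entries)
{
    assert(sizeof(EntryOld) >= sizeof(EntryNew));
    return nb_entries * (sizeof(EntryOld) - sizeof(EntryNew));
}
```

With 4096 entries and four bytes saved per entry, table_savings(4096)
gives the 16KB mentioned above.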
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The MagicYUV format stores Huffman tables in its bitstream by coding
the length of a given symbol; it does not code the actual code directly,
instead this is to be inferred by the rule that a symbol is to the left
of every shorter symbol in the Huffman tree and that for symbols of the
same length the symbol is ascending from left to right.
Our decoder implemented this by first sorting the array containing
length and symbol of each element according to descending length and
for equal length, according to ascending symbol. Afterwards, the current
state in the tree got encoded in a variable code; if the next array entry
had length len, then the len most significant bits of code contained
the code of this entry. Whenever an entry of the array of length
len was processed, code was incremented by 1U << (32 - len). So two
entries of length len have the same effect as incrementing code by
1U << (32 - (len - 1)), which corresponds to the parent node of length
len - 1 of the two nodes of length len etc.
This commit modifies this to avoid sorting the entries before
calculating the codes. This is done by calculating how many non-leaf
nodes there are on each level of the tree before calculating the codes.
Afterwards every leaf node on this level gets assigned the number of
nodes already on this level as code. This of course works only because
the entries are already sorted by their symbol initially, so that this
algorithm indeed gives ascending symbols from left to right on every
level.
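A simplified sketch of the sort-free assignment, with hypothetical
names and plain arrays instead of the decoder's HuffEntry:

```c
#include <assert.h>
#include <stdint.h>

/*
 * len[i] is the code length (1..32) of symbol i; code[i] receives the
 * right-aligned code. No sorting is involved.
 */
static void assign_codes_unsorted(const uint8_t *len, unsigned n,
                                  uint32_t *code)
{
    uint32_t next_code[33] = { 0 };
    uint32_t nodes_below   = 0;  /* number of nodes one level deeper */

    /* For now next_code[l] holds the number of leaves on level l. */
    for (unsigned i = 0; i < n; i++) {
        assert(len[i] >= 1 && len[i] <= 32);
        next_code[len[i]]++;
    }

    /* Walk from the deepest level up. The non-leaf nodes sit to the
     * left of the leaves, so the first leaf's code is their number. */
    for (int l = 32; l >= 1; l--) {
        uint32_t leaves   = next_code[l];
        uint32_t non_leaf = nodes_below / 2;  /* two children each */

        next_code[l] = non_leaf;
        nodes_below  = non_leaf + leaves;
    }

    /* Symbols arrive in ascending order, so the codes ascend from
     * left to right within each level. */
    for (unsigned i = 0; i < n; i++)
        code[i] = next_code[len[i]]++;
}
```

For the lengths 2, 1, 3, 3 this assigns the codes 01, 1, 000 and 001
to the symbols 0..3, which is what the sorted traversal would produce.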
This offers both speed- as well as (obvious) codesize advantages. With
Clang 10 the number of decicycles for build_huffman decreased from
1561987 to 1228405; for GCC 9 it went from 1825096 decicycles to 1429921.
These tests were carried out with a sample with 150 frames that was
looped 13 times; and this was iterated 10 times. The earlier reference
point here is from the point when the loop generating the codes was
traversed in reverse order (as the patch reversing the order led to
performance penalties).
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The MagicYUV format stores Huffman tables in its bitstream by coding
the length of a given symbol; it does not code the actual code directly,
instead this is to be inferred by the rule that a symbol is to the left
of every shorter symbol in the Huffman tree and that for symbols of the
same length the symbol is ascending from left to right. With one
exception, this is also what our decoder did.
The exception only matters when there are codes of length 32, because
in this case the first symbol of this length did not get the code 0,
but 1; e.g. if there were exactly two nodes of length 32, then they
would get assigned the codes 1 and 2 and a node of length 31 would get
the 31-bit code 1 which is a prefix of the 32 bit code 2, making the
Huffman table invalid. On the other hand, if there were only one symbol
with the length 32, the earlier code would accept this un-Huffman-tree.
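The example with two codes of length 32 can be reproduced with a
simplified sketch. The function names are made up, and the loop assumes
that the running value is kept left-aligned in 32 bits and started at 1
instead of 0 before the fix; it is not the decoder's actual code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assign codes to lengths sorted in descending order, starting the
 * left-aligned running value at `start`.
 */
static void assign_from(uint32_t start, const uint8_t *sorted_len,
                        unsigned n, uint32_t *out)
{
    uint32_t code = start;

    for (unsigned i = 0; i < n; i++) {
        unsigned len = sorted_len[i];

        assert(len >= 1 && len <= 32);
        out[i] = code >> (32 - len);         /* top len bits */
        code  += UINT32_C(1) << (32 - len);  /* next node on this level */
    }
}

/* Nonzero if the short code is a prefix of the long one. */
static int is_prefix(uint32_t short_code, unsigned short_len,
                     uint32_t long_code, unsigned long_len)
{
    assert(short_len >= 1 && short_len <= long_len && long_len <= 32);
    return (long_code >> (long_len - short_len)) == short_code;
}
```

For the lengths 32, 32, 31 a start value of 1 gives the codes 1, 2, 1,
and the 31-bit code 1 is a prefix of the 32-bit code 2. A start value
of 0 gives 0, 1, 1, which is prefix-free.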
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The MagicYUV decoder currently sets both the length and the symbol field
of an array of HuffEntries; hereby the symbol of the ith entry (0-based)
is just i. Then said array gets sorted so that entries with greater
length are at the end and entries with the same length are ordered so
that those with smaller symbols are at the end. Afterwards the newly
sorted array is traversed in reverse order. This commit instead inverts
the ordering and traverses the array in forward order, which is easier
to follow.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Every plane of each slice has to contain at least two bytes for flags
and the type of prediction used.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The lengths of the VLC codes are implicitly contained in the VLC tables
themselves; apart from that they are not used later on. So it is unnecessary
to store them and the very same array can be reused to parse the Huffman
table for the next plane.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The code already checks that exactly the expected number of entries is
read and set. Ergo it is unnecessary to zero them at the beginning.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Up until now, there were three comparison functions depending upon
bitness. But they are all actually the same, namely a lexicographic ordering:
entry a > entry b iff a.len > b.len or (a.len == b.len and a.sym < b.sym).
So they can be easily unified.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
When parsing Huffman tables, an array of HuffEntries (a struct
containing a code's bit length, its bits and its symbol) is used as an
intermediate table in order to sort the entries (the order depends on
both the length of the entries as well as on their symbols). After sorting
them, the symbol and len components are copied into other arrays (the
HuffEntries' code has never been set or used, despite using quite a lot
of stack space) and the codes are generated. Afterwards, the VLC is
created.
Yet ff_init_vlc_sparse() can handle non-contiguous arrays as input;
there is no need to copy the entries at all. This commit implements
this.
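The underlying idea is strided access: a field inside an array of
structs can be read in place, given a base pointer and the distance
between elements. The sketch below uses a hypothetical helper and
struct to show only that addressing scheme; it does not call
ff_init_vlc_sparse().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry; the len fields are not adjacent in memory. */
typedef struct Entry {
    uint16_t sym;
    uint8_t  len;
} Entry;

/* Read the idx-th uint8_t from a table whose elements are `stride`
 * bytes apart. */
static uint8_t read_u8_strided(const void *base, size_t stride, size_t idx)
{
    assert(base != NULL);
    return *((const uint8_t *)base + idx * stride);
}

/* Sum of all lengths, read directly from the array of structs
 * without copying them into a separate array first. */
static unsigned sum_lengths(const Entry *e, size_t n)
{
    unsigned sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += read_u8_strided(&e[0].len, sizeof(e[0]), i);
    return sum;
}
```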
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
When the MagicYUV decoder builds Huffman tables from an array of code
lengths, it proceeds as follows: First it copies the entries of the
array of lengths into an array of HuffEntries (a struct which contains
a length and a symbol field); it also sets the symbol field in
descending order from nb_elem - 1 to 0, where nb_elem is the common number
of elements of the length and HuffEntry arrays. Then the HuffEntry array
is sorted lexicographically: a > b iff a.len > b.len or a.len == b.len and
a.sym > b.sym. Afterwards the symbols of the so sorted array are
inverted again (i.e. each symbol sym is replaced by nb_elem - 1 - sym).
Yet inverting can easily be avoided altogether: Just modify the order so
that smaller symbols correspond to bigger HuffEntries. This leads to the
same permutation as the current code does and given that the two
inversions just cancel each other out, the result is the same.
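That both schemes give the same permutation can be checked with a small
sketch. Entry and the function names are hypothetical, and qsort() from
the C library stands in for whatever sort the decoder uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct Entry {
    uint8_t  len;
    uint16_t sym;
} Entry;

static int cmp_len(const Entry *a, const Entry *b)
{
    return (a->len > b->len) - (a->len < b->len);
}

/* Old order: for equal lengths the bigger symbol is the bigger entry. */
static int cmp_old(const void *pa, const void *pb)
{
    const Entry *a = pa, *b = pb;
    int c = cmp_len(a, b);
    return c ? c : (a->sym > b->sym) - (a->sym < b->sym);
}

/* New order: for equal lengths the smaller symbol is the bigger entry. */
static int cmp_new(const void *pa, const void *pb)
{
    const Entry *a = pa, *b = pb;
    int c = cmp_len(a, b);
    return c ? c : (a->sym < b->sym) - (a->sym > b->sym);
}

/* Old scheme: invert the symbols, sort, invert them back. */
static void build_old(const uint8_t *len, unsigned n, Entry *e)
{
    assert(n > 0);
    for (unsigned i = 0; i < n; i++)
        e[i] = (Entry){ len[i], (uint16_t)(n - 1 - i) };
    qsort(e, n, sizeof(*e), cmp_old);
    for (unsigned i = 0; i < n; i++)
        e[i].sym = (uint16_t)(n - 1 - e[i].sym);
}

/* New scheme: keep the symbols and flip the tie-break instead. */
static void build_new(const uint8_t *len, unsigned n, Entry *e)
{
    assert(n > 0);
    for (unsigned i = 0; i < n; i++)
        e[i] = (Entry){ len[i], (uint16_t)i };
    qsort(e, n, sizeof(*e), cmp_new);
}
```

Because no two entries compare equal, the result does not depend on
the stability of the sort.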
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The current design, where
- proper init is called for the first per-thread context
- first thread's private data is copied into private data for all the
other threads
- a "fixup" function is called for all the other threads to e.g.
allocate dynamically allocated data
is very fragile and hard to follow, so it is abandoned. Instead, the
same init function is used to init each per-thread context. Where
necessary, AVCodecInternal.is_copy can be used to differentiate between
the first thread and the other ones (e.g. for decoding the extradata
just once).
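A minimal sketch of the new scheme, using a hypothetical context struct
in which is_copy plays the role of AVCodecInternal.is_copy:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread context; not the real AVCodecContext. */
typedef struct ThreadCtx {
    int is_copy;           /* 0 for the first thread, 1 for the others */
    int buffer_ready;      /* stands for per-thread allocations */
    int extradata_parsed;  /* work that should happen only once */
} ThreadCtx;

/* The same init function runs for every per-thread context. */
static int decoder_init(ThreadCtx *ctx)
{
    assert(ctx != NULL);
    ctx->buffer_ready = 1;          /* every thread sets up its own state */
    if (!ctx->is_copy)
        ctx->extradata_parsed = 1;  /* only the first thread does this */
    return 0;
}
```

No private data is copied between contexts, so there is no separate
fixup step that has to undo or redo parts of the first thread's init.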
Fixes: out of array access
Fixes: 20763/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MAGICYUV_fuzzer-5759562508664832
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: Timeout
Fixes: 8690/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MAGICYUV_fuzzer-6542020913922048
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>