EBML binary elements are already made reference-counted when read;
so when populating the AVStream.attached_pic, one does not need to
allocate a new buffer for the data; instead the current code just
creates a new reference to the underlying AVBuffer. But this can be
improved even further: Just move the already existing reference.
This also fixes a memleak that happens upon error because
matroska_read_close has not been called in this scenario.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Each AttachedFile in Matroska can have a FileDescription element that
contains a human-friendly name for the attached file; yet this element
has been ignored up until now. This commit changes this and exports it
as title tag instead (the Matroska muxer mapped the title tag to the
AttachedFile element since support for Attachments was added).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The Matroska specification allows multiple (level 1) Tags elements per
file, yet our demuxer didn't: While it parsed any amount of Tags
elements it found in front of the Clusters (albeit with warnings because
of duplicate elements), it would treat any Tags element only referenced
via a SeekHead entry as already parsed if any Tags element has already
been parsed; therefore this Tags element would not be parsed at all.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
There can be more than one SeekHead in a Matroska file, but most of the
other level 1 elements can only occur once.* Therefore the Matroska
demuxer only allows one entry per ID in its internal list of level 1
elements known to it; the only exception to this are SeekHeads.
The only exception to this are SeekHeads: When one is encountered
(either directly or in the list of entries read from SeekHeads),
a new entry in the list of known level-1 elements is always added,
even when this entry is actually already known.
This leads to lots of seeks in case of circular SeekHeads: Each time a
SeekHead is parsed, a new entry for a SeekHead will be added to the list
of entries read from SeekHeads. The exception for SeekHeads mentioned
above now implies that this SeekHead will always appear new and unparsed
and parsing will be attempted. This continued until the list of known
level-1 elements is full.
Fixing this is pretty simple: Don't add a new entry for a SeekHead if
its position matches the position of an already known SeekHead.
*: Actually, there can be multiple Tags and several other level 1
elements are "identically recurring" which means they may be resent
multiple times, but each instance must be absolutely identical to the
previous.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
A Seek element in a Matroska SeekHead should contain a SeekID and a
SeekPosition element and upon reading, they should be sanitized:
Given that IDs are restricted to 32 bit, longer SeekIDs should be treated
as invalid. Instead currently the lower 32 bits have been used.
For SeekPosition, no checks were performed for the element to be
present and if present, whether it was excessively large (i.e. the
absolute file position described by it exceeding INT64_MAX). The
SeekPosition element had a default value of -1 which means that a check
seems to have been intended; but it was not implemented. This commit adds
a check for overflow to the calculation of the absolute file position of
the referenced level 1 elements.
Using -1 (i.e. UINT64_MAX) as default value for SeekPosition implies that
a Seek element without SeekPosition will run afoul of this check.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Since commit 979b5b8959, reverting the
Matroska ContentCompression is no longer done inside
matroska_parse_frame() (the function that creates AVPackets out of the
parsed data (unless we are dealing with certain codecs that need special
handling)), but instead in matroska_parse_block(). As a consequence,
the data that matroska_parse_frame() receives is no longer always owned
by an AVBuffer; it is owned by an AVBuffer iff no ContentCompression needed
to be reversed; otherwise the data is independently allocated and needs
to be freed on error.
Whether the data is owned by an AVBuffer or not is indicated by a variable
buf of type AVBufferRef *: If it is NULL, the data is independently
allocated, if not it is owned by the underlying AVBuffer (and is used to
avoid copying the data when creating the AVPackets).
Because the allocation of the buffer holding the uncompressed data happens
outside of matroska_parse_frame() (if a ContentCompression needs to be
reversed), the data is passed as uint8_t ** in order to not leave any
dangling pointers behind in matroska_parse_block() should the data need to
be freed: In case of errors, said uint8_t ** would be av_freep()'ed in
case buf indicated the data to be independently allocated.
Yet there is a problem with this: Some codecs (namely WavPack and
ProRes) need special handling: Their packets are only stored in
Matroska in a stripped form to save space and the demuxer reconstructs
full packets. This involved allocating a new, enlarged buffer. And if
an error happens when trying to wrap this new buffer into an AVBuffer,
this buffer needs to be freed; yet instead the given uint8_t ** (holding
the uncompressed, yet still stripped form of the data) would be freed
(av_freep()'ed) which certainly leads to a memleak of the new buffer;
even worse, in case the track does not use ContentCompression the given
uint8_t ** must not be freed as the actual data is owned by an AVBuffer
and the data given to matroska_parse_frame() is not the start of the
actual allocated buffer at all.
Both of these issues are fixed by always freeing the current data in
case it is independently allocated. Furthermore, while it would be
possible to track whether the pointer from matroska_parse_block() needs
to be reset or not, there is no gain in doing so, as the pointer is not
used at all afterwards and the sematics are clear: If the data passed
to matroska_parse_frame() is independently allocated, then ownership
of the data passes to matroska_parse_frame(). So don't pass the data
via uint8_t **.
Fixes Coverity ID 1462661 (the issue as described by Coverity is btw
a false positive: It thinks that this error can be triggered by ProRes
with a size of zero after reconstructing the original packets, but the
reconstructed packets can't have a size of zero).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Reindentation as well as marking several variables used for demuxing
RealAudio as const to clearly see that they don't change during
demuxing.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The Matroska demuxer has three functions for creating packets out of
the data read: One for certain RealAudio codecs (ATRAC3, cook, sipr,
RealAudio 28.8), one for WebVTT (actually, the WebM flavour of it) and
one for all the others. Only the last function supported Matroska's
ContentCompression (e.g. it reversed zlib compression or added the
removed headers to the packets). But in Matroska, all tracks are allowed
to be compressed. This commit adds support for this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Matroska is built around the principle that a reader does not need to
understand everything in a file in order to be able to make use of it;
it just needs to ignore the data it doesn't know about.
Our demuxer typically follows this principle, but there is one important
instance where it does not: A Block belonging to a TrackEntry with no
associated stream is treated as invalid data (i.e. the demuxer will try
to resync to the next level 1 element because it takes this as a sign
that it has lost sync). Given that we do not create streams if we don't
know or don't support the type of the TrackEntry, this impairs this
demuxer's forward compability.
Furthermore, ignoring Blocks belonging to a TrackEntry without
corresponding stream can (in future commits) also be used to ignore
TrackEntries with obviously bogus entries without affecting the other
TrackEntries (by not creating a stream for said TrackEntry).
Finally, given that matroska_find_track_by_num() already emits its own
error message in case there is no TrackEntry with a given TrackNumber,
the error message (with level AV_LOG_INFO) for this can be removed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
A Block (meaning both a Block in a BlockGroup as well as a SimpleBlock)
must have at least three bytes after the field containing the encoded
TrackNumber. So if there are <= 3 bytes, the Matroska demuxer would
skip this block, believing it to be an empty, but valid Block.
This might discard valid nonempty Blocks, namely if the track uses header
stripping. And certain definitely spec-incompliant Blocks don't raise
errors: Those with two or less bytes left after the encoded TrackNumber
and those with three bytes left, but with flags indicating that the Block
uses lacing as then there has to be further data describing the lacing.
Furthermore, zero-sized packets were still possible because only the
size of the last entry of a lace was checked.
This commit fixes this. All spec-compliant Blocks that contain data
(even if side data only) are now returned to the caller; spec-compliant
Blocks that don't contain anything are not returned.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Some conditions which don't change and which can therefore be checked
in read_header() were instead rechecked upon parsing each block. This
has been changed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The Matroska demuxer splits every sequence of h Matroska Blocks into
h * w / cfs packets of size cfs; here h (sub_packet_h), w (frame_size)
and cfs (coded_framesize) are parameters from the track's CodecPrivate.
It does this by splitting the Block's data in h/2 pieces of size cfs each
and putting them into a buffer at offset m * 2 * w + n * cfs where
m (range 0..(h/2 - 1)) indicates the index of the current piece in the
current Block and n (range 0..(h - 1)) is the index of the current Block
in the current sequence of Blocks. The data in this buffer is then used
for the output packets.
The problem is that there is currently no check to actually guarantee
that no uninitialized data will be output. One instance where this is
trivially so is if h == 1; another is if cfs * h is so small that the
input pieces do not cover everything that is output. In order to
preclude this, rmdec.c checks for h * cfs == 2 * w and h >= 2. The
former requirement certainly makes much sense, as it means that for
every given m the input pieces (corresponding to the h different values
of n) form a nonoverlapping partition of the two adjacent frames of size w
corresponding to m. But precluding h == 1 is not enough, other odd
values can cause problems, too. That is because the assumption behind
the code is that h frames of size w contain data to be output, although
the real number is h/2 * 2. E.g. for h = 3, cfs = 2 and w = 3 the
current code would output four (== h * w / cfs) packets. although only
data for three (== h/2 * h) packets has been read.
(Notice that if h * cfs == 2 * w, h being even is equivalent to
cfs dividing w; the latter condition also seems very reasonable:
It means that the subframes are a partition of the frames.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
RealAudio 28.8 (like other RealAudio codecs) uses a special demuxing
mode in which the data of the existing Matroska Blocks is not simply
forwarded as-is. Instead data from several Blocks is recombined
together to output several packets. The parameters governing this
process are parsed from the CodecPrivate: Coded framesize (cfs), frame
size (w) and sub_packet_h (h).
During demuxing, h/2 pieces of data of size cfs each are read from every
Matroska (Simple)Block and put at offset m * 2 * w + n * cfs of a buffer
of size h * w, where m ranges from 0 to h/2 - 1 for each Block while n
is initially zero and incremented after a Block has been parsed until it
is h, at which poin the assembled packets are output and n reset.
The highest offset is given by (h/2 - 1) * 2 * w + (h - 1) * cfs + cfs
while the destination buffer's size is given by h * w. For even h, this
leads to a buffer overflow (and potential segfault) if h * cfs > 2 * w;
for odd h, the condition is h * cfs > 3 * w.
This commit adds a check to rule this out.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
RealAudio 28.8 does not need or use sub_packet_size for its demuxing
and this field is therefore commonly set to zero. But since 18ca491b
the Real Audio specific demuxing is no longer applied if sub_packet_size
is zero because the codepath for cook and ATRAC3 divide by it; this made
these files undecodable.
Furthermore, since 569d18aa (merged in 2c8d876d) sub_packet_size being
zero is used as an indicator for invalid data, so that a file containing
such a track was completely skipped.
This commit fixes this by not checking sub_packet_size for RealAudio
28.8 at all.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
They need a special parsing mode and in order to find out whether this
mode is in use, several checks have to be performed. They can all be
combined into one: If the buffer that is only used to assemble their
packets has been allocated, use the RealAudio parsing mode.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Only flavors 0..3 seem to exist. E.g. rmdec.c treats any flavor > 3
as invalid data. Furthermore, we do not know how big the packets to
create ought to be given that for sipr these values are not read from
the bitstream, but from a table.
Furthermore, flavor is only used for sipr, so only check it for sipr;
rmdec.c does the same. (The old check for flavor being < 0 was
always wrong given that flavor is an int that is read via avio_rb16(),
so it has been removed completely.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Chapter titles are added to the chapter's metadata since 6cb6e159,
yet since 012867f0 (the predecessor of) avpriv_new_chapter() already
adds the title to the chapter's metadata. So setting it again in
matroskadec.c is redundant and expensive.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
mkvmerge versions 6.2 to 40.0 had a bug that made it not propagate the
WavPack extradata (containing the WavPack version) during remuxing from
a Matroska file; currently our demuxer would treat every WavPack block
encountered as invalid data (unless the WavPack stream is to be
discarded (i.e. the streams discard is >= AVDISCARD_ALL)) and try to
resync to the next level 1 element.
Luckily, the WavPack version is currently not really important; so we
fix this problem by assuming a version. David Bryant, the creator of
WavPack, recommended using version 0x410 (the most recent version) for
this. And this is what this commit does.
A FATE-test for this has been added.
Reviewed-by: David Bryant <david@wavpack.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
This will likely also fix CID 1452562, a false positive resulting from
Coverity thinking that av_dict_set() automatically frees its key and
value parameters (even without the AV_DICT_DONT_STRDUP_* flags).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
When a Matroska Block is only stored in compressed form, the size of
the uncompressed block is not explicitly coded and therefore not known
before decompressing it. Therefore the demuxer uses a guess for the
uncompressed size: The first guess is three times the compressed size
and if this is not enough, it is repeatedly incremented by a factor of
three. But when this happens with lzo, the decompression is neither
resumed nor started again. Instead when av_lzo1x_decode indicates that x
bytes of input data could not be decoded, because the output buffer is
already full, the first (not the last) x bytes of the input buffer are
resent for decoding in the next try; they overwrite already decoded
data.
This commit fixes this by instead restarting the decompression anew,
just with a bigger buffer.
This seems to be a regression since 935ec5a1.
A FATE-test for this has been added.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
ProRes in Matroska is supposed to not contain the first atom header
(containing a size field and the tag "icpf") and therefore the Matroska
demuxer has to recreate it; this involves an allocation and copy, of
course. Whether the old buffer (containing the data without the atom
header) needs to be freed or not depends upon whether it is what was
directly read (in which case it is owned by an AVBuffer) or whether it
has been allocated when reversing the track's content compression (e.g.
zlib compression) that Matroska supports.
So there are three pointers involved: The one pointing to the directly
read data (owned by the AVBuffer), the one pointing to the currently
valid data (which coincides with the former if no content compression
needed to be reverted) and the one pointing to the new data with the
first atom header. The check for whether to free the second of these is
simply whether the first two are different.
This works mostly, but there is a complication: Some muxers don't strip
the first atom header away and in this case, it is also not reinserted
and no new buffer is allocated; instead, the second and the third
pointers agree. In this case, one must never free the second buffer.
Yet it is currently done if the track is e.g. zlib compressed.
This commit fixes this.
This is a regression since b8e75a2a.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
When parsing EBML lacing, for every number read, a new AVIOContext has
been initialized (via ffio_init_context()) just for this number. This
has been changed: The context is kept now.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
When parsing the sizes of the frames in a lace fails, sometimes no
error message was raised (e.g. when using xiph or fixed-size lacing).
Only EBML lacing generated error messages (which were wrongly declared
as AV_LOG_INFO), but even here not all errors resulted in an error
message. So add a generic error message to catch them all.
Moreover, if parsing one of the EBML numbers fails, ebml_read_num already
emits its own error messages, so that all that is needed is a generic error
message to indicate that this happened during parsing the sizes of the
frames in a block; in other words, the error messages specific to
parsing EBML lace numbers can be and have been removed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
870e7552 introduced validating the lace sizes when they are parsed and
removed the old check; yet when merging this libav commit in 6902c3ac,
the old check for whether the frame extends beyond the frame has been kept.
It is unnecessary and has been removed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Up until now, when an error happened in one of the inner loops in
matroska_parse_laces, a variable designated for the return value has
been set to an error value and break has been used to exit the
current loop/case. This was done so that the end of matroska_parse_laces
is reached, because said function allocated memory which is later used
and freed in the calling function and passed at the end of
matroska_parse_laces.
But given that there is no allocation any more, one can now return
immediately. And this commit does this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
The maximal number of frames in a lace can be 256; hence one has a not
excessive upper bound on the size of an array that can hold the sizes of
all the frames in a lace. Yet up until now, said array has been
dynamically allocated. This has been changed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
It avoids the overhead of function calls.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Up until c4e0e314, the seek table has been included in the tta
extradata, so that the size of said extradata was 22 (the size of a TTA1
header) + 4 * number of frames. The decoder rejected anything below a
size of 30 and so the Matroska demuxer exported 30 byte long extradata,
of which only 18 were set (it ignores a CRC-32 and simply leaves it at
0). But this is unnecessary since said commit, so reduce the size to 22.
Furthermore, replace 30 by 22 in a comment about the extradata size in
libavcodec/tta.c.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
That way one doesn't have to free later. In this case (concerning TTA
extradata), this also fixes a memleak when the output samplerate is
invalid.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
The structure of a ProRes frame in mov/mp4 is that of a typical atom:
First a 32 bit BE size field, then a tag detailling the content. Said
size field includes the eight bytes of the atom header.
This header is actually redundant, as the size of the atom is already
known from the containing atom. It is therefore stripped away when muxed
into Matroska and so the Matroska demuxer has to recreate upon demuxing.
But it did not account for the fact that the size field includes the
size of the header and this can lead to problems when a decoder uses the
in-band size field.
Fixes ticket #8210.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
matroska_reset_status (a function that is used during seeking (among
other things)) used an int for the return value of avio_seek which
returns an int64_t. Checking the return value then indicated an error
even though the seek was successfull for targets in the range of
2GB-4GB, 6GB-8GB, ... This error implied that the status hasn't been
reset and in particular, the old level was still considered to be in
force, so that ebml_parse returned errors because the newly parsed
elements were of course not contained in the previously active and still
wrongly considered active master element any more.
Addresses ticket #8084.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
The typedef used to define EbmlSyntax already includes a const qualifier
so that it is unnecessary to include another const qualifier in future
definitions and declarations. Given that MSVC warns about this, this
commit removes these redundant const qualifiers.
Suggested-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Unknown-length elements end when an element not allowed in them, but
allowed at a higher level is encountered. In order to check for this,
c1abd95a added a pointer to every syntax level's parent to each
EbmlSyntax. Given that the parent must of course also reference the
child in order to be able to enter said child level, one needs to use
forward declarations.
These forward declarations constitute tentative definitions and tentative
definitions with internal linkage (like our syntaxes) must not be an
incomplete type. Yet they were an incomplete type and while GCC and
Clang did not even warn about this (on default warning levels), it
broke compilation with MSVC. Therefore this commit adds the sizes.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
If a file uses unknown-length level 1 elements besides clusters and such
elements are after the first cluster, then these elements will usually
be parsed twice: Once during parsing of the file header and once when
reading the file reaches the position where these elements are located.
The second time the element is parsed leads to a "Duplicate element"
error message. Known-length elements are not affected by this as they
are skipped except during parsing the header.
This commit fixes this by explicitly adding a check for whether the
position of the element to be parsed is the same as the position of the
already known level 1 element.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
This commit converts the MatroskaLevel1Element struct to use file-based
offsets, as opposed to the current practice of using offsets relative to
the beginning of the segment in it. This also includes a change from
uint64_t to int64_t.
This is in preparation to another patch that improves the check for
duplicate level 1 elements.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Up until now, one last kind of unknown-length element hasn't been
properly handled: Unknown-length elements that are supposed to be
skipped, i.e. the level 1 elements that might reside after the
clusters.
This commit changes this. To do this, ebml_parse got a mode that
essentially tries to skip everything except when parsing is needed
(namely for unknown-length elements for which parsing is necessary
as they can't be skipped). This mode is selected by using a NULL
as destination where the parsed data should be written to.
It is used to parse the level 1 elements in matroska_parse_cluster.
The syntax list used for parsing must of course include links to
the syntax of all the master elements that might need to be parsed.
In other words: Instead of matroska_clusters (which contained every
level 1 element except clusters as EBML_NONE elements designated to
be skipped) matroska_segment is needed and used; matroska_clusters has
been removed.
Furthermore, matroska_segment has been reordered so that clusters are at
the front as this is now the most common case for this list.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
matroska_probe did not support the case of an unknown-length EBML header
at all; given that libavformat's Matroska muxer used to produce such
files in the streaming case, support for them has been added.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The current Matroska specifications mandate that only two elements may
use an unknown-length length: Segments and clusters. But this was not
always so: For the greater part of Matroska's existence, all master
elements were allowed to make use of the unknown-length feature.
And there were muxers creating such files: For several years
libavformat's Matroska muxer used unknown-length for all master
elements when the output wasn't seekable. This only stopped in March
2010 with 2529bb30. And even afterwards it was possible (albeit
unlikely) for libavformat to create unknown-length master elements
that are in violation of today's specifications, namely if the master
element was so big that the seek backwards to update the size could
no longer be performed inside the AVIOContext's write buffer. This
has only been fixed in October 2016 (with the patches that introduced
support for writing CRC-32 elements).
Libavformat's Matroska demuxer meanwhile has never really supported
unknown-length elements besides segments and clusters. Support for the
latter was hardcoded. This commit changes this: Now all master elements
for which a syntax to parse them is available are supported. This
includes the files produced by old versions of libavformat's muxer.
More precisely, master elements that have unknown length and are about
to be parsed (not skipped) are supported; only a warning is emitted for
them. For normal files, this means that level 1 elements after the
clusters that are encountered after the clusters have been parsed (i.e.
not because they are referenced by the seekhead at the beginning of the
file) are still unsupported (they would be skipped at this point if
their length were known).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>