The calling code does not handle failures and will fail with assertion failures later.
Seeking can always fail even when the position was previously read.
Fixes: Assertion failure
Fixes: 35253/clusterfuzz-testcase-minimized-ffmpeg_dem_MATROSKA_fuzzer-4693059982983168
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Do this by allocating AVStream together with the data that is
currently in AVStreamInternal; or rather: Put AVStream at the
beginning of a new structure called FFStream (which encompasses
more than just the internal fields and is a proper context in its own
right, hence the name) and remove AVStreamInternal altogether.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Do this by allocating AVFormatContext together with the data that is
currently in AVFormatInternal; or rather: Put AVFormatContext at the
beginning of a new structure called FFFormatContext (which encompasses
more than just the internal fields and is a proper context in its own
right, hence the name) and remove AVFormatInternal altogether.
The biggest simplifications occured in avformat_alloc_context(), where
one can now simply call avformat_free_context() in case of errors.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The WebM DASH Manifest demuxer creates a comma-delimited list of
all the timestamps of index entries. It allocates 20 bytes per
timestamp; yet the largest 64bit numbers have 20 decimal digits
(for int64_t it can be '-'+ 19 digits), so that one needs 21B
per entry because of the comma (resp. the final NUL).
The code uses snprintf, but snprintf returns the strlen of the string
that would have been written had the supplied buffer been big enough.
And if this is 21, then the next entry is written at an offset of 21
from the current position. So if enough such entries exist, the buffer
won't suffice.
This commit fixes this by replacing the allocation of buffer for
the supposedly worst-case with dynamic allocations by using an AVBPrint.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Currently AVIOContext's private fields are all over AVIOContext.
This commit moves them into a new structure in avio_internal.h instead.
Said structure contains the public AVIOContext as its first element
in order to avoid having to allocate a separate AVIOContextInternal
which is costly for those use cases where one just wants to access
an already existing buffer via the AVIOContext-API.
For these cases ffio_init_context() can't fail and always returned zero,
which was typically not checked. Therefore it has been made to not
return anything.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself
Fixes: 33997/clusterfuzz-testcase-minimized-ffmpeg_dem_WEBM_DASH_MANIFEST_fuzzer-6752039691485184
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This is possible now that the next-API is gone.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>
All instances of adding attached pictures to a stream or adding
a stream and an attached packet to said stream have several things
in common like setting the index and flags of the packet, setting
the stream disposition etc. This commit therefore factors this out.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Before 8d78e90a6b the Matroska demuxer
used stack packets to hold temporary packets; now it uses a temporary
packet allocated by the Matroska demuxer. Yet because it used stack
packets the code has always properly reset the packet on error, while
on success these temporary packets were put into a packet list via
avpriv_packet_list_put(), which already resets the source packet.
This means that this code is compatible with just reusing
AVFormatInternal.parse_pkt (which is unused while one is in the
demuxer's read_packet() function). Compared to before 8d78e90a6
this no longer wastes one initialization per AVPacket read
(the resetting of the stack packet performed by av_packet_move_ref()
in avpriv_packet_list_put() was for naught).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The next pointer is kept at the end for backwards compatability until the
major bump, when it should ideally be moved at the front.
Signed-off-by: James Almer <jamrial@gmail.com>
Needs a CountedElement in order to distinguish the case of the element
not being present and the element being present with a value of zero.
(It has been argued by Ridley Combs that one should only ever use the
AV_DISPOSITION_DUB field for audio tracks. Yet given that there is no
definition for the disposition flags, one can also interpret it to mean
that e.g. a subtitle track is meant to be used with the dubbed audio
track or the original audio track. This commit interprets this flag in
this sense, which also allows to maintain it on remuxing.)
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Reviewed-by: Ridley Combs <rcombs@rcombs.me>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
This is the equivalent of the WebM "D_WEBVTT/DESCRIPTIONS" and is
therefore only exported for subtitles.
Reviewed-by: Ridley Combs <rcombs@rcombs.me>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Given that our disposition flags provide no way to distinguish the
cases of "track is unsuitable for hearing impaired users" and "it is
unknown whether the track is suitable for hearing impaired users" we do
not need to use a CountedElement for these flags.
Reviewed-by: Ridley Combs <rcombs@rcombs.me>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Hint: Matroska actually provides a way to distinguish the cases of
"track is no commentary track" and "it is unknown whether the track
is a commentary track", but our disposition flags do not. Therefore
we need not use a CountedElement.
Reviewed-by: Ridley Combs <rcombs@rcombs.me>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
For a very long time, the payload of integer and float elements had to
have a length > 0. Our parser treated such invalid elements as having a
value zero. But now it has been defined what an EBML element with length
zero means: It is a shorthand for the default value. This has also been
defined for strings (both ASCII and UTF-8). This commit modifies our
parser to support this.
Reviewed-by: Ridley Combs <rcombs@rcombs.me>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
This has been done in order to find out whether this element is present
at all; but this can now be done in a cleaner way by using a CountedElement
for it.
Reviewed-by: Ridley Combs <rcombs@rcombs.me>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
In the absence of an explicitly coded minimal luminance, the current
code inferred it to be -1, an invalid value. Yet it did not check the
value lateron at all, so that if a valid maximum luminance is
encountered, but no minimal luminance, an invalid minimal luminance of
-1 is exported. If an minimal luminance element with a negative value is
present, it is exported, too. This can be simply fixed by adding a check
for the value of the element.
Yet given that a minimal luminance of zero Cd/m² is legal and can be
coded with a length of zero, we must not use a fake default value to
find out whether the element is present or not. Therefore this patch
uses an explicit counter for it.
While just at it, also check for max_luminance > min_luminance.
Reviewed-by: Ridley Combs <rcombs@rcombs.me>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Up until now, the generic EBML reader used by the Matroska demuxer did
not have the capability to record whether an element was actually
present or not; instead, in cases where it mattered one typically added
an invalid default value and checked whether the value is valid (in
which case it is guaranteed to be present). This worked pretty well so
far, yet the EBML specifications have evolved: It is now legal to use
zero-length elements for floats, ints, uints and strings (both ASCII and
UTF-8); the value of these elements is the default value of the element
(if it has one) or zero for scalar types and an empty string for
strings. Furthermore, having a default value does no longer imply that
the element may be presumed to be present (with its default value) if it
is absent; this is only true if the element is mandatory, too.
These rules are designed to allow size savings as follows: Consider the
newly added FlagOriginal: It being zero means the track is not in its
original language, it being one means it is. For backward compatibility
reasons, neither of the two values may be inferred automatically in the
absence of the element. But one can still save a byte when one wants to
write the element with a value of zero, as one can write the integer with
a length of zero: 0x55AE 80 instead of 0x55AE 81 00. In the former case,
a parser has to infer the value of the element to be zero (which is the
element's default value).
When encountering an element with length zero, our parser always infers
a value of zero (or an empty string); this is wrong for values with
a different default value. It needs to infer the default value (or zero
in its absence) and this precludes using an invalid default value for
elements like FlagOriginal. Ergo one needs to be able to record whether
an element is present or not by other means. This patch allows to use a
simple counter for this. While just at it, some invalid and unnecessary
default values have been removed (mastering metadata elements used
default values of -1.0, despite these elements only being used if they
are > 0).
Reviewed-by: Ridley Combs <rcombs@rcombs.me>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The Block timestamp read in matroska_parse_block() is in track timebase and is
passed on as such to the AVPacket which uses this timebase.
In the normal case the Cluster and Track timebases are the same because the
track->time_scale is 1.0. But when it is not the case, the values in Cluster
timebase need to be transformed in Track timebase so they can be added
together.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Those are private fields, no reason to have them exposed in a public
header. Since there are some (semi-)public fields located after these,
even though this section is supposed to be private, keep some dummy
padding there until the next major bump to preserve ABI compatibility.
And replace the flags parameter with a function callback that can be used to
copy the contents of the packet (e.g, av_packet_ref and av_packet_copy_props).
Signed-off-by: James Almer <jamrial@gmail.com>
The Matroska demuxer currently always opens a GetByteContext to read the
content of the projection's private data buffer; it does this even if
there is no private data buffer in which case opening the GetByteContext
will lead to a NULL + 0 which is undefined behaviour.
Furthermore, in this case the code relied both on the implicit checks
of the bytestream2 API as well as on the fact that it returns zero
if there is not enough data available.
Both of these issues have been addressed by not using the bytestream API
any more; instead the data is simply read directly by using AV_RB. This
is possible because the offsets are constants.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
In certain error scenarios, the underlying Matroska demuxer was not
properly closed, causing leaks.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
When demuxing a Matroska/WebM file, streams are added for tracks and for
attachments, so that the array containing the former can be NULL even
when the corresponding AVFormatContext has streams. So check for there
to be tracks in the MatroskaDemuxContext instead of just streams in the
AVFormatContext before dereferencing the pointer to the tracks.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
matroska_parse_block currently asserts that the duration is not equal to
AV_NOPTS_VALUE, but there is nothing that actually guarantees this. It
is easy to create (spec-compliant) files which run into this assert;
so replace it and instead cap the duration to INT64_MAX, as the duration
field of an AVPacket is an int64_t.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
EBML binary elements are already made reference-counted when read;
so when populating the AVStream.attached_pic, one does not need to
allocate a new buffer for the data; instead the current code just
creates a new reference to the underlying AVBuffer. But this can be
improved even further: Just move the already existing reference.
This also fixes a memleak that happens upon error because
matroska_read_close has not been called in this scenario.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Each AttachedFile in Matroska can have a FileDescription element that
contains a human-friendly name for the attached file; yet this element
has been ignored up until now. This commit changes this and exports it
as title tag instead (the Matroska muxer mapped the title tag to the
AttachedFile element since support for Attachments was added).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The Matroska specification allows multiple (level 1) Tags elements per
file, yet our demuxer didn't: While it parsed any amount of Tags
elements it found in front of the Clusters (albeit with warnings because
of duplicate elements), it would treat any Tags element only referenced
via a SeekHead entry as already parsed if any Tags element has already
been parsed; therefore this Tags element would not be parsed at all.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
There can be more than one SeekHead in a Matroska file, but most of the
other level 1 elements can only occur once.* Therefore the Matroska
demuxer only allows one entry per ID in its internal list of level 1
elements known to it; the only exception to this are SeekHeads.
The only exception to this are SeekHeads: When one is encountered
(either directly or in the list of entries read from SeekHeads),
a new entry in the list of known level-1 elements is always added,
even when this entry is actually already known.
This leads to lots of seeks in case of circular SeekHeads: Each time a
SeekHead is parsed, a new entry for a SeekHead will be added to the list
of entries read from SeekHeads. The exception for SeekHeads mentioned
above now implies that this SeekHead will always appear new and unparsed
and parsing will be attempted. This continued until the list of known
level-1 elements is full.
Fixing this is pretty simple: Don't add a new entry for a SeekHead if
its position matches the position of an already known SeekHead.
*: Actually, there can be multiple Tags and several other level 1
elements are "identically recurring" which means they may be resent
multiple times, but each instance must be absolutely identical to the
previous.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
A Seek element in a Matroska SeekHead should contain a SeekID and a
SeekPosition element and upon reading, they should be sanitized:
Given that IDs are restricted to 32 bit, longer SeekIDs should be treated
as invalid. Instead currently the lower 32 bits have been used.
For SeekPosition, no checks were performed for the element to be
present and if present, whether it was excessively large (i.e. the
absolute file position described by it exceeding INT64_MAX). The
SeekPosition element had a default value of -1 which means that a check
seems to have been intended; but it was not implemented. This commit adds
a check for overflow to the calculation of the absolute file position of
the referenced level 1 elements.
Using -1 (i.e. UINT64_MAX) as default value for SeekPosition implies that
a Seek element without SeekPosition will run afoul of this check.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>