FFmpeg

Commit Graph

Author	SHA1	Message	Date
Andreas Rheinhardt	bdaa98dd4a	avformat/matroskadec: Don't output uninitialized data for RealAudio 28.8 The Matroska demuxer splits every sequence of h Matroska Blocks into h * w / cfs packets of size cfs; here h (sub_packet_h), w (frame_size) and cfs (coded_framesize) are parameters from the track's CodecPrivate. It does this by splitting the Block's data into h/2 pieces of size cfs each and putting them into a buffer at offset m * 2 * w + n * cfs, where m (range 0..(h/2 - 1)) indicates the index of the current piece in the current Block and n (range 0..(h - 1)) is the index of the current Block in the current sequence of Blocks. The data in this buffer is then used for the output packets. The problem is that there is currently no check that actually guarantees that no uninitialized data will be output. One trivial instance of this is h == 1; another is if cfs * h is so small that the input pieces do not cover everything that is output. In order to preclude this, rmdec.c checks for h * cfs == 2 * w and h >= 2. The former requirement certainly makes much sense, as it means that for every given m the input pieces (corresponding to the h different values of n) form a nonoverlapping partition of the two adjacent frames of size w corresponding to m. But precluding h == 1 is not enough; other odd values can cause problems, too. That is because the assumption behind the code is that h frames of size w contain data to be output, although the real number is h/2 * 2. E.g. for h = 3, cfs = 2 and w = 3 the current code would output four (== h * w / cfs) packets, although only data for three (== h/2 * h) packets has been read. (Notice that if h * cfs == 2 * w, h being even is equivalent to cfs dividing w; the latter condition also seems very reasonable: It means that the subframes are a partition of the frames.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
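The packet arithmetic above can be checked with a small C sketch. The helper names are hypothetical and this is not the check the commit adds to matroskadec.c; it only counts emitted packets versus packets actually backed by input, and encodes the conditions discussed (h >= 2, h even, h * cfs == 2 * w).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; h = sub_packet_h, w = frame_size,
 * cfs = coded_framesize. */

/* Packets of size cfs emitted per sequence of h Blocks. */
static int ra288_packets_output(int h, int w, int cfs)
{
    return h * w / cfs;
}

/* Packets actually backed by input: each of the h Blocks
 * contributes h/2 pieces of size cfs (integer division). */
static int ra288_packets_read(int h)
{
    return (h / 2) * h;
}

/* Every emitted byte has been written iff h is even (which also
 * rules out h == 1) and, for each m, the h pieces exactly
 * partition the 2 * w bytes belonging to m. */
static bool ra288_params_ok(int h, int w, int cfs)
{
    if (h < 2 || w <= 0 || cfs <= 0 || (h & 1))
        return false;
    return (int64_t)h * cfs == 2 * (int64_t)w;
}
```

For h = 3, cfs = 2, w = 3 the first two helpers give 4 and 3, matching the example in the message, and ra288_params_ok rejects these parameters even though h * cfs == 2 * w holds.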
Andreas Rheinhardt	4f5c6c1b0e	avformat/matroskadec: Fix buffer overflow when demuxing RealAudio 28.8 RealAudio 28.8 (like other RealAudio codecs) uses a special demuxing mode in which the data of the existing Matroska Blocks is not simply forwarded as-is. Instead, data from several Blocks is recombined to output several packets. The parameters governing this process are parsed from the CodecPrivate: Coded framesize (cfs), frame size (w) and sub_packet_h (h). During demuxing, h/2 pieces of data of size cfs each are read from every Matroska (Simple)Block and put at offset m * 2 * w + n * cfs of a buffer of size h * w, where m ranges from 0 to h/2 - 1 for each Block while n is initially zero and incremented after a Block has been parsed until it is h, at which point the assembled packets are output and n is reset. The highest write ends at offset (h/2 - 1) * 2 * w + (h - 1) * cfs + cfs, while the destination buffer's size is h * w. For even h, this leads to a buffer overflow (and potential segfault) if h * cfs > 2 * w; for odd h, the condition is h * cfs > 3 * w. This commit adds a check to rule this out. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
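The bound above can be written out directly. This is a hypothetical sketch, not the check added by the commit: it computes one past the last byte written (m = h/2 - 1, n = h - 1) and compares it to the buffer size h * w, assuming h >= 2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: one past the last byte written into the
 * h * w assembly buffer (the write for m = h/2 - 1, n = h - 1).
 * Assumes h >= 2, i.e. at least one piece per Block. */
static int64_t ra288_write_end(int64_t h, int64_t w, int64_t cfs)
{
    return (h / 2 - 1) * 2 * w + (h - 1) * cfs + cfs;
}

/* True if every write stays inside the buffer of size h * w. */
static bool ra288_fits(int64_t h, int64_t w, int64_t cfs)
{
    return ra288_write_end(h, w, cfs) <= h * w;
}
```

Expanding the expression gives h * w - 2 * w + h * cfs for even h and h * w - 3 * w + h * cfs for odd h, which yields the two overflow conditions stated in the message.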
Andreas Rheinhardt	c91e3690d9	avformat/matroskadec: Fix demuxing RealAudio 28.8 RealAudio 28.8 does not need or use sub_packet_size for its demuxing and this field is therefore commonly set to zero. But since `18ca491b` the RealAudio-specific demuxing is no longer applied if sub_packet_size is zero, because the codepath for cook and ATRAC3 divides by it; this made these files undecodable. Furthermore, since `569d18aa` (merged in `2c8d876d`) sub_packet_size being zero is used as an indicator for invalid data, so that a file containing such a track was completely skipped. This commit fixes this by not checking sub_packet_size for RealAudio 28.8 at all. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	c6f60b90f0	avformat/matroskadec: Simplify check for RealAudio RealAudio codecs need a special parsing mode, and in order to find out whether this mode is in use, several checks have to be performed. They can all be combined into one: If the buffer that is only used to assemble their packets has been allocated, use the RealAudio parsing mode. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	8287c20153	avformat/matroskadec: Reject sipr flavor > 3 Only flavors 0..3 seem to exist; e.g. rmdec.c treats any flavor > 3 as invalid data. Furthermore, we would not know how big the packets to be created ought to be, given that for sipr these values are not read from the bitstream, but from a table. Moreover, flavor is only used for sipr, so only check it for sipr; rmdec.c does the same. (The old check for flavor being < 0 could never trigger, given that flavor is an int that is read via avio_rb16(), so it has been removed completely.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	67e957b43a	avformat/matroska: Move mime_tag lists to matroskadec They are not used any more by the muxer. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	3059b7746a	avformat/matroskadec: Remove redundant setting of chapter titles Chapter titles are added to the chapter's metadata since `6cb6e159`, yet since `012867f0` (the predecessor of) avpriv_new_chapter() already adds the title to the chapter's metadata. So setting it again in matroskadec.c is redundant and expensive. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	048bc3fe31	avformat/matroskadec: Add a workaround for missing WavPack extradata mkvmerge versions 6.2 to 40.0 had a bug that made it not propagate the WavPack extradata (containing the WavPack version) during remuxing from a Matroska file; currently our demuxer would treat every WavPack block encountered as invalid data (unless the WavPack stream is to be discarded, i.e. the stream's discard is >= AVDISCARD_ALL) and try to resync to the next level 1 element. Luckily, the WavPack version is currently not really important, so we fix this problem by assuming a version. David Bryant, the creator of WavPack, recommended using version 0x410 (the most recent version) for this, and this is what this commit does. A FATE test for this has been added. Reviewed-by: David Bryant <david@wavpack.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
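A minimal sketch of the fallback, assuming (for this illustration only) that the CodecPrivate stores the WavPack version as a 16-bit little-endian value. The helper name is hypothetical; only the fallback value 0x410 comes from the message.

```c
#include <stddef.h>
#include <stdint.h>

/* Version assumed when the extradata is missing (per the message). */
#define WV_FALLBACK_VERSION 0x410

/* Hypothetical helper: read the version from CodecPrivate (assumed
 * 16-bit little-endian) and fall back to 0x410 instead of treating
 * every block as invalid data when the extradata is absent. */
static unsigned wavpack_version(const uint8_t *priv, size_t priv_size)
{
    if (priv == NULL || priv_size < 2)
        return WV_FALLBACK_VERSION;
    return (unsigned)priv[0] | ((unsigned)priv[1] << 8);
}
```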
Andreas Rheinhardt	ba36a07734	avformat/matroskadec: Don't discard the upper 32bits of TrackNumber Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Steve Lhomme	b5dd964cdc	avformat/matroskadec: fix the type of the TrackLanguage It's an ASCII string, not a UTF-8 string. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	40d9cbdc22	avformat/matroskadec: Use AV_DICT_DONT_STRDUP_VAL to save av_strdup This will likely also fix CID 1452562, a false positive resulting from Coverity thinking that av_dict_set() automatically frees its key and value parameters (even without the AV_DICT_DONT_STRDUP_* flags). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	5 years ago
Andreas Rheinhardt	2ff687c17f	avformat/matroskadec: Fix lzo decompression When a Matroska Block is only stored in compressed form, the size of the uncompressed block is not explicitly coded and therefore not known before decompressing it. Therefore the demuxer uses a guess for the uncompressed size: The first guess is three times the compressed size, and if this is not enough, it is repeatedly multiplied by three. But when this happens with lzo, the decompression is neither resumed nor started again. Instead, when av_lzo1x_decode indicates that x bytes of input data could not be decoded because the output buffer is already full, the first (not the last) x bytes of the input buffer are resent for decoding in the next try; they overwrite already decoded data. This commit fixes this by instead restarting the decompression from scratch, just with a bigger buffer. This seems to be a regression since `935ec5a1`. A FATE test for this has been added. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	5 years ago
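The "guess, then restart with a bigger buffer" strategy can be sketched as follows. To stay self-contained, the sketch swaps in a toy run-length decoder (pairs of {count, byte}) for LZO; toy_rle_decode and decompress_with_retry are hypothetical names, not FFmpeg functions. The key point is that every retry decodes the whole input again instead of resuming from a wrong input offset.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for a decompressor (not LZO): input is pairs of
 * {count, byte}. Returns the decoded size, -1 if out is too small,
 * -2 on malformed input. */
static long toy_rle_decode(uint8_t *out, size_t out_size,
                           const uint8_t *in, size_t in_size)
{
    size_t o = 0;
    if (in_size % 2 != 0)
        return -2;
    for (size_t i = 0; i < in_size; i += 2) {
        if (out_size - o < in[i])
            return -1;                       /* output buffer too small */
        for (unsigned k = 0; k < in[i]; k++)
            out[o++] = in[i + 1];
    }
    return (long)o;
}

/* First guess: 3x the compressed size; on "too small", grow by 3x
 * and restart decoding from the beginning of the input. */
static uint8_t *decompress_with_retry(const uint8_t *in, size_t in_size,
                                      size_t *out_len)
{
    size_t cap = in_size ? 3 * in_size : 1;
    uint8_t *buf = NULL;
    for (;;) {
        uint8_t *tmp = realloc(buf, cap);
        if (!tmp) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        long r = toy_rle_decode(buf, cap, in, in_size);  /* full restart */
        if (r >= 0) {
            *out_len = (size_t)r;
            return buf;
        }
        if (r != -1 || cap > SIZE_MAX / 3) {
            free(buf);
            return NULL;
        }
        cap *= 3;
    }
}
```

With the input {200, 'x', 1, 'y'} the buffer grows 12 → 36 → 108 → 324 bytes before the 201 decoded bytes fit.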
Andreas Rheinhardt	af50f0a515	avformat/matroskadec: Fix use-after-free when demuxing ProRes ProRes in Matroska is supposed not to contain the first atom header (containing a size field and the tag "icpf") and therefore the Matroska demuxer has to recreate it; this involves an allocation and copy, of course. Whether the old buffer (containing the data without the atom header) needs to be freed or not depends upon whether it is what was directly read (in which case it is owned by an AVBuffer) or whether it has been allocated when reversing the track's content compression (e.g. zlib compression) that Matroska supports. So there are three pointers involved: The one pointing to the directly read data (owned by the AVBuffer), the one pointing to the currently valid data (which coincides with the former if no content compression needed to be reverted) and the one pointing to the new data with the first atom header. The check for whether to free the second of these is simply whether the first two are different. This mostly works, but there is a complication: Some muxers don't strip the first atom header away, and in this case it is also not reinserted and no new buffer is allocated; instead, the second and the third pointers agree. In this case, one must never free the second buffer. Yet this is currently done if the track is e.g. zlib compressed. This commit fixes this. This is a regression since `b8e75a2a`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	5 years ago
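The ownership rule for the three pointers can be expressed as a single predicate. This is a hypothetical sketch of the corrected condition as described, not the actual matroskadec.c code: the intermediate (decompressed) buffer may only be freed if it is not the directly read data and it has been superseded by a newly allocated buffer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate.
 * pkt_data: data as read (owned by a refcounted buffer elsewhere)
 * data:     currently valid data (== pkt_data unless decompressed)
 * pr_data:  data with the atom header (== data if the header was
 *           already present and nothing had to be reinserted) */
static bool must_free_decoded(const uint8_t *pkt_data, const uint8_t *data,
                              const uint8_t *pr_data)
{
    /* Checking only data != pkt_data is the buggy variant: it also
     * frees data when it is itself the buffer handed on as pr_data. */
    return data != pkt_data && data != pr_data;
}
```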
Andreas Rheinhardt	d5274f86a8	avformat/matroskadec: Reuse AVIOContext When parsing EBML lacing, for every number read, a new AVIOContext has been initialized (via ffio_init_context()) just for this number. This has been changed: The context is kept now. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	5 years ago
Andreas Rheinhardt	dbe3be6744	avformat/matroskadec: Improve frame size parsing error messages When parsing the sizes of the frames in a lace fails, sometimes no error message was raised (e.g. when using xiph or fixed-size lacing). Only EBML lacing generated error messages (which were wrongly emitted at AV_LOG_INFO level), but even here not all errors resulted in an error message. So add a generic error message to catch them all. Moreover, if parsing one of the EBML numbers fails, ebml_read_num already emits its own error messages, so all that is needed is a generic error message to indicate that this happened while parsing the sizes of the frames in a block; in other words, the error messages specific to parsing EBML lace numbers can be and have been removed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	5 years ago
Andreas Rheinhardt	f74eaa17bb	avformat/matroskadec: Remove unnecessary check `870e7552` introduced validating the lace sizes when they are parsed and removed the old check; yet when merging this libav commit in `6902c3ac`, the old check for whether the frame extends beyond the block was kept. It is unnecessary and has been removed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	5 years ago
Andreas Rheinhardt	a69f92a946	avformat/matroskadec: Simplify control flow of parsing laces Up until now, when an error happened in one of the inner loops in matroska_parse_laces, a variable designated for the return value has been set to an error value and break has been used to exit the current loop/case. This was done so that the end of matroska_parse_laces is reached, because said function allocated memory that was passed to the caller at the end of matroska_parse_laces and later used and freed there. But given that there is no allocation any more, one can now return immediately, and this commit does this. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	5 years ago
Andreas Rheinhardt	9ad1a6d64c	avformat/matroskadec: Avoid allocating array for lace sizes The maximum number of frames in a lace is 256; hence there is a modest upper bound on the size of an array that can hold the sizes of all the frames in a lace. Yet up until now, said array has been dynamically allocated. This has been changed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	5 years ago
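Because the lace count byte stores "frames minus one", at most 256 sizes ever need to be held, so a fixed array suffices. The following hypothetical sketch parses Xiph lacing that way (parse_xiph_lace and MAX_LACED_FRAMES are illustrative names): each size except the last is a sum of bytes where 255 means "continue", and the last frame takes the remaining bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LACED_FRAMES 256  /* lace count byte + 1 */

/* Hypothetical sketch: data/size cover everything after the Block
 * header. Returns the number of frames, or -1 on error. */
static int parse_xiph_lace(const uint8_t *data, size_t size,
                           uint32_t sizes[MAX_LACED_FRAMES])
{
    if (size < 1)
        return -1;
    int n = data[0] + 1;                    /* 1..256 frames */
    size_t pos = 1, total = 0;
    for (int i = 0; i < n - 1; i++) {
        size_t s = 0;
        uint8_t byte;
        do {
            if (pos >= size)
                return -1;                  /* ran out of size bytes */
            byte = data[pos++];
            s += byte;
        } while (byte == 255);
        sizes[i] = (uint32_t)s;
        total += s;
        if (total > size)
            return -1;                      /* sizes exceed the block */
    }
    if (total > size - pos)
        return -1;
    sizes[n - 1] = (uint32_t)(size - pos - total);  /* remainder */
    return n;
}
```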
Andreas Rheinhardt	668490ac98	avformat/matroskadec: Use bytestream API instead of AVIOContext It avoids the overhead of function calls. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	5 years ago
Andreas Rheinhardt	eec26b5911	avformat/matroskadec: avcodec/tta: Set extradata_size to 22 Up until `c4e0e314`, the seek table had been included in the tta extradata, so that the size of said extradata was 22 (the size of a TTA1 header) + 4 * number of frames. The decoder rejected anything below a size of 30, and so the Matroska demuxer exported 30-byte-long extradata, of which only 18 bytes were set (it ignores a CRC-32 and simply leaves it at 0). But this is unnecessary since said commit, so reduce the size to 22. Furthermore, replace 30 by 22 in a comment about the extradata size in libavcodec/tta.c. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	5 years ago
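To illustrate where the numbers 22 and 18 come from, here is a hypothetical sketch that fills a 22-byte TTA1 header. The individual field layout (signature, format, channels, bits per sample, sample rate, number of samples, CRC-32, little-endian) is an assumption of this illustration; what the message states is only that 18 bytes are set and a 4-byte CRC-32 is left at zero.

```c
#include <stdint.h>
#include <string.h>

#define TTA1_HEADER_SIZE 22

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = v & 0xFF;
    p[1] = v >> 8;
}

static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (v >> (8 * i)) & 0xFF;
}

/* Hypothetical sketch: 18 bytes of fields, then a CRC-32 left at 0.
 * The field layout is assumed for illustration. */
static void build_tta1_header(uint8_t out[TTA1_HEADER_SIZE],
                              uint16_t channels, uint16_t bits,
                              uint32_t sample_rate, uint32_t num_samples)
{
    memcpy(out, "TTA1", 4);          /* signature */
    put_le16(out + 4, 1);            /* format (assumed value) */
    put_le16(out + 6, channels);
    put_le16(out + 8, bits);
    put_le32(out + 10, sample_rate);
    put_le32(out + 14, num_samples);
    memset(out + 18, 0, 4);          /* CRC-32, not computed */
}
```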
Andreas Rheinhardt	f7bf59b431	avformat/matroskadec: Check before allocations That way one doesn't have to free later. In this case (concerning TTA extradata), this also fixes a memleak when the output samplerate is invalid. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	5 years ago
Andreas Rheinhardt	dbc50f8a93	avformat/matroskadec: Fix default value of BlockAddID Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	5 years ago
James Almer	3b4e9a31ea	avformat/matroskadec: don't rescale mastering display values Simplifies code. Signed-off-by: James Almer <jamrial@gmail.com>	5 years ago
Andreas Rheinhardt	581419ea39	avformat/matroskadec: Fix demuxing ProRes The structure of a ProRes frame in mov/mp4 is that of a typical atom: First a 32-bit BE size field, then a tag detailing the content. Said size field includes the eight bytes of the atom header. This header is actually redundant, as the size of the atom is already known from the containing atom. It is therefore stripped away when muxed into Matroska, and so the Matroska demuxer has to recreate it upon demuxing. But it did not account for the fact that the size field includes the size of the header, and this can lead to problems when a decoder uses the in-band size field. Fixes ticket #8210. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	5 years ago
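Recreating the header correctly means writing a big-endian size that counts the 8 header bytes themselves. A hypothetical helper (prores_add_header is not an FFmpeg function) using the "icpf" tag named in the ProRes commits:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: prepend the 8-byte atom header to a ProRes
 * frame stored without it. Returns a malloc'ed buffer of size + 8. */
static uint8_t *prores_add_header(const uint8_t *frame, size_t size)
{
    if (size > UINT32_MAX - 8)
        return NULL;
    uint32_t atom_size = (uint32_t)size + 8;   /* includes the header */
    uint8_t *out = malloc(size + 8);
    if (!out)
        return NULL;
    out[0] = atom_size >> 24;                  /* 32-bit big-endian */
    out[1] = atom_size >> 16;
    out[2] = atom_size >> 8;
    out[3] = atom_size;
    memcpy(out + 4, "icpf", 4);
    memcpy(out + 8, frame, size);
    return out;
}
```

Writing `size` instead of `size + 8` into the first four bytes reproduces the bug described in the message.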
Michael Niedermayer	fccc37ca85	repeat an even number of characters in occured Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	5 years ago
James Almer	3b3150c45f	avformat/matroskadec: use av_fast_realloc to reallocate ebml list arrays Speeds up the process considerably. Fixes ticket #8109. Suggested-by: nevcairiel Suggested-by: cehoyos Signed-off-by: James Almer <jamrial@gmail.com>	5 years ago
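The speedup comes from not reallocating on every appended element. The sketch below uses a hypothetical helper with its own growth rule (about 1.5x plus slack); it is in the spirit of av_fast_realloc, which keeps track of the allocated size, but it does not reproduce that function's exact formula.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical amortized-growth helper: only realloc, with slack,
 * when min_size (in bytes) exceeds the tracked capacity.
 * On failure returns NULL and the caller still owns ptr. */
static void *grow(void *ptr, size_t *capacity, size_t min_size,
                  unsigned *realloc_count)
{
    if (min_size <= *capacity)
        return ptr;
    if (min_size > SIZE_MAX / 2)
        return NULL;
    size_t new_cap = min_size + min_size / 2 + 16;
    void *p = realloc(ptr, new_cap);
    if (!p)
        return NULL;
    (*realloc_count)++;        /* instrumentation for the example */
    *capacity = new_cap;
    return p;
}
```

Appending 1000 ints one at a time through grow() triggers only a small number of reallocations instead of 1000.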
James Almer	f34aabfbae	avformat/matroskadec: use proper types for some EbmlSyntax fields Signed-off-by: James Almer <jamrial@gmail.com>	5 years ago
Andreas Rheinhardt	c294f38c91	avformat/matroskadec: Fix seeking matroska_reset_status (a function that is used during seeking, among other things) used an int for the return value of avio_seek, which returns an int64_t. Checking the return value then indicated an error even though the seek was successful for targets in the range of 2GB-4GB, 6GB-8GB, ... This error implied that the status hadn't been reset and, in particular, the old level was still considered to be in force, so that ebml_parse returned errors because the newly parsed elements were of course no longer contained in the previously active master element, which was still wrongly considered active. Addresses ticket #8084. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	5 years ago
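The failing ranges follow from truncating a 64-bit offset to a 32-bit int. The sketch below is a hypothetical illustration of the bug class, not the demuxer's code. Converting an out-of-range value to int is implementation-defined in C; on common two's-complement targets it wraps modulo 2^32, so offsets in [2 GiB, 4 GiB), [6 GiB, 8 GiB), ... come out negative and look like error codes.

```c
#include <stdint.h>

/* The buggy pattern: an int64_t position stored in an int. */
static int seek_result_as_int(int64_t pos)
{
    return (int)pos;   /* implementation-defined if out of range */
}

static int seek_failed_buggy(int64_t pos) { return seek_result_as_int(pos) < 0; }
static int seek_failed_fixed(int64_t pos) { return pos < 0; }
```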
Andreas Rheinhardt	9869e21776	matroskadec: Remove redundant const The typedef used to define EbmlSyntax already includes a const qualifier so that it is unnecessary to include another const qualifier in future definitions and declarations. Given that MSVC warns about this, this commit removes these redundant const qualifiers. Suggested-by: Hendrik Leppkes <h.leppkes@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	ab4795a085	matroskadec: Add sizes to forward declarations Unknown-length elements end when an element not allowed in them, but allowed at a higher level, is encountered. In order to check for this, `c1abd95a` added a pointer to every syntax level's parent to each EbmlSyntax. Given that the parent must of course also reference the child in order to be able to enter said child level, one needs to use forward declarations. These forward declarations constitute tentative definitions, and tentative definitions with internal linkage (like our syntaxes) must not have an incomplete type. Yet they had an incomplete type, and while GCC and Clang did not even warn about this (on default warning levels), it broke compilation with MSVC. Therefore this commit adds the sizes. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
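A minimal illustration of the pattern, with hypothetical type and array names: two static syntax arrays reference each other, so each is first declared as a tentative definition, and those declarations carry the array sizes so that the type is complete. As in the previous commit, the typedef already includes the const qualifier.

```c
#include <stddef.h>

/* Hypothetical syntax entry; the typedef carries the const qualifier. */
typedef const struct Syntax {
    unsigned id;                   /* element ID, 0 terminates a list */
    const struct Syntax *child;    /* syntax used to enter a child level */
    const struct Syntax *parent;   /* set in the terminating entry */
} Syntax;

/* Tentative definitions with internal linkage: sizes are required. */
static Syntax segment_syntax[2];
static Syntax cluster_syntax[2];

static Syntax segment_syntax[2] = {
    { 0x1F43B675, cluster_syntax, NULL },   /* Cluster */
    { 0, NULL, NULL },
};

static Syntax cluster_syntax[2] = {
    { 0xE7, NULL, NULL },                  /* Timestamp */
    { 0, NULL, segment_syntax },           /* link to the parent level */
};
```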
Andreas Rheinhardt	806ac7da69	avformat/matroskadec: Improve check for level 1 duplicates If a file uses unknown-length level 1 elements besides clusters and such elements are after the first cluster, then these elements will usually be parsed twice: Once during parsing of the file header and once when reading the file reaches the position where these elements are located. The second time the element is parsed leads to a "Duplicate element" error message. Known-length elements are not affected by this, as they are skipped except while parsing the header. This commit fixes this by explicitly adding a check for whether the position of the element to be parsed is the same as the position of the already known level 1 element. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
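A hypothetical sketch of the position-based check (the struct and function names are illustrative): reaching an already parsed level 1 element again at the same file offset is not a duplicate; only the same ID at a different offset is.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of a level 1 element (file-based offset). */
typedef struct Level1Elem {
    uint32_t id;
    int64_t  pos;
    bool     parsed;
} Level1Elem;

/* True only if an element with this ID was already parsed at a
 * different position. */
static bool is_duplicate(const Level1Elem *elems, int n,
                         uint32_t id, int64_t pos)
{
    for (int i = 0; i < n; i++)
        if (elems[i].id == id && elems[i].parsed && elems[i].pos != pos)
            return true;
    return false;
}
```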
Andreas Rheinhardt	730ac1ae80	avformat/matroskadec: Use file offsets for level 1 elements This commit converts the MatroskaLevel1Element struct to use file-based offsets, as opposed to the current practice of using offsets relative to the beginning of the segment in it. This also includes a change from uint64_t to int64_t. This is in preparation to another patch that improves the check for duplicate level 1 elements. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	6854127a76	avformat/matroskadec: Reindent after previous commit Also use the smallest scope possible for a loop variable. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	71c908817e	avformat/matroskadec: Accept more unknown-length elements II Up until now, one last kind of unknown-length element hasn't been properly handled: Unknown-length elements that are supposed to be skipped, i.e. the level 1 elements that might reside after the clusters. This commit changes this. To do this, ebml_parse got a mode that essentially tries to skip everything except when parsing is needed (namely for unknown-length elements for which parsing is necessary as they can't be skipped). This mode is selected by using a NULL as destination where the parsed data should be written to. It is used to parse the level 1 elements in matroska_parse_cluster. The syntax list used for parsing must of course include links to the syntax of all the master elements that might need to be parsed. In other words: Instead of matroska_clusters (which contained every level 1 element except clusters as EBML_NONE elements designated to be skipped) matroska_segment is needed and used; matroska_clusters has been removed. Furthermore, matroska_segment has been reordered so that clusters are at the front as this is now the most common case for this list. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	9c6d14ab84	avformat/matroskadec: Fix probing of unknown-length headers matroska_probe did not support the case of an unknown-length EBML header at all; given that libavformat's Matroska muxer used to produce such files in the streaming case, support for them has been added. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	3c70b941d5	avformat/matroskadec: Accept more unknown-length elements The current Matroska specifications mandate that only two elements may have an unknown length: Segments and clusters. But this was not always so: For the greater part of Matroska's existence, all master elements were allowed to make use of the unknown-length feature. And there were muxers creating such files: For several years libavformat's Matroska muxer used unknown-length for all master elements when the output wasn't seekable. This only stopped in March 2010 with `2529bb30`. And even afterwards it was possible (albeit unlikely) for libavformat to create unknown-length master elements that are in violation of today's specifications, namely if the master element was so big that the seek backwards to update the size could no longer be performed inside the AVIOContext's write buffer. This was only fixed in October 2016 (with the patches that introduced support for writing CRC-32 elements). Libavformat's Matroska demuxer meanwhile has never really supported unknown-length elements besides segments and clusters. Support for the latter was hardcoded. This commit changes this: Now all master elements for which a syntax to parse them is available are supported. This includes the files produced by old versions of libavformat's muxer. More precisely, master elements that have unknown length and are about to be parsed (not skipped) are supported; only a warning is emitted for them. For normal files, this means that level 1 elements located after the clusters are still unsupported if they are only encountered once the clusters have been parsed (i.e. not because they are referenced by the seekhead at the beginning of the file); at this point they would be skipped if their length were known. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	04b62bd7ce	avformat/matroskadec: Improve invalid length error handling 1. Up until now, the error message for EBML numbers whose length exceeds the limits imposed upon them because of the element's type did not distinguish between known-length and unknown-length elements. As a consequence, the numerical value of the define constant EBML_UNKNOWN_LENGTH was emitted as part of the error message which is of course not appropriate. This commit changes this by adding error messages designed for unknown-length elements. 2. We impose some (arbitrary) sanity checks on the lengths of certain element types; these checks were conducted before the checks depending on whether the element exceeds its containing master element. Now the order has been reversed, because a failure at the (formerly) latter check implies that the file is truly erroneous and not only fails our arbitrary length limit. Moreover, this increases the informativeness of the error messages. 3. Furthermore, the error message in general has been changed by replacing the type of the element (something internal to this demuxer and therefore suitable as debug output at best, not as an error message intended for ordinary users) with the element ID. The element's position has been added, too. 4. Finally, the length limit for EBML_NONE elements has been changed so that all unknown-length elements of EBML_NONE-type trigger an error. This is done because unknown-length elements can't be skipped and need to be parsed, but there is no syntax to parse available for EBML_NONE elements. This is done in preparation for a further patch which allows more unknown-length elements than just clusters and segments. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	5120305137	avformat/matroskadec: Don't skip too much when unseekable The Matroska (and WebM) file format achieves forward-compatibility by insisting that demuxers ignore and skip elements they don't know about. Unfortunately, this complicates the detection of errors, as errors resulting from losing sync can't be reliably distinguished from unknown elements that are part of a future version of the standard. Up until now, the strategy to deal with this situation was to skip all unknown elements that are not obviously erroneous; if an error happened, the demuxer tried to seek to the last known good position to resync from (and resync to level 1 elements). This works fine if the input is seekable, but if it is not, then the skipped data usually cannot be rechecked later on. This is particularly acute if unknown-length clusters are in use, as the check for whether a child element exceeds the containing master element is ineffective in this situation. To remedy this, a new heuristic has been introduced: If an unknown element is encountered in non-seekable mode, an error is presumed to have happened based upon a combination of the length of the run of already encountered unknown elements and of how far away skipping this element would take us. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
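One way such a heuristic could combine the two factors is sketched below. The thresholds (MAX_UNKNOWN_RUN, MAX_SKIP_BYTES) and the formula are hypothetical, since the message does not state the demuxer's actual values. The allowed skip distance shrinks as the run of consecutive unknown elements grows.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits, chosen only for illustration. */
#define MAX_UNKNOWN_RUN 8
#define MAX_SKIP_BYTES  (1 << 20)

typedef struct SkipState {
    int unknown_run;   /* consecutive unknown elements so far */
} SkipState;

/* In unseekable mode: should an unknown element of `length` bytes be
 * treated as a loss of sync instead of being skipped? */
static bool unknown_looks_like_error(SkipState *s, uint64_t length)
{
    s->unknown_run++;
    if (s->unknown_run > MAX_UNKNOWN_RUN)
        return true;
    return length > (uint64_t)MAX_SKIP_BYTES / (uint64_t)s->unknown_run;
}

/* A known element resets the run. */
static void known_element_seen(SkipState *s)
{
    s->unknown_run = 0;
}
```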
Andreas Rheinhardt	60f75c9976	avformat/matroskadec: Typos, nits and cosmetics Cosmetics include reordering EbmlType so that EBML_SINT is adjacent to the other numbers (and matches the order in the switch in ebml_parse) and also reordering the switch for assignment of default values so that it matches the order in EbmlType. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	7087fc95b2	avformat/matroskadec: Reuse positions Up until now, avio_tell was used multiple times in ebml_parse and its subroutines, although the result of these calls can usually be simply derived from the result of earlier calls to avio_tell. This has been changed. Unnecessary calls to avio_tell in ebml_parse are avoided now. Furthermore, there has been a slight change in the output of some error messages relating to elements exceeding their containing master element: The reported position of the element now points to the first byte of the element ID and no longer to the first byte of the element's payload. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	3ed2755baa	avformat/matroskadec: Redo EOF handling This commit closes the last hole in the system of checks for a known-length file ending too early: Now an error message is emitted in case the file ends directly after an EBML element. Furthermore, this commit adds a check and a corresponding warning whether there is data beyond the Matroska segment (only reasonable for known-length segments). If everything looks alright, then parsing is stopped as soon as EOF is reached (in contrast, the earlier code would always call matroska_resync at the end). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	38255cdcf8	avformat/matroskadec: Combine arrays By including SimpleBlocks and BlockGroups twice in the same EbmlSyntax array (with different semantics), one can reduce the duplication of the other values. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	a9f051519e	avformat/matroskadec: Don't reset cluster position The new code does not rely on whether the cluster's position is set or not to infer whether a cluster needs to be closed or not (instead, this is done in ebml_parse), so there is no need to reset the cluster's position at all any more. It will be automatically set to the correct value when a cluster is entered. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	865c537007	avformat/matroskadec: Make cluster parsing level compatible Before this commit, the parsing of clusters mixed EBML levels by allowing elements from different levels in an EbmlSyntax (namely matroska_cluster_parsing). This has been changed, and the level is now explicitly used to determine how to parse. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	b31c9b72e5	avformat/matroskadec: Redo level handling This commit changes how levels are handled: If the level used for ebml_parse ends directly after an element that has been consumed, then ebml_parse ends the level itself (and any known-length levels that end there as well) and informs the caller via the return value; if the current level is of unknown length, then the level is ended as soon as an element that is not valid on the current level, but on a higher level, is encountered (or if EOF has been encountered). This is designed for situations where one wants to parse master elements incrementally, i.e. not in one go via ebml_parse_nest. The (incremental) parsing of clusters still mixes levels by using a syntax list that contains elements from different levels, and the level is still ended manually via a call to ebml_level_end if the last cluster was an unknown-length cluster (known-length clusters are already ended when their last element is read), but only if the next element is a cluster, too. A different level 1 element following an unknown-length cluster will currently simply be presumed to be part of the earlier cluster. Fixing this will be done in a future patch. The modifications to matroska_parse_cluster contained in this patch are only intended not to cause regressions. Nevertheless, the fact that known-length levels are automatically ended in ebml_parse when their last element has been read already fixes a bogus error message introduced in `9326117b`. It was emitted when a known-length cluster was followed by a level 1 element other than a cluster: in this case the cluster's level was not ended (which only happened when a new cluster had been encountered), so that the length check (introduced in `9326117b`) failed for the level 1 element, as it is of course not contained in the previous cluster. Most Matroska files were affected by this. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
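The "end every known-length level that ends here" rule can be sketched with a small stack of levels. All names are hypothetical and UNKNOWN_LENGTH is an illustrative marker: after an element ending at file offset `pos` has been consumed, every known-length level whose end equals `pos` is popped, and the count is returned so that the caller learns about it.

```c
#include <stdint.h>

#define UNKNOWN_LENGTH UINT64_MAX   /* hypothetical marker */
#define MAX_LEVELS 16

/* start = offset of the first payload byte, length = payload size */
typedef struct Level { uint64_t start, length; } Level;
typedef struct LevelStack { Level lvl[MAX_LEVELS]; int depth; } LevelStack;

static int end_finished_levels(LevelStack *st, uint64_t pos)
{
    int ended = 0;
    while (st->depth > 0) {
        const Level *top = &st->lvl[st->depth - 1];
        /* Unknown-length levels end differently (on a higher-level
         * ID or EOF), so they stop the unwinding here. */
        if (top->length == UNKNOWN_LENGTH || top->start + top->length != pos)
            break;
        st->depth--;
        ended++;
    }
    return ended;
}
```

With a segment and a cluster both ending at offset 1000, consuming the cluster's last element closes both levels at once, so a following level 1 element is no longer checked against the finished cluster.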
Andreas Rheinhardt	c1abd95ad0	avformat/matroskadec: Link to parents in syntax tables By linking to the syntax of the parent (i.e. the containing master element) one can check whether an element is actually part of a higher level in the EBML hierarchy. Knowing this is important for unknown-length levels, because they end when an element is encountered that doesn't belong to this level, but to a higher one. Sometimes there are different syntaxes dealing with the same elements. In this case it is important to use a parent that contains all the elements at the parent level; whether this is the syntax actually used to enter the child's level is irrelevant. This affects the list of level 1 elements (which has been used as parent for matroska_cluster, too) and it affects recursive elements (currently only the SimpleTag), where the non-recursive parent has to be chosen. This is in preparation for a patch that redoes level handling. Finally, the segment id has been added to ebml_syntax. This will enable handling of unknown-length EBML headers. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
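With parent links, deciding whether an unknown-length level ends amounts to walking up the chain. The simplified types below are hypothetical, not the demuxer's EbmlSyntax: an ID valid on the current level continues it, an ID valid only on an ancestor ends it, and an ID unknown everywhere is skipped without ending the level.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified syntax level. */
typedef struct SyntaxLevel {
    const uint32_t *ids;
    size_t n_ids;
    const struct SyntaxLevel *parent;
} SyntaxLevel;

static bool level_allows(const SyntaxLevel *l, uint32_t id)
{
    for (size_t i = 0; i < l->n_ids; i++)
        if (l->ids[i] == id)
            return true;
    return false;
}

static bool ends_unknown_length_level(const SyntaxLevel *cur, uint32_t id)
{
    if (level_allows(cur, id))
        return false;
    for (const SyntaxLevel *p = cur->parent; p; p = p->parent)
        if (level_allows(p, id))
            return true;
    return false;   /* unknown everywhere: skip, level continues */
}
```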
Andreas Rheinhardt	a3db9f62a4	avformat/matroskadec: Introduce a "last known good" position Currently, resyncing while reading packets works as follows: The current position is recorded, then a call to matroska_parse_cluster is made and if said call fails, the demuxer tries to resync from the earlier position. If the call doesn't fail, but also doesn't deliver a packet, then this is looped. There are two problems with this approach: 1. The Matroska file format aims to be forward-compatible; to achieve this, a demuxer should simply ignore and skip elements it doesn't know about. But it is not possible to reliably distinguish unknown elements from junk. If matroska_parse_cluster encounters an unknown element, it therefore cannot simply error out; instead it returns zero and the loop is iterated again, which includes an update of the position that is intended to be used in case of errors, i.e. the element that is skipped is not searched for level 1 element ids to resync to at all if later calls to matroska_parse_cluster return an error. Notice that if sync has been lost, there can be a chain of several unknown/possibly junk elements before an error is detected. 2. Even if a call to matroska_parse_cluster delivers a packet, this does not mean that everything is fine. E.g. it might be that some of the block's data is missing and that the data that was presumed to be from the block just read actually contains the beginning of the next element. This will only be apparent at the next call to matroska_read_packet, which uses the (false) end of the earlier block as resync position, so that in the (not unlikely) case that the call to matroska_parse_cluster fails, the data believed to be part of the earlier block is not searched for a level 1 element to resync to. To counter this, a "last known good" position is introduced.
When an element id that is known to be allowed at this position in the hierarchy (according to the syntax currently in use for parsing) is read and some further checks (regarding the length of the element and its containing master element) pass, the beginning of the current element is treated as a "good" position and recorded as such in the MatroskaDemuxContext. Because of 2., only the start of the element is treated as a "good" position, not the whole element. If an error occurs later during parsing of clusters, the resync process starts at the last known good position. The resync performed when the header is damaged never skips over data and is therefore unaffected by both issues, so the "last known good" concept is not used there. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
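The "last known good" idea can be sketched against an in-memory buffer. All names here (ResyncState, note_good_element, scan_for_level1, resync) are hypothetical; the byte-by-byte scan for a 4-byte level 1 id starting one byte past the given position is an assumed simplification of a resync loop, not a copy of FFmpeg's matroska_resync.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical demuxer state: only the field relevant here. */
typedef struct ResyncState {
    int64_t last_good_pos; /* start of the last element that passed all checks */
} ResyncState;

/* Called once an element's id is valid at this position in the hierarchy and
 * its length checks passed. Only the element's start is recorded, because its
 * payload may still turn out to be damaged. */
static void note_good_element(ResyncState *st, int64_t elem_start)
{
    st->last_good_pos = elem_start;
}

static int is_level1_id(uint32_t id)
{
    switch (id) {
    case 0x114D9B74: /* SeekHead */
    case 0x1549A966: /* Info */
    case 0x1654AE6B: /* Tracks */
    case 0x1F43B675: /* Cluster */
    case 0x1C53BB6B: /* Cues */
    case 0x1941A469: /* Attachments */
    case 0x1043A770: /* Chapters */
    case 0x1254C367: /* Tags */
        return 1;
    default:
        return 0;
    }
}

/* Scans buf[start + 1 .. len) for a big-endian 4-byte level 1 id.
 * Returns the id's offset, or -1 if none is found. */
static int64_t scan_for_level1(const uint8_t *buf, int64_t len, int64_t start)
{
    uint32_t id = 0;
    assert(start >= -1);
    for (int64_t i = start + 1; i < len; i++) {
        id = (id << 8) | buf[i];
        if (i - start >= 4 && is_level1_id(id))
            return i - 3;
    }
    return -1;
}

/* On a parsing error, resync from the last known good position instead of
 * from the position recorded before the failing call. */
static int64_t resync(const ResyncState *st, const uint8_t *buf, int64_t len)
{
    return scan_for_level1(buf, len, st->last_good_pos);
}
```

If a Cluster id sits at offset 20 inside data that was wrongly skipped as part of an earlier element, scanning from a later position such as 30 misses it, whereas scanning from a last known good position of 10 finds it.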
Andreas Rheinhardt	559e3422c7	avformat/matroskadec: Refactor some functions Since the changes to the parsing of SimpleBlocks, both ebml_parse_id and ebml_parse_elem are only called from one place, so that it is possible to inline these two function calls. This is done, but not completely: ebml_parse_id still exists in a modified form. This is done in preparation for a further patch regarding the handling of unknown-length elements. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	8a286e745d	avformat/matroskadec: Use proper levels after discontinuity The earlier code set the level to zero upon seeking and after a discontinuity, although in both cases parsing (re)starts at a level 1 element. Also set the segment's length to unknown if an error occurred in order not to drop any valid data that happens to be beyond the designated end of the segment. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago
Andreas Rheinhardt	310f326b43	avformat/matroskadec: Add function to reset status This function will be useful later to reset the status (e.g. current level and the already parsed id). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>	5 years ago


783 Commits (bdaa98dd4aac08b8f23f959cbb5a80db2dacd14a)