This was tested with medias recorded from an iPhone XR and an iPhone 13.
Here is how a typical stream looks like in coding order:
┌────────┬─────┬─────┬──────────┐
│ sample | PTS | DTS | keyframe |
├────────┼─────┼─────┼──────────┤
┊ ┊ ┊ ┊ ┊
│ 53 │ 560 │ 510 │ No │
│ 54 │ 540 │ 520 │ No │
│ 55 │ 530 │ 530 │ No │
│ 56 │ 550 │ 540 │ No │
│ 57 │ 600 │ 550 │ Yes │
│ * 58 │ 580 │ 560 │ No │
│ * 59 │ 570 │ 570 │ No │
│ * 60 │ 590 │ 580 │ No │
│ 61 │ 640 │ 590 │ No │
│ 62 │ 620 │ 600 │ No │
┊ ┊ ┊ ┊ ┊
In composition/display order:
┌────────┬─────┬─────┬──────────┐
│ sample | PTS | DTS | keyframe |
├────────┼─────┼─────┼──────────┤
┊ ┊ ┊ ┊ ┊
│ 55 │ 530 │ 530 │ No │
│ 54 │ 540 │ 520 │ No │
│ 56 │ 550 │ 540 │ No │
│ 53 │ 560 │ 510 │ No │
│ * 59 │ 570 │ 570 │ No │
│ * 58 │ 580 │ 560 │ No │
│ * 60 │ 590 │ 580 │ No │
│ 57 │ 600 │ 550 │ Yes │
│ 63 │ 610 │ 610 │ No │
│ 62 │ 620 │ 600 │ No │
┊ ┊ ┊ ┊ ┊
Sample/frame 58, 59 and 60 are B-frames which actually depends on the
key frame (57). Here the key frame is not an IDR but a "CRA" (Clean
Random Access).
Initially, I thought I could rely on the sdtp box (independent and
disposable samples), but unfortunately:
sdtp[54] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[55] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[56] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[57] is_leading:0 sample_depends_on:2 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[58] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[59] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[60] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[61] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[62] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
The information that might have been useful here would have been
is_leading, but all the samples are set to 0 so this was unusable.
Instead, we need to rely on sgpd/sbgp tables. In my case the video track
contained 3 sgpd tables with the following grouping types: tscl, sync
and tsas. In the sync table we have the following 2 entries (only):
sgpd.sync[1]: sync nal_unit_type:0x14
sgpd.sync[2]: sync nal_unit_type:0x15
(The count starts at 1 because 0 carries the undefined semantic, we'll
see that later in the reference table).
The NAL unit types presented here correspond to:
libavcodec/hevc.h: HEVC_NAL_IDR_N_LP = 20,
libavcodec/hevc.h: HEVC_NAL_CRA_NUT = 21,
In parallel, the sbgp sync table contains the following:
┌────┬───────┬─────┐
│ id │ count │ gdi │
├────┼───────┼─────┤
│ 0 │ 1 │ 1 │
│ 1 │ 56 │ 0 │
│ 2 │ 1 │ 2 │
│ 3 │ 59 │ 0 │
│ 4 │ 1 │ 2 │
│ 5 │ 59 │ 0 │
│ 6 │ 1 │ 2 │
│ 7 │ 59 │ 0 │
│ 8 │ 1 │ 2 │
│ 9 │ 59 │ 0 │
│ 10 │ 1 │ 2 │
│ 11 │ 11 │ 0 │
└────┴───────┴─────┘
The gdi column (group description index) directly refers to the index in
the sgpd.sync table. This means the first frame is an IDR, then we have
batches of undefined frames interlaced with CRA frames. No IDR ever
appears again (tried on a 30+ seconds sample).
With that information, we can build an heuristic using the presentation
order.
A few things needed to be introduced in this commit:
1. min_sample_duration is extracted from the stts: we need the minimal
step between sample in order to PTS-step backward to a valid point
2. In order to avoid a loop over the ctts table systematically during a
seek, we build an expanded list of sample offsets which will be used
to translate from DTS to PTS
3. An open_key_samples index to keep track of all the non-IDR key
frames; for now it only supports HEVC CRA frames. We should probably
add BLA frames as well, but I don't have any sample so I prefered to
leave that for later
It is entirely possible I missed something obvious in my approach, but I
couldn't come up with a better solution. Also, as mentioned in the diff,
we could optimize is_open_key_sample(), but the linear scaling overhead
should be fine for now since it only happens in seek events.
Fixing this issue prevents sending broken packets to the decoder. With
FFmpeg hevc decoder the frames are skipped, with VideoToolbox the frames
are glitching.
sgpd means Sample Group Description Box.
For now, only the sync grouping type is parsed, but the function can
easily be adjusted to support other flavours.
The sbgp (Sample to Group Box) sync_group table built in previous commit
contains references to this table through the group_description_index
field.
It appears this is not allowed "Each Segment Index box documents how a (sub)segment is divided into one or more subsegments
(which may themselves be further subdivided using Segment Index boxes)."
Fixes: Null pointer dereference
Fixes: Ticket9517
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -9223372036854775808 - 8 cannot be represented in type 'long'
Fixes: 43542/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-5237670148702208
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
MOVAtom.type is always read as a little-endian number
(despite MOV/ISOBMFF being big-endian).
Fixes the matroska-dovi-write-config8 FATE-test on big-endian
arches (which runs into the "index out of range" warning message).
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is small (16 B) and therefore the overhead of exporting it more
than outweighs the size savings from not having duplicated symbols:
When the symbol is no longer avpriv, one saves twice the size of
the string containing the symbols name (2x30 byte), two entries
in .dynsym (24 bytes each on x64), one entry in the importing libraries
.got and .rela.dyn (8 + 24 bytes on x64) and two entries for the
symbol version (2 bytes each) and one hash value in the exporting
library (4 bytes).
(The exact numbers are of course different for other platforms
(e.g. when using dlls), but given that the strings saved alone
more than outweigh the array size it can be presumed that this
is beneficial for all platforms.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
To avoid duplicating code. The implementation in dovi_isom is identical.
Signed-off-by: quietvoid <tcChlisop0@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Very high stts sample deltas may occasionally be intended but usually
they are written in error or used to store a negative value for dts correction
when treated as signed 32-bit integers.
This option lets the user set an upper limit, beyond which the delta is clamped to 1.
Values greater than the limit if negative when cast to int32 are used to adjust onward dts.
Unit is the track time scale. Default is UINT_MAX - 48000*10 which
allows upto a 10 second dts correction for 48 kHz audio streams while
accommodating 99.9% of uint32 range.
Signed-off-by: Gyan Doshi <ffmpeg@gyani.pro>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 9223372036200463215 + 1109914409 cannot be represented in type 'long'
Fixes: 41480/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-6553086177443840
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -776522110086937600 * 16 cannot be represented in type 'long'
Fixes: 40563/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-6644829447127040
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
As per 8.6.1.2.2 of ISO/IEC 14496-12:2015(E), STTS sample offsets
are to be always stored as uint32_t. So far, they have been signed ints
which led to desync in files with very large offsets.
The MOVStts struct was used to store CTTS offsets as well. These can be
negative in version 1. So a new struct MOVCtts was created and all
declarations for CTTS usage changed to MOVCtts.
bit_rate is not a critical field, and we shouln't hard fail if we
can't caluclate it due to a large timebase - it needlessly breaks
valid files.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
correct implementation of 'cenc' encryption scheme to support
decryption of partial cipher blocks at the end of subsamples
https://www.iso.org/standard/68042.html
Signed-off-by: Nachiket Tarate <nachiket.programmer@gmail.com>
Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
This information is coded in a standard MP4 KindBox and utilizes the
scheme and values as per the DASH role scheme defined in MPEG-DASH.
Other schemes are technically allowed, but where multiple schemes
define the same concepts, the DASH scheme should be utilized.
Such flagging is additionally utilized by the DASH-IF CMAF ingest
specification, enabling an encoder to inform the following component
of the roles of the incoming media streams.
A test is added for this functionality in a similar manner to the
matroska test.
Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
Fixes: signed integer overflow: 9223372036854775360 + 536870912 cannot be represented in type 'long'
Fixes: 37940/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-6095637855207424
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Do this by allocating AVStream together with the data that is
currently in AVStreamInternal; or rather: Put AVStream at the
beginning of a new structure called FFStream (which encompasses
more than just the internal fields and is a proper context in its own
right, hence the name) and remove AVStreamInternal altogether.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: signed integer overflow: 9223372034248226491 + 3275247799 cannot be represented in type 'long'
Fixes: clusterfuzz-testcase-minimized-audio_decoder_fuzzer-4538729166077952
Reported-by: Matt Wolenetz <wolenetz@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Currently AVIOContext's private fields are all over AVIOContext.
This commit moves them into a new structure in avio_internal.h instead.
Said structure contains the public AVIOContext as its first element
in order to avoid having to allocate a separate AVIOContextInternal
which is costly for those use cases where one just wants to access
an already existing buffer via the AVIOContext-API.
For these cases ffio_init_context() can't fail and always returned zero,
which was typically not checked. Therefore it has been made to not
return anything.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The MOV muxer can store streamids as track ids but they aren't
visible when probing the result via lavf/dump or ffprobe due to
lack of this flag in the demuxer.
9888ffb1ce added checks for EOF
in loops in the mov demuxer as a precaution against timeouts;
yet there is no I/O in the loop when parsing the STSZ atom
as the values are read from an already read buffer. So remove said
checks.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
mov_read_stsz() did not ensure that every bit of a buffer is addressable
by an int as is required by the get_bits API, leading to a crash in
ticket #9344. Fix this by restricting the size more thoroughly.
The file from said ticket will then be considered invalid; in the
future, we might read and process the data in chunks to actually support
such files.
Fixes ticket #9344.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: signed integer overflow: 9223372036854775807 + 1442840321 cannot be represented in type 'long'
Fixes: 33670/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-6644379491106816
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 8511838621821575200 - -3954125146725285889 cannot be represented in type 'long'
Fixes: 33414/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-6610119325515776
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
By default, a demuxer's read_close function is not called automatically
if an error happens when reading the header; instead it is up to the
demuxer to clean up after itself in this case. The mov demuxer did this
by calling its read_close function when it encountered some errors when
reading the header.
This commit changes this by setting the FF_FMT_INIT_CLEANUP flag so that
mov_read_close() is automatically called when an error happens when
reading the header.
(Btw: mov_read_close() is not idempotent: Calling it twice is
dangerouos, because MOVContext.frag_index.item will be av_freep'ed,
yet MOVContext.frag_index.nb_items won't be reset. So the calls to
mov_read_close() have to be removed before the switch to freeing
generically.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Libavcodec can now handle the AV1CodecConfigurationRecord structure
as-is when passed as extradata, so the standard behavior of
read-box-into-extradata should suffice, just like with AVC and HEVC.
This is possible now that the next-API is gone.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>