This was tested with medias recorded from an iPhone XR and an iPhone 13.
Here is how a typical stream looks like in coding order:
┌────────┬─────┬─────┬──────────┐
│ sample | PTS | DTS | keyframe |
├────────┼─────┼─────┼──────────┤
┊ ┊ ┊ ┊ ┊
│ 53 │ 560 │ 510 │ No │
│ 54 │ 540 │ 520 │ No │
│ 55 │ 530 │ 530 │ No │
│ 56 │ 550 │ 540 │ No │
│ 57 │ 600 │ 550 │ Yes │
│ * 58 │ 580 │ 560 │ No │
│ * 59 │ 570 │ 570 │ No │
│ * 60 │ 590 │ 580 │ No │
│ 61 │ 640 │ 590 │ No │
│ 62 │ 620 │ 600 │ No │
┊ ┊ ┊ ┊ ┊
In composition/display order:
┌────────┬─────┬─────┬──────────┐
│ sample | PTS | DTS | keyframe |
├────────┼─────┼─────┼──────────┤
┊ ┊ ┊ ┊ ┊
│ 55 │ 530 │ 530 │ No │
│ 54 │ 540 │ 520 │ No │
│ 56 │ 550 │ 540 │ No │
│ 53 │ 560 │ 510 │ No │
│ * 59 │ 570 │ 570 │ No │
│ * 58 │ 580 │ 560 │ No │
│ * 60 │ 590 │ 580 │ No │
│ 57 │ 600 │ 550 │ Yes │
│ 63 │ 610 │ 610 │ No │
│ 62 │ 620 │ 600 │ No │
┊ ┊ ┊ ┊ ┊
Sample/frame 58, 59 and 60 are B-frames which actually depends on the
key frame (57). Here the key frame is not an IDR but a "CRA" (Clean
Random Access).
Initially, I thought I could rely on the sdtp box (independent and
disposable samples), but unfortunately:
sdtp[54] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[55] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[56] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[57] is_leading:0 sample_depends_on:2 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[58] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[59] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[60] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[61] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[62] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
The information that might have been useful here would have been
is_leading, but all the samples are set to 0 so this was unusable.
Instead, we need to rely on sgpd/sbgp tables. In my case the video track
contained 3 sgpd tables with the following grouping types: tscl, sync
and tsas. In the sync table we have the following 2 entries (only):
sgpd.sync[1]: sync nal_unit_type:0x14
sgpd.sync[2]: sync nal_unit_type:0x15
(The count starts at 1 because 0 carries the undefined semantic, we'll
see that later in the reference table).
The NAL unit types presented here correspond to:
libavcodec/hevc.h: HEVC_NAL_IDR_N_LP = 20,
libavcodec/hevc.h: HEVC_NAL_CRA_NUT = 21,
In parallel, the sbgp sync table contains the following:
┌────┬───────┬─────┐
│ id │ count │ gdi │
├────┼───────┼─────┤
│ 0 │ 1 │ 1 │
│ 1 │ 56 │ 0 │
│ 2 │ 1 │ 2 │
│ 3 │ 59 │ 0 │
│ 4 │ 1 │ 2 │
│ 5 │ 59 │ 0 │
│ 6 │ 1 │ 2 │
│ 7 │ 59 │ 0 │
│ 8 │ 1 │ 2 │
│ 9 │ 59 │ 0 │
│ 10 │ 1 │ 2 │
│ 11 │ 11 │ 0 │
└────┴───────┴─────┘
The gdi column (group description index) directly refers to the index in
the sgpd.sync table. This means the first frame is an IDR, then we have
batches of undefined frames interlaced with CRA frames. No IDR ever
appears again (tried on a 30+ seconds sample).
With that information, we can build an heuristic using the presentation
order.
A few things needed to be introduced in this commit:
1. min_sample_duration is extracted from the stts: we need the minimal
step between sample in order to PTS-step backward to a valid point
2. In order to avoid a loop over the ctts table systematically during a
seek, we build an expanded list of sample offsets which will be used
to translate from DTS to PTS
3. An open_key_samples index to keep track of all the non-IDR key
frames; for now it only supports HEVC CRA frames. We should probably
add BLA frames as well, but I don't have any sample so I prefered to
leave that for later
It is entirely possible I missed something obvious in my approach, but I
couldn't come up with a better solution. Also, as mentioned in the diff,
we could optimize is_open_key_sample(), but the linear scaling overhead
should be fine for now since it only happens in seek events.
Fixing this issue prevents sending broken packets to the decoder. With
FFmpeg hevc decoder the frames are skipped, with VideoToolbox the frames
are glitching.
sgpd means Sample Group Description Box.
For now, only the sync grouping type is parsed, but the function can
easily be adjusted to support other flavours.
The sbgp (Sample to Group Box) sync_group table built in previous commit
contains references to this table through the group_description_index
field.
Very high stts sample deltas may occasionally be intended but usually
they are written in error or used to store a negative value for dts correction
when treated as signed 32-bit integers.
This option lets the user set an upper limit, beyond which the delta is clamped to 1.
Values greater than the limit if negative when cast to int32 are used to adjust onward dts.
Unit is the track time scale. Default is UINT_MAX - 48000*10 which
allows upto a 10 second dts correction for 48 kHz audio streams while
accommodating 99.9% of uint32 range.
Signed-off-by: Gyan Doshi <ffmpeg@gyani.pro>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
As per 8.6.1.2.2 of ISO/IEC 14496-12:2015(E), STTS sample offsets
are to be always stored as uint32_t. So far, they have been signed ints
which led to desync in files with very large offsets.
The MOVStts struct was used to store CTTS offsets as well. These can be
negative in version 1. So a new struct MOVCtts was created and all
declarations for CTTS usage changed to MOVCtts.
correct implementation of 'cenc' encryption scheme to support
decryption of partial cipher blocks at the end of subsamples
https://www.iso.org/standard/68042.html
Signed-off-by: Nachiket Tarate <nachiket.programmer@gmail.com>
Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
This information is coded in a standard MP4 KindBox and utilizes the
scheme and values as per the DASH role scheme defined in MPEG-DASH.
Other schemes are technically allowed, but where multiple schemes
define the same concepts, the DASH scheme should be utilized.
Such flagging is additionally utilized by the DASH-IF CMAF ingest
specification, enabling an encoder to inform the following component
of the roles of the incoming media streams.
A test is added for this functionality in a similar manner to the
matroska test.
Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
Includes basic support for both the ISMV ('dfxp') and MP4 ('stpp')
methods. This initial version also foregoes fragmentation support
in case the built-in sample squashing is to be utilized, as this
eases the initial review.
Additionally, add basic tests for both muxing modes in MP4.
Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
The AAXC container format is the same as the (already supported) Audible
AAX format but it uses a different encryption scheme.
Note: audible_key and audible_iv values are variable (per file) and are
externally fed.
It is possible to extend https://github.com/mkb79/Audible to derive the
audible_key and audible_key values.
Relevant code:
def decrypt_voucher(deviceSerialNumber, customerId, deviceType, asin, voucher):
buf = (deviceType + deviceSerialNumber + customerId + asin).encode("ascii")
digest = hashlib.sha256(buf).digest()
key = digest[0:16]
iv = digest[16:]
# decrypt "voucher" using AES in CBC mode with no padding
cipher = AES.new(key, AES.MODE_CBC, iv)
plaintext = cipher.decrypt(voucher).rstrip(b"\x00") # improve this!
return json.loads(plaintext)
The decrypted "voucher" has the required audible_key and audible_iv
values.
Update (Nov-2020): This patch has now been tested by multiple folks -
details at the following URL:
https://github.com/mkb79/Audible/issues/3
Signed-off-by: Vesselin Bontchev <vesselin.bontchev@yandex.com>
On files with more than one sidx box, like live fragmented MP4
files, it was previously re-reading and seeking on every singl
sidx box, leading to extremely poor performance on larger files,
especially over the network.
Only do it on the first one, and stash its result.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
dts would start over at the beginning of each trun when they should be
computed contiguously for each trun in a traf
Fixes ticket 8070
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Allows the creation of the sdtp atom while remuxing MP4 to MP4. This
atom is required by Apple devices (iPhone, Apple TV) in order to accept
2160p medias.
Detecting missing tfhd avoids re-using tfhd track info from the previous
moof. For files with multiple tracks, this may make a mess of the
avindex and fragindex, which can later trigger av_assert0 in
mov_read_trun().
Reviewed-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
ISOBMFF does not allow AudioSampleEntryV1 in stsd version 0, so
assume the descriptor format is QTFF SoundDescriptionV1. ISOBMFF does
not define a version 2.
This fixes audio decoding for some MP4 files generated with Apple
tools. The additional fields present in SoundDescriptionV1/V2 need to
be read in order to correctly read additional boxes that contain
information required for decoding the stream.
Fixes#7376.
Also see: https://github.com/HandBrake/HandBrake/issues/1555
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Fixes a compilation warning if size_t != uint64_t:
libavformat/mov.c: In function ‘mov_read_saio’:
libavformat/mov.c:6207:45: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
encryption_index->auxiliary_offsets = auxiliary_offsets;
^
This doesn't support saio atoms with more than one offset.
Signed-off-by: Jacob Trimble <modmaker@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
- Parse schm atom to get different encryption schemes.
- Allow senc atom to appear in track fragments.
- Allow 16-byte IVs.
- Allow constant IVs (specified in tenc).
- Allow only tenc to specify encryption (i.e. no senc/saiz/saio).
- Use sample descriptor to detect clear fragments.
This doesn't support:
- Different sample descriptor holding different encryption info.
- Only first sample descriptor can be encrypted.
- Encrypted sample groups (i.e. seig).
- Non-'cenc' encryption scheme when using -decryption_key.
Signed-off-by: Jacob Trimble <modmaker@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes seek for files with empty edits and files with negative ctts
(dts_shift > 0). Added fate samples and tests.
Signed-off-by: Sasi Inguva <isasi@isasi.mtv.corp.google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
When keyframe intervals of dash segments are not perfectly aligned,
fragments in the stream can overlap in time. The previous sorting by
timestamp causes packets to be read out of decode order and results
in decode errors.
Insert new "trun" index entries into index_entries in the order that
the trun are referenced by the sidx.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
When sidx box support is enabled, the code will skip reading all
trun boxes (each containing ctts entries for samples inthat box).
If seeks are attempted before all ctts values are known, the old
code would dump ctts entries into the wrong location. These are
then used to compute pts values which leads to out of order and
incorrectly timestamped packets.
This patch fixes ctts processing by always using the index returned
by av_add_index_entry() as the ctts_data index. When the index gains
new entries old values are reshuffled as appropriate.
This approach makes sense since the mov demuxer is already relying
on the mapping of AVIndex entries to samples for correct demuxing.
As a result of this all ctts entries are now 1-count. A followup
change will be submitted to remove support for > 1 count entries
which will simplify seeking.
Notes for future improvement:
Probably there are other boxes (stts, stsc, etc) that are impacted
by this issue... this patch only attempts to fix ctts since it
completely breaks packet timestamping.
This patch continues using an array for the ctts data, which is not
the most ideal given the rearrangement that needs to happen (via
memmove as new entries are read in). Ideally AVIndex and the ctts
data would be set-type structures so addition is always worst case
O(lg(n)) instead of the O(n^2) that exists now; this slowdown is
noticeable during seeks.
Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Adding an MOV format option to turn on/off the editlist supporting code, introduced in ca6cae73db
Signed-off-by: Sasi Inguva <isasi@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Retain the ranges of frame indexes when applying edit list in
mov_fix_index. The index ranges are then used to keep track of the frame
index of the current sample. In case of a discontinuity in frame indexes
due to edit, update the auxiliary info position accordingly.
Reviewed-by: Sasi Inguva <isasi@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This implements Spherical Video V1 and V2, as described in the
spatial-media collection by Google.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
This implements Spherical Video V1 and V2, as described in the
spatial-media collection by Google.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
This matrix needs to be applied after all others have (currently only
display matrix from trak), but cannot be handled in movie box, since
streams are not allocated yet. So store it in main context, and apply
it when appropriate, that is after parsing the tkhd one.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
According to spec ISO_IEC_15444_12 "For any media stream for which no segment index is present, referred to as non‐indexed stream, the media stream associated with the first Segment Index box in the segment serves as a reference stream in a sense that it also describes the subsegments for any non‐indexed media stream."
Signed-off-by: Sasi Inguva <isasi@google.com>
Reviewed-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This matrix needs to be applied after all others have (currently only
display matrix from trak), but cannot be handled in movie box, since
streams are not allocated yet. So store it in main context, and apply
it when appropriate, that is after parsing the tkhd one.
Fate tests are updated accordingly.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
3 parts:
- Supports multiple chapter streams
- Exports regular text chapter streams as opaque data. This prevents consumers
from showing chapters as if they were regular subtitle streams.
- Exports video chapter streams as thumbnails, and provides the first one as
an attached_pic.
This breaks files with legitimate single-entry edit lists,
and the hack, introduced in f03a081df0,
has no link to any known sample in its commit message.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
This breaks files with legitimate single-entry edit lists,
and the hack, introduced in f03a081df0,
has no link to any known sample in its commit message.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Store data from each stsd in a separate extradata buffer, keep track of
the stsc index for read and seek operations, switch buffers when the
index differs. Decoder is notified with an AV_PKT_DATA_NEW_EXTRADATA
packet side data.
Since H264 supports this notification, and can be reset midstream, enable
this feature only for multiple avcC's. All other stsd types (such as
hvc1 and hev1) need decoder-side changes, so they are left disabled for
now.
This is implemented only in non-fragmented MOVs.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
This feature is mostly only used by NLE software, and is
both of dubious value being enabled by default, and a
possible security risk.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
This feature is mostly only used by NLE software, and is
both of dubious value being enabled by default, and a
possible security risk.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
support reading encrypted mp4 using aes-ctr, conforming to ISO/IEC
23001-7.
a new parameter was added:
- decryption_key - 128 bit decryption key (hex)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
QuickTime metadata can come after trak data. Add indicator for which trak is being parsed (-1 if none) so that global metadata after the trak can be parsed.
Signed-off-by: Neil Birkbeck <neil.birkbeck@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Chapter-indexing can be expensive since chapters may be interspersed
throughout the entire file and may require many seeks - especially
costly when consuming a video over a remote protocol like http.
Furthermore it is often unnecessary, especially when only trying to get
video info (e.g. via ffprobe).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The Apple dev specification:
https://developer.apple.com/library/mac/documentation/QuickTime/QTFF/Metadata/Metadata.html
Basically the structure is like:
|--meta
|----hdlr
|----keys
|----ilst
1) The handler type in the metadata handler atom is ‘mdta’.
2) The key and value are stored separately for each key-value pair.
The 'keys' atom stores the key table, while 'ilst' atom stores the
values corresponding to the indices in the key table.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>