This reduces the need for an edit list; streams that start with
e.g. dts=-1, pts=0 can be encoded as dts=0, pts=0 (which is valid
in mov/mp4) by shifting the dts values of all packets forward.
This avoids the need for edit lists for such streams (while they
still are needed for audio streams with encoder delay).
This eases conformance with the DASH-IF interoperability guidelines.
Signed-off-by: Martin Storsjö <martin@martin.st>
When writing a fragmented file, we by default write an index pointing
to all the fragments at the end of the file. This causes constantly
increasing memory usage during the muxing. For live streams, the
index might not be useful at all.
A similar fragment index is written (but at the start of the file) if
the global_sidx flag is set. If ism_lookahead is set, we need to keep
data about the last ism_lookahead+1 fragments.
If no fragment index is to be written, we don't need to store information
about all fragments, avoiding increasing the memory consumption
linearly with the muxing runtime.
This fixes out of memory situations with long live mp4 streams.
Signed-off-by: Martin Storsjö <martin@martin.st>
As long as caller only writes packets using av_interleaved_write_frame
with no manual flushing, this should allow us to always have accurate
durations at the end of fragments, since there should be at least
one queued packet in each stream (except for the stream where the
current packet is being written, but if the muxer itself does the
cutting of fragments, it also has info about the next packet for that
stream).
Signed-off-by: Martin Storsjö <martin@martin.st>
Currently, AVStream contains an embedded AVCodecContext instance, which
is used by demuxers to export stream parameters to the caller and by
muxers to receive stream parameters from the caller. It is also used
internally as the codec context that is passed to parsers.
In addition, it is also widely used by the callers as the decoding (when
demuxer) or encoding (when muxing) context, though this has been
officially discouraged since Libav 11.
There are multiple important problems with this approach:
- the fields in AVCodecContext are in general one of
* stream parameters
* codec options
* codec state
However, it's not clear which ones are which. It is consequently
unclear which fields are a demuxer allowed to set or a muxer allowed to
read. This leads to erratic behaviour depending on whether decoding or
encoding is being performed or not (and whether it uses the AVStream
embedded codec context).
- various synchronization issues arising from the fact that the same
context is used by several different APIs (muxers/demuxers,
parsers, bitstream filters and encoders/decoders) simultaneously, with
there being no clear rules for who can modify what and the different
processes being typically delayed with respect to each other.
- avformat_find_stream_info() making it necessary to support opening
and closing a single codec context multiple times, thus
complicating the semantics of freeing various allocated objects in the
codec context.
Those problems are resolved by replacing the AVStream embedded codec
context with a newly added AVCodecParameters instance, which stores only
the stream parameters exported by the demuxers or read by the muxers.
The double meaning of the faststart flag (moving a moov atom
to the start of files, making them streamable, for non-fragmented
files, vs inserting a global sidx index at the start of files
for fragmented files) is confusing - see 40ed1cbf1 for
explanation of its origins.
Since the second meaning of the flag hasn't been part of any
libav release yet, just rename it to get rid of the confusion
without any extra deprecation (which wouldn't get rid of the
potential confusion, of users adding -movflags faststart
even for fragmented files, where it isn't needed for making
them "streamable").
This gets back the old behaviour, where -movflags faststart
doesn't have any effect for fragmented files.
Signed-off-by: Martin Storsjö <martin@martin.st>
The same field is also used for writing the sidx index header,
for fragmented files, when the faststart flag is used.
Signed-off-by: Martin Storsjö <martin@martin.st>
For strict CFR, they should be pretty much equal, but if the stream
is VFR, there can be a sometimes significant difference.
Calculate the pts duration separately, used in sidx atoms and for
tfrf/tfxd boxes in smooth streaming ismv files.
Also make sure to reduce the duration of sidx entries according to
edit lists.
Signed-off-by: Martin Storsjö <martin@martin.st>
Even if this is a guess, it is way better than writing a zero duration
of the last sample in a fragment (because if the duration is zero,
the first sample of the next fragment will have the same timestamp
as the last sample in the previous one).
Since we normally don't require libavformat muxer users to set
the duration field in AVPacket, we probably can't strictly require
it here either, so don't log this as a strict warning, only as info.
Signed-off-by: Martin Storsjö <martin@martin.st>
This is incompatible with the omit_tfhd_offset flag (writing
position independent fragments with interleaving requires the
default_base_moof flag).
This makes the moof atoms slightly bigger, but can be better for
playback (improving locality of sample data in the mdat).
Signed-off-by: Martin Storsjö <martin@martin.st>
This way, the caller doesn't need to coordinate setting the option
after the moov atom has been written. The downside is that it is
no longer possible to use the option for checking whether the moov
atom already has been written, but a caller is able to keep track
of that by other means anyway.
Signed-off-by: Martin Storsjö <martin@martin.st>
The previous use of the mov->fragments field, for determining whether
written packets were part of the first fragment or not, didn't
work as intended when using the empty_moov flag.
Signed-off-by: Martin Storsjö <martin@martin.st>
Use the more generic approach with the delay_moov flag, instead of
having a update mechanism specific to this one single atom.
Signed-off-by: Martin Storsjö <martin@martin.st>
This delays writing the moov until the first fragment is written,
or can be flushed by the caller explicitly when wanted. If the first
sample in all streams is available at this point, we can write
a proper editlist at this point, allowing streams to start at
something else than dts=0. For AC3 and DNXHD, a packet is
needed in order to write the moov header properly.
This isn't added to the normal behaviour for empty_moov, since
the behaviour that ftyp+moov is written during avformat_write_header
would be changed. Callers that split the output stream into header+segments
(either by flushing manually, with the custom_frag flag set, or by
just differentiating between data written during avformat_write_header
and the rest) will need to be adjusted to take this option into use.
For handling streams that start at something else than dts=0, an
alternative would be to use different kinds of heuristics for
guessing the start dts (using AVCodecContext delay or has_b_frames
together with the frame rate), but this is not reliable and doesn't
necessarily work well with stream copy, and wouldn't work for getting
the right initialization data for AC3 or DNXHD either.
Signed-off-by: Martin Storsjö <martin@martin.st>
The pts and the corresponding duration is written in sidx
atoms, thus make sure these match up correctly.
Signed-off-by: Martin Storsjö <martin@martin.st>
This allows creating a later mp4 fragment without sequentially
writing the earlier ones before (when called from a segmenter).
Normally when writing a fragmented mp4 file sequentially, the
first timestamps of a fragment are adjusted to match the
end of the previous fragment, to make sure the timestamp is the
same, even if it is calculated as the sum of previous fragment
durations. (And for the first packet in a file, the offset of
the first packet is written using an edit list.)
When writing an individual mp4 fragment discontinuously like this
(with potentially writing the earlier fragments separately later),
there's a risk of getting a gap in the timeline if the duration
field of the last packet in the previous fragment doesn't match up
with the start time of the next fragment.
Using this requires setting -avoid_negative_ts make_non_negative
(or -avoid_negative_ts 0).
Signed-off-by: Martin Storsjö <martin@martin.st>
This is mapped to the faststart flag (which in this case
perhaps should be called "shift and write index at the
start of the file"), which for fragmented files will
write a sidx index at the start.
When segmenting DASH into files, there's usually one sidx
at the start of each segment (although it's not clear to me
whether that actually is necessary). When storing all of it
in one file, the MPD doesn't necessarily need to describe
the individual segments, but the offsets of the fragments can be
fetched from one large sidx atom at the start of the file. This
allows creating files for the DASH ISO BMFF on-demand profile.
Signed-off-by: Martin Storsjö <martin@martin.st>
A flag "dash" is added, which enables the necessary flags for
creating DASH compatible fragments.
When this is enabled, one sidx atom is written for each track
before every moof atom.
Signed-off-by: Martin Storsjö <martin@martin.st>
By calling this after writing the moof the first time (for
calculating the moof size), we can avoid intermediate storage
of tfrf_offset in MOVTrack.
Signed-off-by: Martin Storsjö <martin@martin.st>
In this case, shift tracks to start from zero instead (potentially
stretching the first sample in tracks that start later than the
first one).
Some software does not support edit lists at all, the adobe flash
player seems to be one of these. This results in AV sync errors when
edit lists are used to adjust AV sync.
Some players, such as QuickTime, don't respect the duration for
audio packets, so if an audio track starts later than the video
track and the first audio sample gets a duration longer than the
actual amount of data in it, the result will be out of sync.
Based on patches by Michael Niedermayer.
Signed-off-by: Martin Storsjö <martin@martin.st>
Similarly to the omit_tfhd_offset flag added in e7bf085b, this
avoids writing absolute byte positions to the file, making them
more easily streamable.
This is a new feature from 14496-12:2012, so application support
isn't necessarily too widespread yet (support for it in libav was
added in 20f95f21f in July 2014).
Signed-off-by: Martin Storsjö <martin@martin.st>
If one track doesn't have any samples within a moof, no traf/trun
is written for it. When the omit_tfhd_offset flag is set, none
of the tfhd atoms have any base_data_offset set, and the implicit
offset (end of previous track fragment data, or start of the moof
for the first trun) is used.
Signed-off-by: Martin Storsjö <martin@martin.st>
F4V is Adobe's mp4/iso media variant, with the most significant
addition/change being supporting other flash codecs than just
aac/h264.
Signed-off-by: Martin Storsjö <martin@martin.st>
This makes the output fragments independent of their position in
the output stream, making the output work better when streamed.
QuickTime Player doesn't support fragmented mp4 without the base
data offset, though.
Signed-off-by: Martin Storsjö <martin@martin.st>
This is a bit more work, but avoids having to fill in
the data offset field afterwards instead of directly when
the rest of the trun atom is written.
This simplifies future cases where this field needs to be set to
something different.
Signed-off-by: Martin Storsjö <martin@martin.st>
QuickTime will play multiple audio tracks concurrently if this flag is
set for multiple audio tracks. And if no subtitle track has this flag
set, QuickTime will show no subtitles in the subtitle menu.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Faststart moves the moov atom to the beginning of the file and rewrites
the rest of the file after muxing is complete.
Signed-off-by: Martin Storsjö <martin@martin.st>
The previous allocation increment of 16384 meant that the cluster
array was allocated for 0.6 MB initially, which is a bit excessive
for cases with fragmentation where only a fraction of that ever
actually is used.
Therefore, start off at a much smaller value, and increase by
doubling (to avoid reallocating too often when writing long
non-fragmented mp4 files).
Bug-Id: 525
Signed-off-by: Martin Storsjö <martin@martin.st>
When writing fragmented mp4, the cluster array is reset when a
fragment is written. Instead of starting off reallocating the
array only based on the number of current elements in it, keep
track of how many elements there were allocated earlier.
This avoids reallocating this array needlessly when writing
fragmented mp4 files.
Bug-Id: 525
Signed-off-by: Martin Storsjö <martin@martin.st>
This makes the struct name (which isn't used anywhere) match the
name of the typedef, as for all the other structs declared in this
header.
Signed-off-by: Martin Storsjö <martin@martin.st>
The other fragmentation options (frag_duration, frag_size and
frag_keyframe) are combined with OR, cutting fragments at the
first of the conditions being fulfilled.
Signed-off-by: Martin Storsjö <martin@martin.st>
This allows writing QuickTime-compatible fragmented mp4 (with
a non-empty moov atom) to a non-seekable output.
This buffers the mdat for the initial fragment just as it does
for all normal fragments, too. Previously, the resulting
atom structure was mdat,moov, moof,mdat ..., while it now
is moov,mdat, moof,mdat.
Signed-off-by: Martin Storsjö <martin@martin.st>
This fixes calculation of trackDuration if the MOVIentry array
is cleared. This is required by the fragmentation support in the
next patch.
Signed-off-by: Martin Storsjö <martin@martin.st>
Originally, sizeof(struct MOVIentry) was 48, after the reordering,
it is 40 in my build configuration.
When writing really long mov/mp4 files, this can make a difference
- this saves a bit over 2 MB of memory per hour of video (down to
10.3 MB per hour from 12.3 MB per hour initially) for a video with
75 packets per second - 25 fps + 50 audio packets (which is the
case for AMR audio).
Signed-off-by: Martin Storsjö <martin@martin.st>
If an annex b bitstream is muxed into mov, the actual written
sample is reformatted to mp4 syntax before writing.
Currently, the RTP hints that copy data from the normal video
track, where the payload data might be offset compared to the
original sample that the RTP hinting used (when 3 byte
annex b startcodes have been converted into 4 byte mp4 format
startcodes).
Signed-off-by: Martin Storsjö <martin@martin.st>