I tried doing this before, but it resulted in weird behaviour with
certain samples. I want to say I think I've got it sorted out now,
and the new autobsf stuff makes it trivial to turn on.
The native support for packed bframes is buggy and I think buggy
in ways beyond what I already try to account for, so this should be
a net improvements.
Previously, the pts value was initialised to AV_NOPTS_VALUE and so
it was not necessary to always set it. Now, with the new-new decode
API, this is no longer true. I'm not sure why I avoided setting the
pts when the decoder value was also AV_NOPTS_VALUE - it clearly
wouldn't have changed anything previously, but here we are.
Failing to do this, means the frame pts will be some random uninitalised
value.
The new new decode API requires the decoder to ask for the next input
packet, and it cannot just return EAGAIN if that packet cannot be
processed yet. This means we must finally confront how we get this
decoder to block when the input buffer is full and no output frames
are ready yet.
In the end, that isn't too hard to achieve - the main trick seems to
be that you have to aggressively poll the hardware - it doesn't seem
to make any forward progress if you sleep.
Signed-off-by: James Almer <jamrial@gmail.com>
The old code had to retain a partial frame across two calls in
the case of separate interlaced fields. Now, we know that we'll
get both fields within the same receive_frame call, and so we
don't need to manage the frame as private state any more.
It's not possible to return EAGAIN when we've passed input EOF and are
in draining mode. If do return EAGAIN, we're saying there's no way to
get any more output - which isn't true in many cases.
So let's handled these cases in an internal loop as best we can.
It seems that without all the other 1:1 heuristics, we don't have
a fundamental problem trusting the interlaced flag on output
pictures. That's a relief.
I'm not sure why, but the mpeg4_unpack_bframes bsf is not
interacting well with seeking. Looking at the code, it should be
ok, with possibly one warning shown, but I see it getting stuck
for an extended period of time after a seek where a packed frame
is cached to be shown later.
So, I gave up on that and went back to making the old hardware
based path work. Turns out that it wasn't broken except that some
samples have a 6 byte drop packet which I wasn't accounting for.
Now it works again and seeks are good.
The new decode API allows for m:n decode patterns, which is what
you need to use this hardware in a sane way. There are so many
situations where 1:1 doesn't happen naturally that it's a miracle
I got it working as well as I did.
With this change, we can throw all of the crazy heuristics and
sleeps(!) out, and things work correctly.
Why on earth the hardware returns garbage for the first sample of
a decoded picture is anyone's guess. The simplest reasonable way
to patch it up is to copy the first sample of the second line. This
should result in the correct chroma values (because the data was
original 4:2:0 upsampled to 4:2:2) even if the luma is isn't.
The hardware handling of packed bframes was always questionable but
it used to ok with my workaround. Today, not so much. But today we
have a bsf to unpack the bframes, so let's just use that and be
done with it.
With all the various refactorings that have happened over the years,
the current pts logic is very broken for non-trivial cases (ie: ones
where not every frame/field has a meaningful pts assocated with it).
Generally, we do not want to write AV_NOPTS_VALUE as the output
timestamp, regardless of anything else. It's better to pass zero
if there's no other information.
Additionally, interlaced content where the decoder returns each field
separately can result in the first field carrying the timestamp and
the second having AV_NOPTS_VALUE. It's clearly wrong to overwrite
the valid timestamp.
So, let's just never write AV_NOPTS_VALUE into an output frame.
Empirically, this fixed playback of interlaced mpeg2 and h.264 and
mpeg4-asp with packed b-frames in an avi container.
Although the old API is supposed to be functional, the crystalhd
decoder is currently not working for non-annex.b h.264 content.
So, let's update to the modern API and make it work again.
Signed-off-by: Philip Langdale <philipl@overt.org>
Istvan Sebok provided a sample where field pair -> two fields content
was being misdetected by the existing logic. I added an additional
test to check the input picture type as identified by our h.264
parser.
Signed-off-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
With the flag in place, it's hard to actually use the decoder, and
I'm happy with how it works, with the exception of DivX3 where I've
never found a sample that worked that I was confident actually
matched what the hardware claimed to support.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This was a regression that came in when I switched to using the
h.264 annex b filter all the time. As the filter modifies extradata,
its use violates the statelessness assumption that exists in the
'ffmpeg' command line tool, and maybe elsewhere. It assumes that
a docoder can be reinitalised and pointed to an existing stream and
get the same results.
For now, the only way to meet this requirement is to backup the
extradata.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Now that we're converting all streams to Annex B format, we
can identify them as such to the hardware.
Signed-off-by: Philip Langdale <philipl@overt.org>
As we're now always running mp4 format streams through the annex b
filter, it makes sense to pass the filtered stream down, as
libcrystalhd would be doing the conversion internally anyway.
Signed-off-by: Philip Langdale <philipl@overt.org>
Originally, we needed to restore the original extradata after
initialising the mp4toannexb filter because mplayer would end up
taking two passes through the init sequence for the same stream
and end up miscategorising the stream. This doesn't seem to happen
anymore, making the backup/restore process unnecessary.
Signed-off-by: Philip Langdale <philipl@overt.org>
The H.264 parser that we use to detect interlacing can only handle
an Annex B stream, so we need to actually use the filter. This is
unfortunate as the crystalhd library is already doing this conversion
internally. A future change will reorganise the decode path more
completely so that we can feed the converted stream into libcrystalhd
and avoid the second conversion.
Signed-off-by: Philip Langdale <philipl@overt.org>
In preparation for using the filter on the actual bitstream, we need
to extend it's lifetime to match that of the decoder.
Signed-off-by: Philip Langdale <philipl@overt.org>
I still don't fully understand the cause but the difference between
the samples that trigger the bug and the samples that don't is
that the former uses delay frames and the later uses drop frames
as placeholders for the packed frame. So, if we see the one type
of frame, we can assume the bug will or won't be present.
Right now, I'm detecting the frame types by size, which may not be
safe in general, but given the specific codec and file type, I
expect any scenario where we encounter these frames where they
aren't being used for b-frame packing won't care one way or
another whether the work around is in effect or not.
Signed-off-by: Philip Langdale <philipl@overt.org>
The CrystalHD hardware can do scaling, which is particularly
desirable when dealing with some high resolution clips that take
so long to decode and copy out that they end up playing back
slower than realtime. By using scaling, we can make the output
frames smaller and reduce the copy out time.
This option takes the desired horizontal width in pixels, and
the hardware will do an aspect-scale. Upscaling is not supported
and the hardware will simply ignore any request to do so.
Signed-off-by: Philip Langdale <philipl@overt.org>
I was using the wrong value to track the position of the parser in the
stream. For an error-free stream, the size of the frame and number of
bytes consumed will be the same, but in an error situation they can
diverge.
Signed-off-by: Philip Langdale <philipl@overt.org>
As previously discussed, the CrystalHD hardware returns exceptionally
useless information about interlaced h.264 content - to the extent
that it's not possible to distinguish most MBAFF and PAFF content until
it's too late.
In an attempt to compensate for this, I'm introducing two mechanisms:
1) Peeking at the picture number of the next picture
The hardware provides a capability to peek the next picture number. If
it is the same as the current picture number, then we are clearly dealing
with two fields and not a frame or fieldpair.
If this always worked, it would be all we need, but it's not guaranteed
to work. Sometimes, the next picture may not be decoded sufficiently
for the number to be known; alternately, a corruption in the stream may
cause the hardware to refuse to return the number even if the next
intact frame is decoded. In either case, the query will return 0.
If we are unable to peek the next picture number, we assume that the
picture is a frame/fieldpair and return it accordingly. If that turns
out to be incorrect, we discard the second field, and the user has
to live with the glitch. In testing, false detection can occur for
the first couple of seconds, and then the pipeline stabalizes and
we get correct detection.
2) Use the h264_parser to detect when individual input fields have
been combined into an output fieldpair.
I have multiple PAFF samples where this behaviour is detected. The
peeking mechanism described above will correctly detect that the
output is a fieldpair, but we need to know what the input type was
to ensure pipeline stability (only return one output frame per input
frame).
If we find ourselves with an output fieldpair, yet the input picture
type was a field, as reported by the parser, then we are dealing with
this case, and can make sure not to return anything on the next
decode() call.
Taken together, these allow us to remove the hard-coded hacks for
different h.264 types, and we can clearly describe the conditions
under which we can trust the hardware's claim that content is
interlaced.
Signed-off-by: Philip Langdale <philipl@overt.org>
Now that we know the type of the input picture, we have to bring
that information to the output picture to help identify its type.
We do this by adding a field to the opaque_list node.
Signed-off-by: Philip Langdale <philipl@overt.org>
As the hardware is unreliable, we will have to use the h.264 parser
to identify whether an input picture is a field or a frame. This
change loads the parser and extracts the picture type.
Signed-off-by: Philip Langdale <philipl@overt.org>
In preparation for adding additional fields to the node, return
the node instead of the pts value. This requires the caller to
free the node.
Signed-off-by: Philip Langdale <philipl@overt.org>