Requires a new upstream function to test not for *import* support on a
given output pixel format, but also whether we can render to it.
Fixes: https://github.com/haasn/libplacebo/issues/173
That feature is overkill for a constant pointer to AVFilterLink which
can be stored in AVCodecContext.opaque (indirectly, because the link is
not allocated yet at the time the codec is opened).
This also avoids leaking non-NULL AVFrame.opaque to callers.
Add an optional filter_line3 to the available optimisations.
filter_line3 is equivalent to filter_line, memcpy, filter_line
filter_line shares quite a number of loads and some calculations in
common with its next iteration and testing shows that using aarch64
neon filter_line3s performance is 30% better than two filter_lines
and a memcpy.
Adds a test for vf_bwdif filter_line3 to checkasm
Rounds job start lines down to a multiple of 4. This means that if
filter_line3 exists then filter_line will not sometimes be called
once at the end of a slice depending on thread count. The final slice
may do up to 3 extra lines but filter_edge is faster than filter_line
so it is unlikely to create any noticable thread load variation.
Signed-off-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Martin Storsjö <martin@martin.st>
Exports C filter_line needed for tail fixup of neon code
Adds neon for filter_line
Signed-off-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Martin Storsjö <martin@martin.st>
Adds clip and spatial macros for aarch64 neon
Exports C filter_edge needed for tail fixup of neon code
Adds neon for filter_edge
Signed-off-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Martin Storsjö <martin@martin.st>
Adds an outline for aarch neon functions
Adds common macros and consts for aarch64 neon
Exports C filter_intra needed for tail fixup of neon code
Adds neon for filter_intra
Signed-off-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Martin Storsjö <martin@martin.st>
The HDR metadata should be removed after HDR to SDR conversion,
otherwise the output frame still has HDR side data.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
This makes the filter output match that of the C version.
It was left intentionally while we figured out if it was better
or not, and while it makes certain samples better, it makes static
samples jump around slightly.
The discrepancy between the definition and the declaration
in allfilters.c is actually UB.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The old logic was trying to be excessively clever in "deducing" that the
user wanted to stretch/scale the image when ow/oh differed from iw/ih
aspect ratio. But this is almost surely unintended except in
pathological cases, and in those cases users should simply disable
normalize_sar and do all the stretching/scaling logic themselves. This
is especially important in multi-input mode, where the canvas may be
vastly different from the input dimensions of any stream. Also, passing
through input 0 SAR in multi-input mode is arbitrary and nearly useless,
so again force output SAR to 1:1 here.
Use the gcd of all input timebases to ensure PTS accuracy. For the
framerate, just pick the highest of all the inputs, under the assumption
that we will render frames with approximately this frequency. Of course,
this is not 100% accurate, in particular if the input frames are badly
misaligned. But this field is informational to begin with.
Importantly, it covers the "common" case of combining high FPS and low
FPS streams with aligned frames.
In the event that some frame mixes are OK while others are not, the
priority goes:
1. Errors in updating any frame -> return error
2. Any input incomplete -> request frames and return
3. Any inputs OK -> ignore EOF streams and render remaining inputs
4. No inputs OK -> set output to most recent status
This logic ensures that we can continue rendering the remaining streams,
no matter which streams reach their end of life, until we have no
streams left at which point we forward the last EOF.
When combining multiple inputs, the output PTS may be less than the PTS
of the input. In this case, the current's code assumption of always
draining one value from the FIFO is incorrect. Replace by a smarter
function which drains only those PTS values that were actually consumed.
When combining multiple inputs with different PTS and durations, in
input-timed mode, we emit one output frame for every input frame PTS,
from *any* input. So when combining a low FPS stream with a high FPS
stream, the output framerate would match the higher FPS, independent of
which order they are specified in.