Right now all AVCodecContexts except those using frame-threaded decoding
call the codec's init function and expect its close function to be
called. In order to make sure that the close function is not called for
frame-threaded decoding, ff_frame_thread_free() resets
AVCodecContext.codec (and because of this it has to free the private
AVOptions of the main AVCodecContext itself). This is not obvious and
potentially fragile. Instead add a field to AVCodecInternal that
indicates whether close should be called for this AVCodecContext.
It is always zero when using frame-threaded decoding, so that resetting
the codec is no longer necessary and has been removed.
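
As a rough illustration of the idea (not the actual libavcodec code), the
flag can be sketched as follows. The types Codec and Context, the functions
context_open() and context_close() and the demo codec are hypothetical
names invented for this sketch; only the needs_close idea comes from the
text above, and the handling of FF_CODEC_CAP_INIT_CLEANUP is deliberately
left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for AVCodec/AVCodecContext. */
typedef struct Codec {
    int  (*init)(void *priv);
    void (*close)(void *priv);
} Codec;

typedef struct Context {
    const Codec *codec;
    void        *priv;
    int          needs_close; /* nonzero iff close is owed for this context */
} Context;

/* Demo codec used only to make the behaviour observable. */
static int demo_close_calls;
static void demo_close(void *priv) { (void)priv; demo_close_calls++; }
static const Codec demo_codec = { NULL, demo_close };

static int context_open(Context *c, const Codec *codec, void *priv,
                        int frame_threaded)
{
    assert(c && codec);
    c->codec       = codec;
    c->priv        = priv;
    c->needs_close = 0;

    /* With frame threading the main context never runs init itself,
     * so the flag stays zero and close is never called on it. */
    if (frame_threaded)
        return 0;

    if (codec->init) {
        int ret = codec->init(priv);
        if (ret < 0)
            return ret;
    }
    c->needs_close = 1;
    return 0;
}

static void context_close(Context *c)
{
    /* No need to reset c->codec: the flag alone decides. */
    if (c->needs_close && c->codec->close)
        c->codec->close(c->priv);
    c->needs_close = 0;
}
```

With this layout the close path only has to consult the flag, so the codec
pointer can stay valid for the whole lifetime of the context.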
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Deprecated in 40cf1bbacc.
(The currently disabled filters vf_mcdeint and vf_uspp were users of
this field; they have not been changed, so that whoever wants to fix
them can see the state of these filters when they were disabled.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Up until now, initializing the mutexes/condition variables wasn't
checked by ff_frame_thread_init(). This commit changes this.
Given that it is not documented to be safe to destroy a zeroed but
otherwise uninitialized mutex/condition variable, one has to choose
between two approaches: Either one duplicates the code to free them
in ff_frame_thread_init() in case of errors or one records which have
been successfully initialized. This commit takes the latter approach:
For each of the two structures with mutexes/condition variables
an array containing the offsets of the members to initialize is added.
Said array is used both for initializing and freeing and the only thing
that needs to be recorded is how many of these have been successfully
initialized.
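
The offset-array approach described above can be sketched roughly as
follows. This is a minimal illustration under assumptions, not the code
from pthread_frame.c: the Worker structure, its member names and the
functions worker_init_sync() and worker_free_sync() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical structure for illustration; not the real PerThreadContext. */
typedef struct Worker {
    pthread_mutex_t lock;
    pthread_mutex_t progress_lock;
    pthread_cond_t  input_cond;
    pthread_cond_t  output_cond;
    unsigned        nb_mutexes_inited; /* successfully initialized mutexes */
    unsigned        nb_conds_inited;   /* successfully initialized conds */
} Worker;

/* Offsets of the members to initialize, in initialization order. */
static const size_t mutex_offsets[] = {
    offsetof(Worker, lock),
    offsetof(Worker, progress_lock),
};
static const size_t cond_offsets[] = {
    offsetof(Worker, input_cond),
    offsetof(Worker, output_cond),
};

#define NB_ELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Returns 0 on success or a positive pthread error code. */
static int worker_init_sync(Worker *w)
{
    assert(w);
    w->nb_mutexes_inited = w->nb_conds_inited = 0;

    for (size_t i = 0; i < NB_ELEMS(mutex_offsets); i++) {
        pthread_mutex_t *m = (pthread_mutex_t *)((char *)w + mutex_offsets[i]);
        int ret = pthread_mutex_init(m, NULL);
        if (ret)
            return ret; /* the counters record how far we got */
        w->nb_mutexes_inited++;
    }
    for (size_t i = 0; i < NB_ELEMS(cond_offsets); i++) {
        pthread_cond_t *c = (pthread_cond_t *)((char *)w + cond_offsets[i]);
        int ret = pthread_cond_init(c, NULL);
        if (ret)
            return ret;
        w->nb_conds_inited++;
    }
    return 0;
}

/* Destroys only what was initialized, so it also works after a failed
 * worker_init_sync(). */
static void worker_free_sync(Worker *w)
{
    while (w->nb_conds_inited > 0) {
        size_t off = cond_offsets[--w->nb_conds_inited];
        pthread_cond_destroy((pthread_cond_t *)((char *)w + off));
    }
    while (w->nb_mutexes_inited > 0) {
        size_t off = mutex_offsets[--w->nb_mutexes_inited];
        pthread_mutex_destroy((pthread_mutex_t *)((char *)w + off));
    }
}
```

The same two arrays drive both loops, so adding a member only requires
one new offsetof() entry and the error path needs no separate cleanup code.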
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
In case an error happened when setting up the child threads,
ff_frame_thread_init() would up until now call ff_frame_thread_free()
to clean up all threads set up so far, including the current, not
properly initialized one.
But a half-allocated context needs special handling which
ff_frame_thread_free() doesn't provide.
Notably, if allocating the AVCodecInternal, the codec's private data
or setting the options fails, the codec's close function will be
called (if there is one); it will also be called if the codec's init
function fails, regardless of whether FF_CODEC_CAP_INIT_CLEANUP
is set. This is not supported by all codecs; in ticket #9099 it led
to a crash.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
They add considerable complexity to the frame-threading implementation,
which includes an unavoidably leaking error path, while the advantages
of this option to the users are highly dubious.
It should always be possible and desirable for the callers to make their
get_buffer2() implementation thread-safe, so deprecate this option.
Currently the next thread's context is updated from the previous one's
if the codec descriptor is not marked as intra-only. That is not
entirely correct, since that property does not necessarily imply
anything about how a specific decoder implementation behaves.
Instead, use the presence of the update_thread_context() callback to
decide whether an update should be performed. Fixes races in CFHD,
should cause no behaviour change in any other decoders.
It is a constant known at codec init, so set it in
ff_frame_thread_init(). Also, only set it for video, since the meaning
of this field is not well-defined for audio with frame threading.
Fixes availability of delay in callbacks invoked from the per-thread
contexts after 1f4cf92cfb.
Currently the frame pool used by the default get_buffer2()
implementation is a single struct, allocated when opening the decoder.
A pointer to it is simply copied to each frame thread and we assume that
no thread attempts to modify it at an unexpected time. This is rather
fragile and potentially dangerous.
With this commit, the frame pool is made refcounted, with the reference
being propagated across threads along with other context variables. The
frame pool is now also immutable - when the stream parameters change we
drop the old reference and create a new one.
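
A minimal sketch of this scheme in plain C, assuming a hand-rolled atomic
reference count, might look as follows. The FramePool type and the pool_*()
functions are hypothetical names for illustration and do not reflect how
lavc actually implements the refcounting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical pool descriptor; never modified after creation. */
typedef struct FramePool {
    atomic_int refcount;
    int        width, height;
} FramePool;

static FramePool *pool_create(int width, int height)
{
    FramePool *p = malloc(sizeof(*p));
    if (!p)
        return NULL;
    atomic_init(&p->refcount, 1);
    p->width  = width;
    p->height = height;
    return p;
}

/* Take an additional reference, e.g. when propagating to another thread. */
static FramePool *pool_ref(FramePool *p)
{
    assert(p);
    atomic_fetch_add_explicit(&p->refcount, 1, memory_order_relaxed);
    return p;
}

/* Drop one reference and clear the caller's pointer. */
static void pool_unref(FramePool **pp)
{
    FramePool *p = *pp;
    if (!p)
        return;
    if (atomic_fetch_sub_explicit(&p->refcount, 1, memory_order_acq_rel) == 1)
        free(p);
    *pp = NULL;
}

/* On a parameter change, replace the reference instead of mutating the
 * pool, so other holders keep seeing a consistent old pool. */
static int pool_update(FramePool **pp, int width, int height)
{
    FramePool *n;

    if (*pp && (*pp)->width == width && (*pp)->height == height)
        return 0;
    n = pool_create(width, height);
    if (!n)
        return -1;
    pool_unref(pp);
    *pp = n;
    return 0;
}
```

A thread that still holds the old reference is unaffected by
pool_update() in another context; the old pool is freed only when its
last holder calls pool_unref().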
Specifically, between the user-facing one and the first frame thread
one.
This is fragile and dangerous; allocate separate private data for each
per-thread context.
The current design, where
- proper init is called for the first per-thread context
- first thread's private data is copied into private data for all the
other threads
- a "fixup" function is called for all the other threads to e.g.
allocate dynamically allocated data
is very fragile and hard to follow, so it is abandoned. Instead, the
same init function is used to init each per-thread context. Where
necessary, AVCodecInternal.is_copy can be used to differentiate between
the first thread and the other ones (e.g. for decoding the extradata
just once).
Resolution/format changes lead to re-initialization of hardware
accelerations (vaapi/dxva2/..) with a new hwaccel_priv_data in
the worker thread. But hwaccel_priv_data in the user context won't
be updated until the resolution-changing frame is output.
A termination with "-vframes" just after the reinit will lead to:
1. memory leak in worker-thread.
2. double free in user-thread.
Update user context in ff_frame_thread_free with the last thread
submit_packet() was called on.
To reproduce:
ffmpeg -hwaccel vaapi(dxva2) -v verbose -i
fate-suite/h264/reinit-large_420_8-to-small_420_8.h264 -pix_fmt nv12
-f rawvideo -vsync passthrough -vframes 47 -y out.yuv
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This removes the XP compatibility code, and switches entirely to SRW
locks, which are available starting at Windows Vista.
This removes CRITICAL_SECTION use, which allows us to add
PTHREAD_MUTEX_INITIALIZER, which will be useful later.
Windows XP is hereby not a supported build target anymore.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
And remove the function altogether while at it. It's a duplicate of
another.
Reviewed-by: wm4 <nfxjfg@googlemail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
This removes the XP compatibility code, and switches entirely to SRW
locks, which are available starting at Windows Vista.
This removes CRITICAL_SECTION use, which allows us to add
PTHREAD_MUTEX_INITIALIZER, which will be useful later.
Windows XP is hereby not a supported build target anymore. It was
decided in a project vote that this is OK.
The patch does not fix the tsan warning it was intended to fix.
Reverting the patch moves the av_log() back to the outside of the lock.
Signed-off-by: Wan-Teh Chang <wtc@google.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Add the debug_threads boolean field to PerThreadContext. For
PerThreadContext *p, p->debug_threads records whether the
FF_DEBUG_THREADS bit is set in p->avctx->debug, and p->debug_threads and
p->avctx->debug are kept in sync. The debug_threads field is defined as
an atomic_int to allow atomic read by another thread in
ff_thread_await_progress().
This fixes the tsan warning that
2e664b9c1e attempted to fix:
WARNING: ThreadSanitizer: data race (pid=452658)
Write of size 4 at 0x7b640003f4fc by main thread (mutexes: write M248499):
#0 update_context_from_user [..]/libavcodec/pthread_frame.c:335:19 (5ab42bb1a6f4b068d7863dabe9b2bacc+0xe73859)
[..]
Previous read of size 4 at 0x7b640003f4fc by thread T130 (mutexes: write M248502, write M248500):
#0 ff_thread_await_progress [..]/libavcodec/pthread_frame.c:591:26 (5ab42bb1a6f4b068d7863dabe9b2bacc+0xe749a1)
Signed-off-by: Wan-Teh Chang <wtc@google.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
So, all frames and errors are correctly reported in order.
Also limit the number of errors during draining to prevent an infinite
loop. This fixes a fate failure with THREADS>=4:
make fate-h264-attachment-631 THREADS=4
This also reverts a755b725ec.
Suggested-by: wm4, Ronald S. Bultje, Marton Balint
Reviewed-by: wm4 <nfxjfg@googlemail.com>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
The av_log() is done outside the lock, but this way the accesses to the
field (reads and writes) are always protected by a mutex. The av_log()
is not run inside the lock context because it may involve user callbacks
and doing that in performance-sensitive code is probably not a good idea.
This should fix occasional tsan warnings when running fate-h264, like:
WARNING: ThreadSanitizer: data race (pid=10916)
Write of size 4 at 0x7d64000174fc by main thread (mutexes: write M2313):
#0 update_context_from_user src/libavcodec/pthread_frame.c:335 (ffmpeg+0x000000df7b06)
[..]
Previous read of size 4 at 0x7d64000174fc by thread T1 (mutexes: write M2311):
#0 ff_thread_await_progress src/libavcodec/pthread_frame.c:592 (ffmpeg+0x000000df8b3e)
Consider the following sequence of events:
- open a codec without AV_CODEC_CAP_DELAY
- decode call fails with an error
- ff_thread_flush() is called
- drain packet is sent
Then the last step would make ff_thread_decode_frame() return an error,
because p->result can still be set to an error value. This is because
submit_packet returns immediately if AV_CODEC_CAP_DELAY is not set, and
no worker thread gets the chance to reset p->result, yet its value is
trusted by ff_thread_decode_frame().
Fix this by clearing the error fields on flush.
This tries to handle cases where separate invocations of decode_frame()
(each running in separate threads) write to respective fields in the
same AVFrame->data[]. Having per-field owners makes interaction between
readers (the referencing thread) and writers (the decoding thread)
slightly more optimal if both accesses are field-based, since they will
use the respective producer's thread objects (mutex/cond) instead of
sharing the thread objects of the first field's producer.
In practice, this fixes the following tsan-warning in fate-h264:
WARNING: ThreadSanitizer: data race (pid=21615)
Read of size 4 at 0x7d640000d9fc by thread T2 (mutexes: write M1006):
#0 ff_thread_report_progress pthread_frame.c:569 (ffmpeg:x86_64+0x100f7cf54)
[..]
Previous write of size 4 at 0x7d640000d9fc by main thread (mutexes: write M1004):
#0 update_context_from_user pthread_frame.c:335 (ffmpeg:x86_64+0x100f81abb)
Otherwise the thread may still be in the middle of decoding a previous
frame, which would effectively trigger a race condition on any field
concurrently read and written.
In practice, this fixes tsan warnings like the following:
WARNING: ThreadSanitizer: data race (pid=17380)
Write of size 4 at 0x7d64000160fc by main thread:
#0 update_context_from_user src/libavcodec/pthread_frame.c:335 (ffmpeg+0x000000dca515)
[..]
Previous read of size 4 at 0x7d64000160fc by thread T2 (mutexes: write M1821):
#0 ff_thread_report_progress src/libavcodec/pthread_frame.c:565 (ffmpeg+0x000000dcb08a)
Get rid of the "ret" variable, and always use err. Report the packet as
consumed if err is unset. This should be equivalent to the old code,
which obviously required err=0 for p->result>=0 (and otherwise,
p->result must have had the value err was last set to). The code block
added by commit 32a5b63126 is also not needed anymore, because the new
code strictly returns err if it's >=0.
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Intra-only codecs should either be able to read these items from the
bitstream, or they should be set upon codec initialization. In both
cases, syncing these items at runtime is unnecessary.
In practice, this fixes race conditions for decoders that read these
values from the bitstream.
Could lead to random behavior. This possibly happened due to commit
32a5b63126. This should/could probably be simplified, but for now apply
a minimal fix to quell the errors.
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
async_mutex is used in a very strange but intentional way: it is
locked by default, and unlocked only in regions that can be run
concurrently.
If the user was calling API functions on the same context from different
threads (in a safe way), this could unintentionally unlock the mutex on
a different thread than the previous lock operation. This is not allowed
by the pthread API.
Fix this by emulating a binary semaphore using a mutex and condition
variable. (Posix semaphores are not available on all platforms.)
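
The emulation can be sketched roughly like this. It is an illustration
under assumptions rather than the committed code: the BinarySem type and
the binary_sem_*() function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical binary semaphore built from a mutex and a condition
 * variable. Unlike a plain mutex it may be released by a thread other
 * than the one that acquired it. */
typedef struct BinarySem {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             locked; /* 1 while held, protected by mutex */
} BinarySem;

/* Returns 0 on success or a positive pthread error code. */
static int binary_sem_init(BinarySem *s, int initially_locked)
{
    int ret = pthread_mutex_init(&s->mutex, NULL);
    if (ret)
        return ret;
    ret = pthread_cond_init(&s->cond, NULL);
    if (ret) {
        pthread_mutex_destroy(&s->mutex);
        return ret;
    }
    s->locked = !!initially_locked;
    return 0;
}

static void binary_sem_acquire(BinarySem *s)
{
    pthread_mutex_lock(&s->mutex);
    while (s->locked)
        pthread_cond_wait(&s->cond, &s->mutex);
    s->locked = 1;
    pthread_mutex_unlock(&s->mutex);
}

static void binary_sem_release(BinarySem *s)
{
    pthread_mutex_lock(&s->mutex);
    assert(s->locked);
    s->locked = 0;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->mutex);
}

static void binary_sem_destroy(BinarySem *s)
{
    pthread_cond_destroy(&s->cond);
    pthread_mutex_destroy(&s->mutex);
}
```

The inner mutex is only ever locked and unlocked within a single function
call, so it is always released by the thread that took it, while the
"locked" state itself can be handed from one thread to another.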
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
This "reuses" the flags introduced for the av_vdpau_bind_context() API
function, and makes them available to all hwaccels. This does not affect
the current vdpau API, as av_vdpau_bind_context() should obviously
override the AVCodecContext.hwaccel_flags flags for the sake of
compatibility.
Cherry-picked from Libav commit 16a163b5.
Reviewed-by: Mark Thompson <sw@jkqxz.net>
Certain hardware decoding APIs are not guaranteed to be thread-safe, so
having the user access decoded hardware surfaces while the decoder is
running in another thread can cause failures (this is mainly known to
happen with DXVA2).
For such hwaccels, only allow the decoding thread to run while the user
is inside a lavc decode call (avcodec_send_packet/receive_frame).
Merges Libav commit d4a91e65.
Signed-off-by: wm4 <nfxjfg@googlemail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
This improves commit 59c7022740.
In ff_thread_report_progress(), the fast code path can load
progress[field] with the relaxed memory order, and the slow code path
can store progress[field] with the release memory order. These changes
are mainly intended to avoid confusion when one inspects the source code.
They are unlikely to yield a measurable performance improvement.
ff_thread_report_progress() and ff_thread_await_progress() form a pair.
ff_thread_await_progress() reads progress[field] with the acquire memory
order (in the fast code path). Therefore, one expects to see
ff_thread_report_progress() write progress[field] with the matching
release memory order.
In the fast code path in ff_thread_report_progress(), the atomic load of
progress[field] doesn't need the acquire memory order because the
calling thread is trying to make the data it just decoded visible to the
other threads, rather than trying to read the data decoded by other
threads.
In ff_thread_get_buffer(), initialize progress[0] and progress[1] using
atomic_init().
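
The pairing of memory orders described above can be sketched with C11
atomics as follows. This is a simplified illustration with a single
progress value; the Progress type and the progress_*() functions are
hypothetical names, not the functions in pthread_frame.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical single-field progress tracker. */
typedef struct Progress {
    atomic_int      value;
    pthread_mutex_t lock;
    pthread_cond_t  cond;
} Progress;

/* Returns 0 on success or a positive pthread error code. */
static int progress_init(Progress *p)
{
    int ret;

    assert(p);
    atomic_init(&p->value, -1); /* nothing decoded yet */
    ret = pthread_mutex_init(&p->lock, NULL);
    if (ret)
        return ret;
    ret = pthread_cond_init(&p->cond, NULL);
    if (ret)
        pthread_mutex_destroy(&p->lock);
    return ret;
}

static void progress_report(Progress *p, int n)
{
    /* Fast path: the reporter publishes data and does not read data
     * written by others, so a relaxed load is enough. */
    if (atomic_load_explicit(&p->value, memory_order_relaxed) >= n)
        return;

    pthread_mutex_lock(&p->lock);
    /* Release store: pairs with the acquire load in progress_await(). */
    atomic_store_explicit(&p->value, n, memory_order_release);
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->lock);
}

static void progress_await(Progress *p, int n)
{
    /* Fast path: acquire load, so data written before the matching
     * release store is visible to this thread. */
    if (atomic_load_explicit(&p->value, memory_order_acquire) >= n)
        return;

    pthread_mutex_lock(&p->lock);
    /* Under the mutex a relaxed load suffices; the lock orders it. */
    while (atomic_load_explicit(&p->value, memory_order_relaxed) < n)
        pthread_cond_wait(&p->cond, &p->lock);
    pthread_mutex_unlock(&p->lock);
}
```

Reporting a value that is not larger than the current one takes the fast
path and leaves the stored progress unchanged.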
Signed-off-by: Wan-Teh Chang <wtc@google.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Merges Libav commit 343e2833.
Signed-off-by: wm4 <nfxjfg@googlemail.com>
When decoding with threads enabled, the get_format callback will be
called with one of the per-thread codec contexts rather than with the
outer context. If a hwaccel is in use too, this will add a reference
to the hardware frames context on that codec context, which will then
propagate to all of the other per-thread contexts for decoding. Once
the decoder finishes, however, the per-thread contexts are not freed
normally, so these references leak.
Merges Libav commit fd0fae60.
Signed-off-by: wm4 <nfxjfg@googlemail.com>