So, all frames and errors are correctly reported in order.
Also limit the numbers of error during draining to prevent infinite loop.
This fix fate failure with THREADS>=4:
make fate-h264-attachment-631 THREADS=4
This also reverts a755b725ec.
Suggested-by: wm4, Ronald S. Bultje, Marton Balint
Reviewed-by: w4 <nfxjfg@googlemail.com>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
The av_log() is done outside the lock, but this way the accesses to the
field (reads and writes) are always protected by a mutex. The av_log()
is not run inside the lock context because it may involve user callbacks
and doing that in performance-sensitive code is probably not a good idea.
This should fix occasional tsan warnings when running fate-h264, like:
WARNING: ThreadSanitizer: data race (pid=10916)
Write of size 4 at 0x7d64000174fc by main thread (mutexes: write M2313):
#0 update_context_from_user src/libavcodec/pthread_frame.c:335 (ffmpeg+0x000000df7b06)
[..]
Previous read of size 4 at 0x7d64000174fc by thread T1 (mutexes: write M2311):
#0 ff_thread_await_progress src/libavcodec/pthread_frame.c:592 (ffmpeg+0x000000df8b3e)
Consider the following sequence of events:
- open a codec without AV_CODEC_CAP_DELAY
- decode call fails with an error
- ff_thread_flush() is called
- drain packet is sent
Then the last step would make ff_thread_decode_frame() return an error,
because p->result can still be set to an error value. This is because
submit_packet returns immediately if AV_CODEC_CAP_DELAY is not set, and
no worker thread gets the chance to reset p->result, yet its value is
trusted by ff_thread_decode_frame().
Fix this by clearing the error fields on flush.
This tries to handle cases where separate invocations of decode_frame()
(each running in separate threads) write to respective fields in the
same AVFrame->data[]. Having per-field owners makes interaction between
readers (the referencing thread) and writers (the decoding thread)
slightly more optimal if both accesses are field-based, since they will
use the respective producer's thread objects (mutex/cond) instead of
sharing the thread objects of the first field's producer.
In practice, this fixes the following tsan-warning in fate-h264:
WARNING: ThreadSanitizer: data race (pid=21615)
Read of size 4 at 0x7d640000d9fc by thread T2 (mutexes: write M1006):
#0 ff_thread_report_progress pthread_frame.c:569 (ffmpeg:x86_64+0x100f7cf54)
[..]
Previous write of size 4 at 0x7d640000d9fc by main thread (mutexes: write M1004):
#0 update_context_from_user pthread_frame.c:335 (ffmpeg:x86_64+0x100f81abb)
Otherwise the thread may still be in the middle of decoding a previous
frame, which would effectively trigger a race condition on any field
concurrently read and written.
In practice, this fixes tsan warnings like the following:
WARNING: ThreadSanitizer: data race (pid=17380)
Write of size 4 at 0x7d64000160fc by main thread:
#0 update_context_from_user src/libavcodec/pthread_frame.c:335 (ffmpeg+0x000000dca515)
[..]
Previous read of size 4 at 0x7d64000160fc by thread T2 (mutexes: write M1821):
#0 ff_thread_report_progress src/libavcodec/pthread_frame.c:565 (ffmpeg+0x000000dcb08a)
Get rid of the "ret" variable, and always use err. Report the packet as
consumed if err is unset. This should be equivalent to the old code,
which obviously required err=0 for p->result>=0 (and otherwise,
p->result must have had the value err was last set to). The code block
added by commit 32a5b63126 is also not needed anymore, because the new
code strictly returns err if it's >=0.
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Intra-only codecs should either be able to read these items from the
bitstream, or they should be set upon codec initialization. In both
cases, syncing these items at runtime is unnecessary.
In practice, this fixes race conditions for decoders that read these
values from the bitstream.
Could lead to random behavior. This possibly happened due to commit
32a5b63126. This should/could probably be simplified, but for no apply
a minimal fix to quell the errors.
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
async_mutex has is used in a very strange but intentional way: it is
locked by default, and unlocked only in regions that can be run
concurrently.
If the user was calling API functions to the same context from different
threads (in a safe way), this could unintentionally unlock the mutex on
a different thread than the previous lock operation. It's not allowed by
the pthread API.
Fix this by emulating a binary semaphore using a mutex and condition
variable. (Posix semaphores are not available on all platforms.)
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
This "reuses" the flags introduced for the av_vdpau_bind_context() API
function, and makes them available to all hwaccels. This does not affect
the current vdpau API, as av_vdpau_bind_context() should obviously
override the AVCodecContext.hwaccel_flags flags for the sake of
compatibility.
Cherry-picked from Libav commit 16a163b5.
Reviewed-by: Mark Thompson <sw@jkqxz.net>
Certain hardware decoding APIs are not guaranteed to be thread-safe, so
having the user access decoded hardware surfaces while the decoder is
running in another thread can cause failures (this is mainly known to
happen with DXVA2).
For such hwaccels, only allow the decoding thread to run while the user
is inside a lavc decode call (avcodec_send_packet/receive_frame).
Merges Libav commit d4a91e65.
Signed-off-by: wm4 <nfxjfg@googlemail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
This improves commit 59c7022740.
In ff_thread_report_progress(), the fast code path can load
progress[field] with the relaxed memory order, and the slow code path
can store progress[field] with the release memory order. These changes
are mainly intended to avoid confusion when one inspects the source code.
They are unlikely to have measurable performance improvement.
ff_thread_report_progress() and ff_thread_await_progress() form a pair.
ff_thread_await_progress() reads progress[field] with the acquire memory
order (in the fast code path). Therefore, one expects to see
ff_thread_report_progress() write progress[field] with the matching
release memory order.
In the fast code path in ff_thread_report_progress(), the atomic load of
progress[field] doesn't need the acquire memory order because the
calling thread is trying to make the data it just decoded visible to the
other threads, rather than trying to read the data decoded by other
threads.
In ff_thread_get_buffer(), initialize progress[0] and progress[1] using
atomic_init().
Signed-off-by: Wan-Teh Chang <wtc@google.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Merges Libav commit 343e2833.
Signed-off-by: wm4 <nfxjfg@googlemail.com>
When decoding with threads enabled, the get_format callback will be
called with one of the per-thread codec contexts rather than with the
outer context. If a hwaccel is in use too, this will add a reference
to the hardware frames context on that codec context, which will then
propagate to all of the other per-thread contexts for decoding. Once
the decoder finishes, however, the per-thread contexts are not freed
normally, so these references leak.
Merges Libav commit fd0fae60.
Signed-off-by: wm4 <nfxjfg@googlemail.com>
Certain hardware decoding APIs are not guaranteed to be thread-safe, so
having the user access decoded hardware surfaces while the decoder is
running in another thread can cause failures (this is mainly known to
happen with DXVA2).
For such hwaccels, only allow the decoding thread to run while the user
is inside a lavc decode call (avcodec_send_packet/receive_frame).
This improves commit 59c7022740.
In ff_thread_report_progress(), the fast code path can load
progress[field] with the relaxed memory order, and the slow code path
can store progress[field] with the release memory order. These changes
are mainly intended to avoid confusion when one inspects the source code.
They are unlikely to have measurable performance improvement.
ff_thread_report_progress() and ff_thread_await_progress() form a pair.
ff_thread_await_progress() reads progress[field] with the acquire memory
order (in the fast code path). Therefore, one expects to see
ff_thread_report_progress() write progress[field] with the matching
release memory order.
In the fast code path in ff_thread_report_progress(), the atomic load of
progress[field] doesn't need the acquire memory order because the
calling thread is trying to make the data it just decoded visible to the
other threads, rather than trying to read the data decoded by other
threads.
In ff_thread_get_buffer(), initialize progress[0] and progress[1] using
atomic_init().
Signed-off-by: Wan-Teh Chang <wtc@google.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The current code stores a pointer to the packet passed to the decoder,
which is then used during get_buffer() for timestamps and side data
passthrough. However, since this is a pointer to user data which we do
not own, storing it is potentially dangerous. It is also ill defined for
the new decoding API with split input/output.
Fix this problem by making an explicit internally owned copy of the
packet properties.
When decoding with threads enabled, the get_format callback will be
called with one of the per-thread codec contexts rather than with the
outer context. If a hwaccel is in use too, this will add a reference
to the hardware frames context on that codec context, which will then
propagate to all of the other per-thread contexts for decoding. Once
the decoder finishes, however, the per-thread contexts are not freed
normally, so these references leak.
This fixes a data race warning by ThreadSanitizer.
FrameThreadContext.die is read by all the worker threads but is not
protected by any mutex. Move it to PerThreadContext so that each worker
thread reads its own copy of |die|, which can then be protected with
PerThreadContext.mutex.
Signed-off-by: Wan-Teh Chang <wtc@google.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The rationale is that coded_frame was only used to communicate key_frame,
pict_type and quality to the caller, as well as a few other random fields,
in a non predictable, let alone consistent way.
There was agreement that there was no use case for coded_frame, as it is
a full-sized AVFrame container used for just 2-3 int-sized properties,
which shouldn't even belong into the AVCodecContext in the first place.
The appropriate AVPacket flag can be used instead of key_frame, while
quality is exported with the new AVPacketSideData quality factor.
There is no replacement for the other fields as they were unreliable,
mishandled or just not used at all.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Fixes: b4b47bc2b3fb7ca710bfffe5aa969e37_signal_sigabrt_7ffff70eccc9_744_nc_sample2.avi with memlimit of 4194304
Found-by: Samuel Groß, Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes null pointer dereferences
Fixes: af1a5a33e67e479f439239097bd0d4fd_signal_sigsegv_7ffff713351a_152_Dolby_Rain_Logo.pmp with memlimit of 8388608
Found-by: Samuel Groß, Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This is the first part of the fix for ticket #4370.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
When decoding, this field holds the inverse of the framerate that can be
written in the headers for some codecs. Using a field called 'time_base'
for this is very misleading, as there are no timestamps associated with
it. Furthermore, this field is used for a very different purpose during
encoding.
Add a new field, called 'framerate', to replace the use of time_base for
decoding.