Make the one-time initialization in av_get_cpu_flags() thread-safe. The
static variables |flags|, |cpuflags_mask|, and |checked| in
libavutil/cpu.c are read and written using normal load and store
operations. These are considered as data races. The fix is to use atomic
load and store operations.
Remove the |checked| variable because the invalid value of -1 for
|flags| can be used to indicate the same condition. Rename |flags| to
|cpu_flags| and move it to file scope.
The fix can be verified by running the libavutil/tests/cpu_init.c test
program under ThreadSanitizer:
./configure --toolchain=clang-tsan
make libavutil/tests/cpu_init
libavutil/tests/cpu_init
There should be no warnings from ThreadSanitizer.
Co-author: Dmitry Vyukov of Google, who suggested the data race fix.
Signed-off-by: Wan-Teh Chang <wtc@google.com>
Supporting the system was a nice joke for the 9 release, but it has
run its course. Nowadays Plan 9 receives no testing and has no
practical usefulness.
This would be simpler if codecpar supported AVOptions
modern ffserver should be unaffected by this, older ffserver which required the
muxer to directly access the encoder could have issues with this, but this
direct access is just wrong and unsafe
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This accesses the private encoder context, it should not be used by
the current ffserver it may affect old ffserver versions but i believe
there is consens that accessing the private encoder context from the muxer
is completely wrong.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
They have changed due to 122190392b
Reviewed-by: "Reynaldo H. Verdejo Pinochet" <reynaldo@osg.samsung.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This work is sponsored by, and copyright, Google.
Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:
Cortex A7 A8 A9 A53
vp9_inv_dct_dct_16x16_sub16_add_neon: 3188.1 2435.4 2499.0 1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon: 18531.7 16582.3 14207.6 12000.3
By skipping individual 4x16 or 4x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:
vp9_inv_dct_dct_16x16_sub1_add_neon: 274.6 189.5 211.7 235.8
vp9_inv_dct_dct_16x16_sub2_add_neon: 2064.0 1534.8 1719.4 1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon: 2135.0 1477.2 1736.3 1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon: 2446.7 1828.7 1993.6 1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon: 2832.4 2118.3 2266.5 1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon: 3211.7 2475.3 2523.5 1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon: 756.2 456.7 862.0 553.9
vp9_inv_dct_dct_32x32_sub2_add_neon: 10682.2 8190.4 8539.2 6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon: 10813.5 8014.9 8518.3 6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon: 11859.6 9313.0 9347.4 7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon: 12946.6 10752.4 10192.2 8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon: 14074.6 11946.5 11001.4 9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon: 15269.9 13662.7 11816.1 9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon: 16327.9 14940.1 12626.7 10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon: 17462.7 15776.1 13446.2 11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon: 18575.5 17157.0 14249.3 12015.1
I.e. in general a very minor overhead for the full subpartition case due
to the additional loads and cmps, but a significant speedup for the cases
when we only need to process a small part of the actual input data.
In common VP9 content in a few inspected clips, 70-90% of the non-dc-only
16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left
8x8 or 16x16 subpartitions respectively.
Signed-off-by: Martin Storsjö <martin@martin.st>
This fixes producing swf and rm files as done by ffservertest.
Reviewed-by: Reynaldo H. Verdejo Pinochet <reynaldo@osg.samsung.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
It randomly causes failures with an error like:
"Failed to set value '-f' for option 'd': Error number -920332800 occurred"
(The error number is different every time.)
Reviewed-by: Reynaldo H. Verdejo Pinochet <reynaldo@osg.samsung.com>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Not doing so makes debugging unnecessarily hard.
Reviewed-by: Reynaldo H. Verdejo Pinochet <reynaldo@osg.samsung.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
This reverts commit 81d7f0bbca.
Instead of just benchmarking dc separately, test all relevant subparts
(in the next commit).
Signed-off-by: Martin Storsjö <martin@martin.st>
If makeopts_fate is set, these makeopts are used for running the
tests instead of the normal makeopts. If it isn't set, the normal
makeopts variable is used as before.
This is useful if remote testing on a lesser machine where a large
number of parallel jobs might be undesireable, while wanting to speed
up the build with many parallel processes.
Signed-off-by: Martin Storsjö <martin@martin.st>
This reverts commit e0c6b32046.
Said commit changed the behavior of the demuxer and decoder in a non
backwards compatible way.
Demuxers should make extradata available at init if possible, and send
new extradata as side data within a packet if needed.
A better fix for the remuxing crash will follow.
Signed-off-by: James Almer <jamrial@gmail.com>
The dc-only mode is already checked to work correctly above, but this
allows benchmarking this mode for performance tuning, and allows making
sure that it actually is correctly hooked up.
Signed-off-by: Martin Storsjö <martin@martin.st>
The test is not supposed to cover audio.
Also, using -vframes along with an audio stream depends on
the exact order the frames are processed by filters, it is
too much constraint to guarantee.