Use pthread to multithread dnn_execute_layer_conv2d.
It can be tested with the command "./ffmpeg_g -i input.png -vf \
format=yuvj420p,dnn_processing=dnn_backend=native:model=\
espcn.model:input=x:output=y:options=conv2d_threads=23 \
-y sr_native.jpg -benchmark"
before patch: utime=11.238s stime=0.005s rtime=11.248s
after patch: utime=20.817s stime=0.047s rtime=1.051s
on my 3900X (12c/24t @ 4.2 GHz).
Regarding the increase in utime: Hyper-Threading makes the number of
logical cores twice the number of physical cores, while the CPU's
compute throughput improves by less than a factor of two, and utime
sums the runtime across all logical cores. As a result, using a thread
count close to the number of logical cores roughly doubles utime while
reducing rtime by less than half on Hyper-Threading CPUs.
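For reference, the threading scheme is essentially "split the output
rows into one contiguous band per worker"; below is a minimal sketch
with hypothetical names, not the actual patch code:

    #include <pthread.h>

    /* Hypothetical names for illustration only; not the actual patch. */
    typedef struct ConvThreadParam {
        int row_start, row_end;     /* [row_start, row_end) of output rows */
        const float *input;
        float *output;
        /* kernel, strides, pads, channel counts would also live here */
    } ConvThreadParam;

    static void *conv2d_worker(void *arg)
    {
        ConvThreadParam *p = arg;
        for (int y = p->row_start; y < p->row_end; ++y) {
            /* compute one output row: for every x and output channel,
               accumulate kernel * input window (inner loops omitted) */
        }
        return NULL;
    }

    static void conv2d_run_threads(ConvThreadParam *params, int thread_num, int height)
    {
        pthread_t threads[thread_num];
        int rows_per_thread = (height + thread_num - 1) / thread_num;

        for (int i = 0; i < thread_num; i++) {
            params[i].row_start = i * rows_per_thread;
            params[i].row_end   = params[i].row_start + rows_per_thread;
            if (params[i].row_end > height)
                params[i].row_end = height;
            pthread_create(&threads[i], NULL, conv2d_worker, &params[i]);
        }
        for (int i = 0; i < thread_num; i++)
            pthread_join(threads[i], NULL);
    }

Splitting along rows keeps each worker's output band disjoint, so no
locking is needed between threads.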
Signed-off-by: Xu Jun <xujunzz@sjtu.edu.cn>
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
When either output[i] or expected_output is NaN, the unit test always
passes, because any ordered comparison involving NaN evaluates to
false, so the error check never triggers.
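A minimal sketch of a NaN-aware check (hypothetical helper, not the
exact test code):

    #include <math.h>

    /* Any ordered comparison with NaN is false, so isnan() must be
       checked explicitly or the mismatch test never fires. */
    static int close_enough(float output, float expected, float eps)
    {
        if (isnan(output) || isnan(expected))
            return 0;
        return fabsf(output - expected) <= eps;
    }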
Signed-off-by: Ting Fu <ting.fu@intel.com>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
This fixes tests on 32-bit x86 mingw with clang, which uses the x87
FPU by default.
In this setup, while the get_expected function is declared to
return float, the compiler is (especially given the optimization
flags set) free to keep the intermediate values (in this case,
the return value from the inlined function) in higher precision.
This results in the situation where 7.28 (which actually, as
a float, ends up as 7.2800002098), multiplied by 100, is
728.000000 when really forced into a 32 bit float, but 728.000021
when kept with higher intermediate precision.
For the multiplication case, a more suitable epsilon would e.g.
be 2*FLT_EPSILON*fabs(expected_output), but just increase the
current hardcoded threshold for now.
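For reference, a check using that kind of relative tolerance could look
roughly like this (sketch only):

    #include <float.h>
    #include <math.h>

    /* Tolerance scaled to the magnitude of the expected value, instead
       of a hardcoded absolute threshold. */
    static int matches(float output, float expected)
    {
        float eps = 2 * FLT_EPSILON * fabsf(expected);
        return fabsf(output - expected) <= eps;
    }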
Signed-off-by: Martin Storsjö <martin@martin.st>
Unlike other tf.*.conv2d layers, tf.nn.conv2d does not create many
nodes (within a scope) in the graph; it just acts like other layers,
creating only one node with no internal nodes such as 'kernel'.
The format of the native model file is also changed: a flag named
has_bias is added, so the version number is bumped.
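As a rough illustration of the new flag (hypothetical struct and reader
names, not the actual loader code):

    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical names for illustration; the real native-model loader differs. */
    typedef struct Conv2DParams {
        int32_t has_bias;    /* new flag in the native model format */
        int32_t output_num;
        float  *kernel;
        float  *biases;      /* only stored in the file when has_bias != 0 */
    } Conv2DParams;

    /* stand-ins for whatever the loader uses to read the file */
    int32_t read_int32(void *model_file);
    float  *read_floats(void *model_file, int32_t count);

    static void load_conv2d_bias(Conv2DParams *params, void *model_file)
    {
        params->has_bias = read_int32(model_file);
        params->biases   = params->has_bias
                         ? read_floats(model_file, params->output_num)
                         : NULL;
    }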
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
The info can be saved in the dnn operand object so it does not have to
be regenerated again and again; it is also needed for layer split/merge
and for memory reuse. To take things step by step, this patch focuses
on the C code only; the change to the Python script will be added later.
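For context, assuming the info in question is the operand's shape, type
and buffer bookkeeping, a simplified sketch with hypothetical names:

    #include <stdint.h>
    #include <stddef.h>

    /* Simplified sketch, hypothetical names; not the actual struct. */
    typedef struct DnnOperandSketch {
        int32_t dims[4];    /* e.g. NHWC shape, computed once per operand */
        int     data_type;  /* element type, e.g. float32 */
        void   *data;       /* backing buffer, reusable across layers */
        size_t  length;     /* allocated size in bytes */
    } DnnOperandSketch;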
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
Background:
DNN (deep neural network) is a submodule of libavfilter, and FATE/dnn
contains the unit tests for the DNN module, one unit test per dnn layer.
The unit tests are not based on the APIs exported by libavfilter;
they call directly into the functions within the DNN submodule.
There is an issue when running the following commands:
build$ ../ffmpeg/configure --disable-static --enable-shared
make
make fate-dnn-layer-pad
Part of the error message:
tests/dnn/dnn-layer-pad-test.o: In function `test_with_mode_symmetric':
/work/media/ffmpeg/build/src/tests/dnn/dnn-layer-pad-test.c:73: undefined reference to `dnn_execute_layer_pad'
The root cause is that the function dnn_execute_layer_pad is a LOCAL
symbol in libavfilter.so, so the linker cannot find it when building
dnn-layer-pad-test.
To check this, just run: readelf -s libavfilter/libavfilter.so | grep dnn
So, add a dependency on the ffmpeg static libraries in the fate/dnn
Makefile. This is the same method used in fate/checkasm.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>