Before patch, memory was allocated in each thread functions,
which may cause more than one time of memory allocation and
cause crash.
After patch, memory is allocated in the main thread once,
an index was parsed into thread functions. Bug fixed.
Signed-off-by: Xu Jun <xujunzz@sjtu.edu.cn>
show all input/output names when the input or output name not correct
Signed-off-by: Ting Fu <ting.fu@intel.com>
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Found via ASAN with the dnn-layer-conv2d FATE-test.
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Use pthread to multithread dnn_execute_layer_conv2d.
Can be tested with command "./ffmpeg_g -i input.png -vf \
format=yuvj420p,dnn_processing=dnn_backend=native:model= \
espcn.model:input=x:output=y:options=conv2d_threads=23 \
-y sr_native.jpg -benchmark"
before patch: utime=11.238s stime=0.005s rtime=11.248s
after patch: utime=20.817s stime=0.047s rtime=1.051s
on my 3900X 12c24t @4.2GHz
About the increase of utime, it's because that CPU HyperThreading
technology makes logical cores twice of physical cores while cpu's
counting performance improves less than double. And utime sums
all cpu's logical cores' runtime. As a result, using threads num
near cpu's logical core's number will double utime, while reduce
rtime less than half for HyperThreading CPUs.
Signed-off-by: Xu Jun <xujunzz@sjtu.edu.cn>
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Unify all error return as DNN_ERROR, in order to cease model executing
when return error in ff_dnn_execute_model_native layer_func.pf_exec
Signed-off-by: Ting Fu <ting.fu@intel.com>
currently, output is set both at DNNModel.set_input_output and
DNNModule.execute_model, it makes sense that the output name is
provided at model inference time so all the output info is set
at a single place.
and so DNNModel.set_input_output is renamed to DNNModel.set_input
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
dnn_execute_layer_avg_pool() contains the following line:
assert(avgpool_params->padding_method = VALID);
This statement contains an assignment where obviously a comparison was
intended. Furthermore, *avgpool_params is const, so that the attempted
assignment leads to a compilation failure if asserts are enabled
(i.e. if DEBUG is defined which leads libavutil/internal.h to not define
NDEBUG). Moreover, the enumeration constant VALID actually has the value 0,
so that the assert would be triggered if a compiler compiles this with
asserts enabled. Finally, the statement uses assert() directly instead
of av_assert*().
All these errors have been fixed.
Thanks to ubitux for providing a FATE-box [1] where DEBUG is defined.
[1]: http://fate.ffmpeg.org/history.cgi?slot=x86_64-archlinux-gcc-ddebug
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
different backend might need different options for a better performance,
so, add the parameter into dnn interface, as a preparation.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
We should not silently allocate an incorrect sized buffer.
Fixes trac issue #8718.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
OpenVINO is a Deep Learning Deployment Toolkit at
https://github.com/openvinotoolkit/openvino, it supports CPU, GPU
and heterogeneous plugins to accelerate deep learning inferencing.
Please refer to https://github.com/openvinotoolkit/openvino/blob/master/build-instruction.md
to build openvino (c library is built at the same time). Please add
option -DENABLE_MKL_DNN=ON for cmake to enable CPU path. The header
files and libraries are installed to /usr/local/deployment_tools/inference_engine/
with default options on my system.
To build FFmpeg with openvion, take my system as an example, run with:
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/deployment_tools/inference_engine/lib/intel64/:/usr/local/deployment_tools/inference_engine/external/tbb/lib/
$ ../ffmpeg/configure --enable-libopenvino --extra-cflags=-I/usr/local/deployment_tools/inference_engine/include/ --extra-ldflags=-L/usr/local/deployment_tools/inference_engine/lib/intel64
$ make
Here are the features provided by OpenVINO inference engine:
- support more DNN model formats
It supports TensorFlow, Caffe, ONNX, MXNet and Kaldi by converting them
into OpenVINO format with a python script. And torth model
can be first converted into ONNX and then to OpenVINO format.
see the script at https://github.com/openvinotoolkit/openvino/tree/master/model-optimizer/mo.py
which also does some optimization at model level.
- optimize at inference stage
It optimizes for X86 CPUs with SSE, AVX etc.
It also optimizes based on OpenCL for Intel GPUs.
(only Intel GPU supported becuase Intel OpenCL extension is used for optimization)
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
more math unary operations will be added here
It can be tested with the model file generated with below python scripy:
import tensorflow as tf
import numpy as np
import imageio
in_img = imageio.imread('input.jpeg')
in_img = in_img.astype(np.float32)/255.0
in_data = in_img[np.newaxis, :]
x = tf.placeholder(tf.float32, shape=[1, None, None, 3], name='dnn_in')
x1 = tf.subtract(x, 0.5)
x2 = tf.abs(x1)
y = tf.identity(x2, name='dnn_out')
sess=tf.Session()
sess.run(tf.global_variables_initializer())
graph_def = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ['dnn_out'])
tf.train.write_graph(graph_def, '.', 'image_process.pb', as_text=False)
print("image_process.pb generated, please use \
path_to_ffmpeg/tools/python/convert.py to generate image_process.model\n")
output = sess.run(y, feed_dict={x: in_data})
imageio.imsave("out.jpg", np.squeeze(output))
Signed-off-by: Ting Fu <ting.fu@intel.com>
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
to support dnn networks more general, we need to know the input info
of the dnn model.
background:
The data type of dnn model's input could be float32, uint8 or fp16, etc.
And the w/h of input image could be fixed or variable.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
so, we can make a filter more general to accept different network
models, by adding a data type convertion after getting data from network.
After we add dt field into struct DNNData, it becomes the same as
DNNInputData, so merge them with one struct: DNNData.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>
Unlike other tf.*.conv2d layers, tf.nn.conv2d does not create many
nodes (within a scope) in the graph, it just acts like other layers.
tf.nn.conv2d only creates one node in the graph, and no internal
nodes such as 'kernel' are created.
The format of native model file is also changed, a flag named
has_bias is added, so change the version number.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Pedro Arthur <bygrandao@gmail.com>