mirror of https://github.com/opencv/opencv.git
Custom layers for deep learning networks (#11129)
* Custom deep learning layers support
* Stack custom deep learning layers
parent 909a25571e
commit 4ec456f0a0
19 changed files with 928 additions and 146 deletions
@@ -0,0 +1,192 @@
# Custom deep learning layers support {#tutorial_dnn_custom_layers}

## Introduction

Deep learning is a fast-growing area. New approaches to building neural networks
usually introduce new types of layers. They can be modifications of existing
layers or implementations of novel research ideas.

OpenCV lets you import and run networks from different deep learning frameworks.
The most popular layer types are already supported. However, you may face
the problem that your network cannot be imported with OpenCV because some of its
layers are not implemented.

The first solution is to create a feature request at https://github.com/opencv/opencv/issues
mentioning details such as the source of the model and the type of the new layer. A new layer could
be implemented if the OpenCV community shares this need.

The second way is to define a **custom layer** so that OpenCV's deep learning engine
knows how to use it. This tutorial shows you how to customize the import of deep
learning models.

## Define a custom layer in C++

A deep learning layer is a building block of a network's pipeline.
It has connections to **input blobs** and produces results in **output blobs**.
It may also have trained **weights** and **hyper-parameters**.
Layers' names, types, weights and hyper-parameters are stored in files generated by
native frameworks during training. If OpenCV meets an unknown layer type, it throws
an exception while trying to read the model:

```
Unspecified error: Can't create layer "layer_name" of type "MyType" in function getLayerInstance
```
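
For instance, a minimal sketch of where this error surfaces (the model paths are placeholders, and catching cv::Exception is the usual way OpenCV reports such failures):

~~~~~~~~~~~~~{.cpp}
#include <opencv2/dnn.hpp>
#include <iostream>

int main()
{
    try
    {
        // Importing a model that contains a layer of an unregistered type throws.
        cv::dnn::Net net = cv::dnn::readNet("/path/to/weights.caffemodel", "/path/to/config.prototxt");
    }
    catch (const cv::Exception& e)
    {
        std::cerr << e.what() << std::endl;  // prints the "Can't create layer ..." message
    }
    return 0;
}
~~~~~~~~~~~~~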

To import the model correctly you have to derive a class from cv::dnn::Layer with
the following methods:

@snippet dnn/custom_layers.cpp A custom layer interface

And register it before the import:

@snippet dnn/custom_layers.cpp Register a custom layer

@note `MyType` is the type of the unimplemented layer from the thrown exception.

Let's see what all the methods do:

- Constructor

@snippet dnn/custom_layers.cpp MyLayer::MyLayer

Retrieves hyper-parameters from cv::dnn::LayerParams. If your layer has trainable
weights, they will already be stored in the Layer's member cv::dnn::Layer::blobs.

- A static method `create`

@snippet dnn/custom_layers.cpp MyLayer::create

This method should create an instance of your layer and return a cv::Ptr to it.

- Output blobs' shape computation

@snippet dnn/custom_layers.cpp MyLayer::getMemoryShapes

Returns the layer's output shapes depending on the input shapes. You may request
extra memory using `internals`.

- Run a layer

@snippet dnn/custom_layers.cpp MyLayer::forward

Implement the layer's logic here. Compute outputs for the given inputs.

@note OpenCV manages the memory allocated for layers. In most cases the same memory
can be reused between layers, so your `forward` implementation should not rely on
the second invocation of `forward` having the same data at `outputs` and `internals`.

- Optional `finalize` method

@snippet dnn/custom_layers.cpp MyLayer::finalize

The chain of methods is the following: OpenCV's deep learning engine calls the `create`
method once, then it calls `getMemoryShapes` for every created layer, and then you
can make preparations that depend on the known input dimensions at cv::dnn::Layer::finalize.
After the network is initialized, only the `forward` method is called for every network input.
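
Here is a minimal, hedged sketch of that lifecycle from the caller's side (the model path and the input blob are placeholders; the exact point where allocations happen may differ between OpenCV versions):

~~~~~~~~~~~~~{.cpp}
#include <opencv2/dnn.hpp>

void runTwice(const cv::Mat& blob)
{
    cv::dnn::Net net = cv::dnn::readNet("/path/to/model");  // `create` is called during import
    net.setInput(blob);                                     // fixes the input shapes
    cv::Mat out = net.forward();  // getMemoryShapes and finalize run, then forward
    out = net.forward();          // same shapes afterwards: only forward is called
}
~~~~~~~~~~~~~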

@note Varying the input blobs' sizes, such as the height, width or batch size, makes OpenCV
reallocate all of the internal memory. That leads to efficiency gaps. Try to initialize
and deploy models using a fixed batch size and image dimensions.

## Example: custom layer from Caffe
Let's create a custom layer `Interp` from https://github.com/cdmh/deeplab-public.
It is just a simple resize that takes an input blob of size `N x C x Hi x Wi` and returns
an output blob of size `N x C x Ho x Wo`, where `N` is the batch size, `C` is the number of channels,
and `Hi x Wi` and `Ho x Wo` are the input and output `height x width` correspondingly.
This layer has no trainable weights, but it has hyper-parameters to specify the output size.

For example,
~~~~~~~~~~~~~
layer {
  name: "output"
  type: "Interp"
  bottom: "input"
  top: "output"
  interp_param {
    height: 9
    width: 8
  }
}
~~~~~~~~~~~~~

This way our implementation can look like this:

@snippet dnn/custom_layers.cpp InterpLayer

Next we need to register the new layer type and try to import the model.

@snippet dnn/custom_layers.cpp Register InterpLayer

## Example: custom layer from TensorFlow
This is an example of how to import a network with a [tf.image.resize_bilinear](https://www.tensorflow.org/versions/master/api_docs/python/tf/image/resize_bilinear)
operation. It is also a resize, but with an implementation different from OpenCV's or from `Interp` above.

Let's create a single-layer network:
~~~~~~~~~~~~~{.py}
inp = tf.placeholder(tf.float32, [2, 3, 4, 5], 'input')
resized = tf.image.resize_bilinear(inp, size=[9, 8], name='resize_bilinear')
~~~~~~~~~~~~~
OpenCV sees TensorFlow's graph in the following way:

```
node {
  name: "input"
  op: "Placeholder"
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
}
node {
  name: "resize_bilinear/size"
  op: "Const"
  attr {
    key: "dtype"
    value {
      type: DT_INT32
    }
  }
  attr {
    key: "value"
    value {
      tensor {
        dtype: DT_INT32
        tensor_shape {
          dim {
            size: 2
          }
        }
        tensor_content: "\t\000\000\000\010\000\000\000"
      }
    }
  }
}
node {
  name: "resize_bilinear"
  op: "ResizeBilinear"
  input: "input:0"
  input: "resize_bilinear/size"
  attr {
    key: "T"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "align_corners"
    value {
      b: false
    }
  }
}
library {
}
```
Custom layer import from TensorFlow is designed to put all of a layer's `attr` fields into
cv::dnn::LayerParams and its `Const` input blobs into cv::dnn::Layer::blobs.
In our case, the resize's output shape will be stored in the layer's `blobs[0]`.
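
As a hedged illustration (this mirrors what the constructor in the snippet below reads, rather than adding new API), the values for the `ResizeBilinear` node above arrive as:

~~~~~~~~~~~~~{.cpp}
// Inside the layer's constructor: `params` holds the node's attrs,
// `blobs` holds its Const inputs.
bool alignCorners = params.get<bool>("align_corners", false);  // false (from attr)
int outHeight = blobs[0].at<int>(0, 0);  // 9, from the 1x2 CV_32SC1 size tensor
int outWidth  = blobs[0].at<int>(0, 1);  // 8
~~~~~~~~~~~~~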

@snippet dnn/custom_layers.cpp ResizeBilinearLayer

Next we register the layer and try to import the model.

@snippet dnn/custom_layers.cpp Register ResizeBilinearLayer
@@ -0,0 +1,232 @@
#include <algorithm>  // std::min
#include <cmath>      // std::floor

#include <opencv2/dnn.hpp>

//! [A custom layer interface]
class MyLayer : public cv::dnn::Layer
{
public:
    //! [MyLayer::MyLayer]
    MyLayer(const cv::dnn::LayerParams &params);
    //! [MyLayer::MyLayer]

    //! [MyLayer::create]
    static cv::Ptr<cv::dnn::Layer> create(cv::dnn::LayerParams& params);
    //! [MyLayer::create]

    //! [MyLayer::getMemoryShapes]
    virtual bool getMemoryShapes(const std::vector<std::vector<int> > &inputs,
                                 const int requiredOutputs,
                                 std::vector<std::vector<int> > &outputs,
                                 std::vector<std::vector<int> > &internals) const;
    //! [MyLayer::getMemoryShapes]

    //! [MyLayer::forward]
    virtual void forward(std::vector<cv::Mat*> &inputs, std::vector<cv::Mat> &outputs, std::vector<cv::Mat> &internals);
    //! [MyLayer::forward]

    //! [MyLayer::finalize]
    virtual void finalize(const std::vector<cv::Mat*> &inputs, std::vector<cv::Mat> &outputs);
    //! [MyLayer::finalize]

    virtual void forward(cv::InputArrayOfArrays inputs, cv::OutputArrayOfArrays outputs, cv::OutputArrayOfArrays internals);
};
//! [A custom layer interface]

//! [InterpLayer]
class InterpLayer : public cv::dnn::Layer
{
public:
    InterpLayer(const cv::dnn::LayerParams &params) : Layer(params)
    {
        outWidth = params.get<int>("width", 0);
        outHeight = params.get<int>("height", 0);
    }

    static cv::Ptr<cv::dnn::Layer> create(cv::dnn::LayerParams& params)
    {
        return cv::Ptr<cv::dnn::Layer>(new InterpLayer(params));
    }

    virtual bool getMemoryShapes(const std::vector<std::vector<int> > &inputs,
                                 const int requiredOutputs,
                                 std::vector<std::vector<int> > &outputs,
                                 std::vector<std::vector<int> > &internals) const
    {
        CV_UNUSED(requiredOutputs); CV_UNUSED(internals);
        std::vector<int> outShape(4);
        outShape[0] = inputs[0][0];  // batch size
        outShape[1] = inputs[0][1];  // number of channels
        outShape[2] = outHeight;
        outShape[3] = outWidth;
        outputs.assign(1, outShape);
        return false;
    }

    // Implementation of this custom layer is based on https://github.com/cdmh/deeplab-public/blob/master/src/caffe/layers/interp_layer.cpp
    virtual void forward(std::vector<cv::Mat*> &inputs, std::vector<cv::Mat> &outputs, std::vector<cv::Mat> &internals)
    {
        CV_UNUSED(internals);
        cv::Mat& inp = *inputs[0];
        cv::Mat& out = outputs[0];
        const float* inpData = (float*)inp.data;
        float* outData = (float*)out.data;

        const int batchSize = inp.size[0];
        const int numChannels = inp.size[1];
        const int inpHeight = inp.size[2];
        const int inpWidth = inp.size[3];

        // Scale factors that map output coordinates to input coordinates
        // (the endpoints of the input and output grids coincide).
        const float rheight = (outHeight > 1) ? static_cast<float>(inpHeight - 1) / (outHeight - 1) : 0.f;
        const float rwidth = (outWidth > 1) ? static_cast<float>(inpWidth - 1) / (outWidth - 1) : 0.f;
        for (int h2 = 0; h2 < outHeight; ++h2)
        {
            const float h1r = rheight * h2;
            const int h1 = static_cast<int>(h1r);
            const int h1p = (h1 < inpHeight - 1) ? 1 : 0;
            const float h1lambda = h1r - h1;
            const float h0lambda = 1.f - h1lambda;
            for (int w2 = 0; w2 < outWidth; ++w2)
            {
                const float w1r = rwidth * w2;
                const int w1 = static_cast<int>(w1r);
                const int w1p = (w1 < inpWidth - 1) ? 1 : 0;
                const float w1lambda = w1r - w1;
                const float w0lambda = 1.f - w1lambda;
                const float* pos1 = inpData + h1 * inpWidth + w1;
                float* pos2 = outData + h2 * outWidth + w2;
                // Blend the four neighboring input pixels, stepping the pointers
                // through every image plane (batch x channels) in turn.
                for (int c = 0; c < batchSize * numChannels; ++c)
                {
                    pos2[0] =
                      h0lambda * (w0lambda * pos1[0] + w1lambda * pos1[w1p]) +
                      h1lambda * (w0lambda * pos1[h1p * inpWidth] + w1lambda * pos1[h1p * inpWidth + w1p]);
                    pos1 += inpWidth * inpHeight;
                    pos2 += outWidth * outHeight;
                }
            }
        }
    }

    virtual void forward(cv::InputArrayOfArrays, cv::OutputArrayOfArrays, cv::OutputArrayOfArrays) {}

private:
    int outWidth, outHeight;
};
//! [InterpLayer]

//! [ResizeBilinearLayer]
class ResizeBilinearLayer : public cv::dnn::Layer
{
public:
    ResizeBilinearLayer(const cv::dnn::LayerParams &params) : Layer(params)
    {
        CV_Assert(!params.get<bool>("align_corners", false));
        CV_Assert(blobs.size() == 1);
        CV_Assert(blobs[0].type() == CV_32SC1);
        outHeight = blobs[0].at<int>(0, 0);
        outWidth = blobs[0].at<int>(0, 1);
    }

    static cv::Ptr<cv::dnn::Layer> create(cv::dnn::LayerParams& params)
    {
        return cv::Ptr<cv::dnn::Layer>(new ResizeBilinearLayer(params));
    }

    virtual bool getMemoryShapes(const std::vector<std::vector<int> > &inputs,
                                 const int requiredOutputs,
                                 std::vector<std::vector<int> > &outputs,
                                 std::vector<std::vector<int> > &internals) const
    {
        CV_UNUSED(requiredOutputs); CV_UNUSED(internals);
        std::vector<int> outShape(4);
        outShape[0] = inputs[0][0];  // batch size
        outShape[1] = inputs[0][1];  // number of channels
        outShape[2] = outHeight;
        outShape[3] = outWidth;
        outputs.assign(1, outShape);
        return false;
    }

    // This implementation is based on a reference implementation from
    // https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/kernels/internal/reference/reference_ops.h
    virtual void forward(std::vector<cv::Mat*> &inputs, std::vector<cv::Mat> &outputs, std::vector<cv::Mat> &internals)
    {
        CV_UNUSED(internals);
        cv::Mat& inp = *inputs[0];
        cv::Mat& out = outputs[0];
        const float* inpData = (float*)inp.data;
        float* outData = (float*)out.data;

        const int batchSize = inp.size[0];
        const int numChannels = inp.size[1];
        const int inpHeight = inp.size[2];
        const int inpWidth = inp.size[3];

        // align_corners=false mapping: output pixel (x, y) samples the input
        // at (x * widthScale, y * heightScale).
        float heightScale = static_cast<float>(inpHeight) / outHeight;
        float widthScale = static_cast<float>(inpWidth) / outWidth;
        for (int b = 0; b < batchSize; ++b)
        {
            for (int y = 0; y < outHeight; ++y)
            {
                float input_y = y * heightScale;
                int y0 = static_cast<int>(std::floor(input_y));
                int y1 = std::min(y0 + 1, inpHeight - 1);
                for (int x = 0; x < outWidth; ++x)
                {
                    float input_x = x * widthScale;
                    int x0 = static_cast<int>(std::floor(input_x));
                    int x1 = std::min(x0 + 1, inpWidth - 1);
                    for (int c = 0; c < numChannels; ++c)
                    {
                        // Bilinear blend of the four neighboring input pixels,
                        // weighted by the fractional parts of the coordinates.
                        float interpolation =
                            inpData[offset(inp.size, c, x0, y0, b)] * (1 - (input_y - y0)) * (1 - (input_x - x0)) +
                            inpData[offset(inp.size, c, x0, y1, b)] * (input_y - y0) * (1 - (input_x - x0)) +
                            inpData[offset(inp.size, c, x1, y0, b)] * (1 - (input_y - y0)) * (input_x - x0) +
                            inpData[offset(inp.size, c, x1, y1, b)] * (input_y - y0) * (input_x - x0);
                        outData[offset(out.size, c, x, y, b)] = interpolation;
                    }
                }
            }
        }
    }

    virtual void forward(cv::InputArrayOfArrays, cv::OutputArrayOfArrays, cv::OutputArrayOfArrays) {}

private:
    // Flat index of element (b, c, y, x) in a contiguous NCHW blob.
    static inline int offset(const cv::MatSize& size, int c, int x, int y, int b)
    {
        return x + size[3] * (y + size[2] * (c + size[1] * b));
    }

    int outWidth, outHeight;
};
//! [ResizeBilinearLayer]

//! [Register a custom layer]
#include <opencv2/dnn/layer.details.hpp>  // CV_DNN_REGISTER_LAYER_CLASS macro

int main(int argc, char** argv)
{
    CV_DNN_REGISTER_LAYER_CLASS(MyType, MyLayer);
    // ...
    //! [Register a custom layer]
    CV_UNUSED(argc); CV_UNUSED(argv);
    //! [Register InterpLayer]
    CV_DNN_REGISTER_LAYER_CLASS(Interp, InterpLayer);
    cv::dnn::Net caffeNet = cv::dnn::readNet("/path/to/config.prototxt", "/path/to/weights.caffemodel");
    //! [Register InterpLayer]

    //! [Register ResizeBilinearLayer]
    CV_DNN_REGISTER_LAYER_CLASS(ResizeBilinear, ResizeBilinearLayer);
    cv::dnn::Net tfNet = cv::dnn::readNet("/path/to/graph.pb");
    //! [Register ResizeBilinearLayer]
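
    // A hedged usage sketch (not part of the original sample): feed a zero blob
    // and run the net, so the custom layer resizes the spatial dims to 9 x 8.
    // The NCHW shape below is illustrative; the TF importer defines the actual layout.
    const int inputShape[] = {1, 3, 4, 5};
    cv::Mat inputBlob(4, inputShape, CV_32F, cv::Scalar(0));
    tfNet.setInput(inputBlob);
    cv::Mat outputBlob = tfNet.forward();  // last two dimensions are now 9 x 8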
}

cv::Ptr<cv::dnn::Layer> MyLayer::create(cv::dnn::LayerParams& params)
{
    return cv::Ptr<cv::dnn::Layer>(new MyLayer(params));
}
MyLayer::MyLayer(const cv::dnn::LayerParams&) {}
bool MyLayer::getMemoryShapes(const std::vector<std::vector<int> >&, const int,
                              std::vector<std::vector<int> >&,
                              std::vector<std::vector<int> >&) const { return false; }
void MyLayer::forward(std::vector<cv::Mat*>&, std::vector<cv::Mat>&, std::vector<cv::Mat>&) {}
void MyLayer::finalize(const std::vector<cv::Mat*>&, std::vector<cv::Mat>&) {}
void MyLayer::forward(cv::InputArrayOfArrays, cv::OutputArrayOfArrays, cv::OutputArrayOfArrays) {}