Layers¶
Layer is a core abstraction in SINGA. It performs a variety of feature transformations for extracting high-level features, e.g., loading raw features, parsing RGB values, doing convolution transformation, etc.
The Basic user guide section introduces the configuration of built-in layers. The Advanced user guide section explains how to extend the base Layer class to implement user-defined functions.
Basic user guide¶
Layer configuration¶
The configurations of two example layers are shown below,
layer {
name: "data"
type: kCSVRecord
store_conf { }
}
layer{
name: "fc1"
type: kInnerProduct
srclayers: "data"
innerproduct_conf{ }
param{ }
}
There are some common fields for all kinds of layers:
- name: a string used to differentiate two layers in a neural net.
- type: an integer used for identifying a specific Layer subclass. The types of built-in layers are listed in LayerType (defined in job.proto). For user-defined layer subclasses, user_type should be used instead of type.
- srclayers: names of the source layers. In SINGA, all connections are converted to directed connections.
- param: configuration for a Param instance. There can be multiple Param objects in one layer.
Different layers may have different configurations. These configurations are defined in <type>_conf. E.g., the “fc1” layer has innerproduct_conf. The subsequent sections explain the functionality of each built-in layer and how to configure it.
Built-in Layer subclasses¶
SINGA provides many built-in layers, which can be used directly to create neural nets. These layers are categorized according to their functionality:
- Input layers for loading records (e.g., images) from disk files, HDFS or network into memory.
- Neuron layers for feature transformation, e.g., convolution, pooling, dropout, etc.
- Loss layers for measuring the training objective loss, e.g., Cross Entropy loss or Euclidean loss.
- Output layers for outputting the prediction results (e.g., probabilities of each category) or features into persistent storage, e.g., disk or HDFS.
- Connection layers for connecting layers when the neural net is partitioned.
Input layers¶
Input layers load training/test data from disk or other places (e.g., HDFS or network) into memory.
StoreInputLayer¶
StoreInputLayer is a base layer for loading data from a data store. The data store can be a KVFile or TextFile (LMDB, LevelDB, HDFS, etc., will be supported later). Its ComputeFeature function reads batchsize (string:key, string:value) tuples. Each tuple is parsed by a Parse function implemented by its subclasses.
The configuration for this layer is in store_conf,
store_conf {
backend: # "kvfile" or "textfile"
path: # path to the data store
batchsize : 32
prefetching: true #default value is false
...
}
SingleLabelRecordLayer¶
It is a subclass of StoreInputLayer. It assumes the (key, value) tuple loaded from a data store contains a feature vector (and a label) for one data instance. All feature vectors are of the same fixed length. The shape of one instance is configured through the shape field, e.g., the following configuration specifies the shape for the CIFAR10 images.
store_conf {
shape: 3 #channels
shape: 32 #height
shape: 32 #width
}
It may do some preprocessing like standardization. The data for preprocessing is loaded and parsed in a virtual function, which is implemented by its subclasses.
RecordInputLayer¶
It is a subclass of SingleLabelRecordLayer. It parses the value field from one tuple into a RecordProto, which is generated by Google Protobuf according to common.proto. It can be used to store features for images (e.g., using the pixel field) or other objects (using the data field). The key field is not parsed.
type: kRecordInput
store_conf {
has_label: # default is true
...
}
CSVInputLayer¶
It is a subclass of SingleLabelRecordLayer. The value field from one tuple is parsed as a CSV line (separated by commas). The first number is parsed as a label if has_label is configured in store_conf. Otherwise, all numbers are parsed into one row of the data_ Blob.
type: kCSVInput
store_conf {
has_label: # default is true
...
}
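The parsing rule above can be sketched as follows. This is a minimal illustration, not SINGA's implementation: the CsvInstance struct and the ParseCsvLine function are hypothetical names, and the sketch assumes every field is a plain number.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one parsed instance (not a SINGA type).
struct CsvInstance {
  int label = -1;               // stays -1 when the line carries no label
  std::vector<float> features;  // one row of the feature blob
};

// Parse one comma-separated line. When has_label is true, the first number
// is taken as the label; all remaining numbers are features.
CsvInstance ParseCsvLine(const std::string& line, bool has_label) {
  CsvInstance out;
  std::stringstream ss(line);
  std::string token;
  bool first = true;
  while (std::getline(ss, token, ',')) {
    if (first && has_label) {
      out.label = std::stoi(token);
    } else {
      out.features.push_back(std::stof(token));
    }
    first = false;
  }
  assert(!out.features.empty());  // a record needs at least one feature
  return out;
}
```

With has_label set to false, the first number is treated as an ordinary feature and the label keeps its placeholder value.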
ImagePreprocessLayer¶
This layer does image preprocessing, e.g., cropping, mirroring and scaling, on the data Blob from its source layer. It supersedes the RGBImageLayer, which works on the Record from ShardDataLayer. It still uses the same configuration as RGBImageLayer,
type: kImagePreprocess
rgbimage_conf {
scale: float
cropsize: int # cropping each image to keep the central part with this size
mirror: bool # mirror the image by setting image[i,j]=image[i,len-j]
meanfile: "Image_Mean_File_Path"
}
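The cropping and mirroring options can be illustrated with the sketch below. It is an assumption-laden example rather than SINGA's code: CenterCrop and MirrorHorizontal are hypothetical helpers, the image is a single channel stored row-major, and scaling and mean subtraction are left out. Note that with zero-based indices the mirrored column is width-1-j.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Keep the central cropsize x cropsize window of a single-channel,
// row-major image of size height x width.
std::vector<float> CenterCrop(const std::vector<float>& image, int height,
                              int width, int cropsize) {
  assert(static_cast<int>(image.size()) == height * width);
  assert(cropsize > 0 && cropsize <= height && cropsize <= width);
  const int top = (height - cropsize) / 2;
  const int left = (width - cropsize) / 2;
  std::vector<float> out;
  out.reserve(static_cast<std::size_t>(cropsize) * cropsize);
  for (int i = 0; i < cropsize; ++i) {
    for (int j = 0; j < cropsize; ++j) {
      out.push_back(image[(top + i) * width + (left + j)]);
    }
  }
  return out;
}

// Mirror horizontally: out[i][j] = in[i][width - 1 - j] (zero-based).
std::vector<float> MirrorHorizontal(std::vector<float> image, int height,
                                    int width) {
  assert(static_cast<int>(image.size()) == height * width);
  for (int i = 0; i < height; ++i) {
    std::reverse(image.begin() + i * width, image.begin() + (i + 1) * width);
  }
  return image;
}
```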
ShardDataLayer (Deprecated)¶
Deprecated! Please use ProtoRecordInputLayer or CSVRecordInputLayer.
ShardDataLayer is a subclass of DataLayer, which reads Records from a disk file. The file should be created using the DataShard class. With the data file prepared, users configure the layer as
type: kShardData
sharddata_conf {
path: "path to data shard folder"
batchsize: int
random_skip: int
}
batchsize specifies the number of records to be trained for one mini-batch. The first rand() % random_skip Records will be skipped at the first iteration. This is to enforce that different workers work on different Records.
LMDBDataLayer (Deprecated)¶
Deprecated! Please use ProtoRecordInputLayer or CSVRecordInputLayer.
LMDBDataLayer is similar to ShardDataLayer, except that the Records are loaded from LMDB.
type: kLMDBData
lmdbdata_conf {
path: "path to LMDB folder"
batchsize: int
random_skip: int
}
ParserLayer (Deprecated)¶
Deprecated! Please use ProtoRecordInputLayer or CSVRecordInputLayer.
It gets a vector of Records from DataLayer and parses features into a Blob.
virtual void ParseRecords(Phase phase, const vector<Record>& records, Blob<float>* blob) = 0;
LabelLayer (Deprecated)¶
Deprecated! Please use ProtoRecordInputLayer or CSVRecordInputLayer.
LabelLayer is a subclass of ParserLayer. It parses a single label from each Record. Consequently, it will put $b$ (mini-batch size) values into the Blob. It has no specific configuration fields.
MnistImageLayer (Deprecated)¶
Deprecated! Please use ProtoRecordInputLayer or CSVRecordInputLayer.
MnistImageLayer is a subclass of ParserLayer. It parses the pixel values of each image from the MNIST dataset. The pixel values may be normalized as x/norm_a - norm_b. For example, if norm_a is set to 255 and norm_b is set to 0, then every pixel will be normalized into [0, 1].
type: kMnistImage
mnistimage_conf {
norm_a: float
norm_b: float
}
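The normalization x/norm_a - norm_b can be sketched as below. NormalizePixels is a hypothetical helper written for illustration, assuming the raw pixels are stored as 8-bit values.

```cpp
#include <cassert>
#include <vector>

// Apply x / norm_a - norm_b to every raw pixel value.
std::vector<float> NormalizePixels(const std::vector<unsigned char>& pixels,
                                   float norm_a, float norm_b) {
  assert(norm_a != 0.0f);  // division by zero is not meaningful here
  std::vector<float> out;
  out.reserve(pixels.size());
  for (unsigned char p : pixels) {
    out.push_back(static_cast<float>(p) / norm_a - norm_b);
  }
  return out;
}
```

With norm_a = 255 and norm_b = 0, the raw values 0 and 255 map to 0 and 1 respectively.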
RGBImageLayer (Deprecated)¶
Deprecated! Please use the ImagePreprocessLayer.
RGBImageLayer is a subclass of ParserLayer. It parses the RGB values of one image from each Record. It may also apply some transformations, e.g., cropping and mirroring. If the meanfile is specified, it should point to a path that contains one Record for the mean of each pixel over all training images.
type: kRGBImage
rgbimage_conf {
scale: float
cropsize: int # cropping each image to keep the central part with this size
mirror: bool # mirror the image by setting image[i,j]=image[i,len-j]
meanfile: "Image_Mean_File_Path"
}
PrefetchLayer¶
PrefetchLayer embeds other input layers to do data prefetching. It launches a thread to call the embedded layers to load and extract features, so that the I/O task and the computation task can run simultaneously. One example PrefetchLayer configuration is,
layer {
name: "prefetch"
type: kPrefetch
sublayers {
name: "data"
type: kShardData
sharddata_conf { }
}
sublayers {
name: "rgb"
type: kRGBImage
srclayers:"data"
rgbimage_conf { }
}
sublayers {
name: "label"
type: kLabel
srclayers: "data"
}
exclude:kTest
}
The layers on top of the PrefetchLayer should use the names of the embedded layers as their source layers. For example, “rgb” and “label” should be configured as the srclayers of other layers.
Output Layers¶
Output layers get data from their source layers and write them to persistent storage, e.g., disk files or HDFS (to be supported).
RecordOutputLayer¶
This layer gets data (and label if it is available) from its source layer and converts it into records of type RecordProto. Records are written as (key = instance No., value = serialized record) tuples into Store, e.g., KVFile. The configuration of this layer should include the specifics of the Store backend via store_conf.
layer {
name: "output"
type: kRecordOutput
srclayers:
store_conf {
backend: "kvfile"
path:
}
}
CSVOutputLayer¶
This layer gets data (and label if it is available) from its source layer and converts it into a string per instance with fields separated by commas (i.e., CSV format). The shape information is not kept in the string. All strings are written into Store, e.g., a text file. The configuration of this layer should include the specifics of the Store backend via store_conf.
layer {
name: "output"
type: kCSVOutput
srclayers:
store_conf {
backend: "textfile"
path:
}
}
Neuron Layers¶
Neuron layers conduct feature transformations.
ActivationLayer¶
type: kActivation
activation_conf {
type: {RELU, SIGMOID, TANH, STANH}
}
ConvolutionLayer¶
ConvolutionLayer conducts convolution transformation.
type: kConvolution
convolution_conf {
num_filters: int
kernel: int
stride: int
pad: int
}
param { } # weight/filter matrix
param { } # bias vector
The int value num_filters is the number of filters applied; the int value kernel is the convolution kernel size (equal width and height); the int value stride is the distance between successive filter positions; the int value pad is the width, in pixels, of the border of zeros added to each side of the input.
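To see how kernel, stride and pad interact, the sketch below computes the spatial size of the output along one dimension. It uses the common formula (input + 2*pad - kernel) / stride + 1 with integer division; this is an assumption about the rounding rule and has not been checked against SINGA's source. ConvOutputSize is a hypothetical helper.

```cpp
#include <cassert>

// Output length along one spatial dimension for a square kernel.
// Assumes floor (integer) division, as in many frameworks.
int ConvOutputSize(int input, int kernel, int stride, int pad) {
  assert(kernel > 0 && stride > 0 && pad >= 0);
  assert(input + 2 * pad >= kernel);  // the kernel must fit at least once
  return (input + 2 * pad - kernel) / stride + 1;
}
```

For example, a 32-pixel input with kernel 5, stride 1 and pad 2 keeps its size of 32 under this formula.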
InnerProductLayer¶
InnerProductLayer is fully connected with its (single) source layer. Typically, it has two parameter fields, one for weight matrix, and the other for bias vector. It rotates the feature of the source layer (by multiplying with weight matrix) and shifts it (by adding the bias vector).
type: kInnerProduct
innerproduct_conf {
num_output: int
}
param { } # weight matrix
param { } # bias vector
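The transformation can be sketched for a single instance as output = weight * input + bias. The InnerProduct function below is a hypothetical, unoptimized illustration using nested vectors, not SINGA's Blob-based code; the weight matrix is assumed to have num_output rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// output[o] = bias[o] + sum_i weight[o][i] * input[i]
std::vector<float> InnerProduct(const std::vector<std::vector<float>>& weight,
                                const std::vector<float>& bias,
                                const std::vector<float>& input) {
  assert(weight.size() == bias.size());  // one bias per output unit
  std::vector<float> output(bias);       // start from the bias (the shift)
  for (std::size_t o = 0; o < weight.size(); ++o) {
    assert(weight[o].size() == input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
      output[o] += weight[o][i] * input[i];
    }
  }
  return output;
}
```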
PoolingLayer¶
PoolingLayer down-samples the feature vectors from the source layer by taking the maximum or the average over each local region.
type: kPooling
pooling_conf {
pool: AVE|MAX # choose between Average Pooling and Max Pooling
kernel: int # size of the kernel filter
pad: int # the padding size
stride: int # the step length of the filter
}
The pooling layer has two methods: Average Pooling and Max Pooling. Use the enum AVE and MAX to choose the method.
- Max Pooling selects the max value for each filtering area as a point of the result feature blob.
- Average Pooling averages all values for each filtering area at a point of the result feature blob.
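The two methods can be sketched on a single-channel, row-major feature map as follows. Pool2D and PoolMethod are hypothetical names, and the sketch assumes no padding and a square kernel for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class PoolMethod { kMax, kAve };  // hypothetical stand-in for MAX / AVE

// Pool a height x width map with a kernel x kernel window (no padding).
std::vector<float> Pool2D(const std::vector<float>& in, int height, int width,
                          int kernel, int stride, PoolMethod method) {
  assert(static_cast<int>(in.size()) == height * width);
  assert(kernel > 0 && stride > 0 && kernel <= height && kernel <= width);
  const int out_h = (height - kernel) / stride + 1;
  const int out_w = (width - kernel) / stride + 1;
  std::vector<float> out;
  out.reserve(static_cast<std::size_t>(out_h) * out_w);
  for (int oy = 0; oy < out_h; ++oy) {
    for (int ox = 0; ox < out_w; ++ox) {
      float max_val = in[oy * stride * width + ox * stride];
      float sum = 0.0f;
      for (int ky = 0; ky < kernel; ++ky) {
        for (int kx = 0; kx < kernel; ++kx) {
          const float v = in[(oy * stride + ky) * width + (ox * stride + kx)];
          max_val = std::max(max_val, v);
          sum += v;
        }
      }
      out.push_back(method == PoolMethod::kMax ? max_val
                                               : sum / (kernel * kernel));
    }
  }
  return out;
}
```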
ReLULayer¶
ReLULayer has rectified linear neurons, which conduct the following transformation, f(x) = Max(0, x). It has no specific configuration fields.
STanhLayer¶
STanhLayer uses the scaled tanh as activation function, i.e., f(x) = 1.7159047 * tanh(0.6666667 * x). It has no specific configuration fields.
SigmoidLayer¶
SigmoidLayer uses the sigmoid (or logistic) as activation function, i.e., f(x) = sigmoid(x). It has no specific configuration fields.
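The three element-wise activations described in the ReLULayer, STanhLayer and SigmoidLayer sections can be written as plain functions. This is a sketch with hypothetical function names; ApplyInPlace is an illustrative helper and not part of SINGA.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

float Relu(float x) { return std::max(0.0f, x); }                 // Max(0, x)
float Sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }   // logistic
float STanh(float x) { return 1.7159047f * std::tanh(0.6666667f * x); }

// Apply an activation to every element of a feature vector.
void ApplyInPlace(std::vector<float>* data, float (*activation)(float)) {
  assert(data != nullptr);
  for (float& v : *data) v = activation(v);
}
```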
DropoutLayer¶
DropoutLayer randomly drops out some inputs. This scheme helps deep learning models avoid over-fitting.
type: kDropout
dropout_conf {
dropout_ratio: float # dropout probability
}
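A minimal sketch of dropout at training time is given below. It assumes the "inverted" variant, in which kept values are rescaled by 1 / (1 - dropout_ratio) so that the expected activation is unchanged; whether SINGA rescales in exactly this way is an assumption here. Dropout is a hypothetical free function.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Zero each input with probability dropout_ratio; rescale the kept ones.
std::vector<float> Dropout(const std::vector<float>& in, float dropout_ratio,
                           std::mt19937* rng) {
  assert(rng != nullptr);
  assert(dropout_ratio >= 0.0f && dropout_ratio < 1.0f);
  const float keep = 1.0f - dropout_ratio;
  std::bernoulli_distribution keep_unit(keep);
  std::vector<float> out;
  out.reserve(in.size());
  for (float v : in) {
    out.push_back(keep_unit(*rng) ? v / keep : 0.0f);
  }
  return out;
}
```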
LRNLayer¶
LRNLayer (Local Response Normalization) normalizes over the channels.
type: kLRN
lrn_conf {
local_size: int
alpha: float # scaling parameter
beta: float # exponential number
}
local_size specifies the number of adjoining channels which will be summed up. For WITHIN_CHANNEL, it means the side length of the spatial region which will be summed up.
CuDNN layers¶
CuDNN v3 and v4 are supported in SINGA, which include the following layers,
- CudnnActivationLayer (activation functions are SIGMOID, TANH, RELU)
- CudnnConvLayer
- CudnnLRNLayer
- CudnnPoolLayer
- CudnnSoftmaxLayer
These layers have the same configuration as the corresponding CPU layers.
For CuDNN v4, the batch normalization layer is added, which is named CudnnBMLayer.
Loss Layers¶
Loss layers measure the objective training loss.
SoftmaxLossLayer¶
SoftmaxLossLayer is a combination of the Softmax transformation and Cross-Entropy loss. It first applies Softmax to get a prediction probability for each output unit (neuron) and then computes the cross-entropy against the ground truth. It is generally used as the final layer to generate labels for classification tasks.
type: kSoftmaxLoss
softmaxloss_conf {
topk: int
}
The configuration field topk is for selecting the labels with the topk highest probabilities as the prediction results, since it is tedious for users to view the prediction probability of every label.
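The computation for one instance can be sketched as below: softmax over the output units, cross-entropy against the true label, and a check of whether the true label is among the topk most probable ones. The function names are hypothetical, and the small clamp inside the logarithm is an added safeguard rather than something taken from SINGA.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax: subtract the max before exponentiating.
std::vector<float> Softmax(const std::vector<float>& logits) {
  assert(!logits.empty());
  const float max_logit = *std::max_element(logits.begin(), logits.end());
  std::vector<float> prob(logits.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < logits.size(); ++i) {
    prob[i] = std::exp(logits[i] - max_logit);
    sum += prob[i];
  }
  for (float& p : prob) p /= sum;
  return prob;
}

// Cross-entropy against a single ground-truth label: -log(prob[label]).
float CrossEntropy(const std::vector<float>& prob, std::size_t label) {
  assert(label < prob.size());
  return -std::log(std::max(prob[label], 1e-20f));
}

// True if fewer than topk labels have a strictly higher probability.
bool InTopK(const std::vector<float>& prob, std::size_t label, int topk) {
  assert(label < prob.size());
  int higher = 0;
  for (float p : prob) {
    if (p > prob[label]) ++higher;
  }
  return higher < topk;
}
```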
ConnectionLayer¶
Subclasses of ConnectionLayer are utility layers that connect other layers due to neural net partitioning or other cases.
ConcateLayer¶
ConcateLayer connects to more than one source layer and concatenates their feature blobs along a given dimension.
type: kConcate
concate_conf {
concate_dim: int # the dimension to concatenate along
}
SliceLayer¶
SliceLayer connects to more than one destination layer and slices its feature blob along a given dimension.
type: kSlice
slice_conf {
slice_dim: int
}
SplitLayer¶
SplitLayer connects to more than one destination layer and replicates its feature blob for each of them.
type: kSplit
split_conf {
num_splits: int
}
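The difference between concatenating, slicing and splitting can be sketched on one-dimensional feature vectors. The functions below are hypothetical illustrations; the real layers work on multi-dimensional blobs along a configured dimension, and equal-sized slices are an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// Concate: join the features of several source layers end to end.
Vec Concate(const std::vector<Vec>& sources) {
  Vec out;
  for (const Vec& s : sources) out.insert(out.end(), s.begin(), s.end());
  return out;
}

// Slice: cut one feature vector into num_slices equal, disjoint parts.
std::vector<Vec> Slice(const Vec& source, std::size_t num_slices) {
  assert(num_slices > 0 && source.size() % num_slices == 0);
  const std::size_t len = source.size() / num_slices;
  std::vector<Vec> out;
  for (std::size_t k = 0; k < num_slices; ++k) {
    out.emplace_back(source.begin() + k * len, source.begin() + (k + 1) * len);
  }
  return out;
}

// Split: give every destination layer a full copy of the features.
std::vector<Vec> Split(const Vec& source, std::size_t num_splits) {
  assert(num_splits > 0);
  return std::vector<Vec>(num_splits, source);
}
```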
BridgeSrcLayer & BridgeDstLayer¶
BridgeSrcLayer & BridgeDstLayer are utility layers assisting data (e.g., feature or gradient) transferring due to neural net partitioning. These two layers are added implicitly. Users typically do not need to configure them in their neural net configuration.
OutputLayer¶
It writes the prediction results or the extracted features into a file, an HTTP stream or other places. Currently SINGA has not implemented any specific output layer.
Advanced user guide¶
The base Layer class is introduced in this section, followed by how to implement a new Layer subclass.
Base Layer class¶
Members¶
LayerProto layer_conf_;
vector<Blob<float>> datavec_, gradvec_;
vector<AuxType> aux_data_;
The base layer class keeps the user configuration in layer_conf_. datavec_ stores the features associated with this layer. There are layers without feature vectors; instead, they share the data from source layers. The gradvec_ is for storing the gradients of the objective loss w.r.t. the datavec_. The aux_data_ stores the auxiliary data, e.g., image label (set AuxType to int). If images have a variable number of labels, the AuxType can be defined as vector<int>. Currently, we hard code AuxType to int. It will be added as a template argument of the Layer class later.
If a layer has parameters, these parameters are declared using type Param. Since some layers do not have parameters, we do not declare any Param in the base layer class.
Functions¶
virtual void Setup(const LayerProto& conf, const vector<Layer*>& srclayers);
virtual void ComputeFeature(int flag, const vector<Layer*>& srclayers) = 0;
virtual void ComputeGradient(int flag, const vector<Layer*>& srclayers) = 0;
The Setup function reads the user configuration, i.e., conf, and information from source layers, e.g., mini-batch size, to set the shape of the data_ (and grad_) field as well as some other layer-specific fields. Memory will not be allocated until computation over the data structure happens.
The ComputeFeature function evaluates the feature blob by transforming (e.g., convolution and pooling) features from the source layers. ComputeGradient computes the gradients of parameters associated with this layer. These two functions are invoked by the TrainOneBatch function during training. Hence, they should be consistent with the TrainOneBatch function. In particular, feed-forward and RNN models are trained using the BP algorithm, which requires each layer’s ComputeFeature function to compute data_ based on source layers, and requires each layer’s ComputeGradient to compute gradients of parameters and source layers’ grad_. Energy models, e.g., RBM, are trained by the CD algorithm, which requires each layer’s ComputeFeature function to compute the feature vectors for the positive phase or negative phase depending on the phase argument, and requires the ComputeGradient function to only compute parameter gradients. Some layers, e.g., loss layers or output layers, can put the loss or prediction result into the metric argument, which will be averaged and displayed periodically.
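The calling order that BP imposes on ComputeFeature and ComputeGradient can be sketched with mock layers. Everything below is hypothetical: MockLayer only logs calls and takes no arguments, and TrainOneBatchBP is a simplified stand-in for illustration, not SINGA's TrainOneBatch.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a layer; it records which function was called.
class MockLayer {
 public:
  MockLayer(std::string name, std::vector<std::string>* log)
      : name_(std::move(name)), log_(log) {
    assert(log_ != nullptr);
  }
  void ComputeFeature() { log_->push_back("F:" + name_); }
  void ComputeGradient() { log_->push_back("G:" + name_); }

 private:
  std::string name_;
  std::vector<std::string>* log_;
};

// BP-style pass: features flow forward through the layers in order,
// then gradients flow backward in the reverse order.
void TrainOneBatchBP(std::vector<MockLayer>* layers) {
  assert(layers != nullptr);
  for (MockLayer& layer : *layers) layer.ComputeFeature();
  for (auto it = layers->rbegin(); it != layers->rend(); ++it) {
    it->ComputeGradient();
  }
}
```

Running this on layers named data, fc1 and loss logs the three forward calls first and then the three backward calls starting from loss.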
Implementing a new Layer subclass¶
Users can extend the Layer class or other subclasses to implement their own feature transformation logic as long as the two virtual functions are overridden to be consistent with the TrainOneBatch function. The Setup function may also be overridden to read layer-specific configuration.
The RNNLM example provides a couple of user-defined layers. You can refer to them as examples.
Layer specific protocol message¶
To implement a new layer, the first step is to define the layer-specific configuration. Suppose the new layer is FooLayer; the layer-specific Google protocol message FooLayerProto should be defined as
// in user.proto
package singa;
import "job.proto";
message FooLayerProto {
  optional int32 a = 1; // specific fields to the FooLayer
}
In addition, users need to extend the original LayerProto (defined in job.proto of SINGA) to include the foo_conf as follows.
extend LayerProto {
optional FooLayerProto foo_conf = 101; // unique field id, reserved for extensions
}
If there are multiple new layers, then each layer that has specific configurations would have a <type>_conf field and take one unique extension number. SINGA has reserved enough extension numbers, i.e., from 101 to 1000.
// job.proto of SINGA
message LayerProto {
  ...
  extensions 101 to 1000;
}
With user.proto defined, users can use protoc to generate the user.pb.cc and user.pb.h files. In users’ code, the extension fields can be accessed via,
auto conf = layer_conf_.GetExtension(foo_conf);
int a = conf.a();
When defining configurations of the new layer (in job.conf), users should use user_type for its layer type instead of type. In addition, foo_conf should be enclosed in brackets.
layer {
name: "foo"
user_type: "kFooLayer" # Note user_type of user-defined layers is string
[foo_conf] { # Note there is a pair of [] for extension fields
a: 10
}
}
New Layer subclass declaration¶
The new layer subclass can be implemented like the built-in layer subclasses.
class FooLayer : public singa::Layer {
public:
void Setup(const LayerProto& conf, const vector<Layer*>& srclayers) override;
void ComputeFeature(int flag, const vector<Layer*>& srclayers) override;
void ComputeGradient(int flag, const vector<Layer*>& srclayers) override;
private:
// members
};
Users must override the two virtual functions, which are called by TrainOneBatch for either the BP or the CD algorithm. Typically, the Setup function will also be overridden to initialize some members. The user-configured fields can be accessed through layer_conf_ as shown in the above paragraphs.
New Layer subclass registration¶
The newly defined layer should be registered in main.cc by adding
driver.RegisterLayer<FooLayer, std::string>("kFooLayer"); // "kFooLayer" should match the user_type in job.conf.
After that, the NeuralNet can create instances of the new Layer subclass.
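One way to picture what registration makes possible is a factory that maps a type name to a creator function. The sketch below is a generic illustration of that pattern under stated assumptions; LayerBase, the FooLayer stub and LayerRegistry are hypothetical and do not reflect how SINGA's driver is implemented.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical minimal base class and user layer for the sketch.
struct LayerBase {
  virtual ~LayerBase() = default;
  virtual std::string Kind() const = 0;
};
struct FooLayer : LayerBase {
  std::string Kind() const override { return "foo"; }
};

// Maps a user_type string to a function that creates the layer.
class LayerRegistry {
 public:
  template <typename LayerT>
  void Register(const std::string& user_type) {
    assert(!user_type.empty());
    creators_[user_type] = [] {
      return std::unique_ptr<LayerBase>(new LayerT());
    };
  }

  // Returns nullptr when the type name was never registered.
  std::unique_ptr<LayerBase> Create(const std::string& user_type) const {
    auto it = creators_.find(user_type);
    if (it == creators_.end()) return nullptr;
    return it->second();
  }

 private:
  std::map<std::string, std::function<std::unique_ptr<LayerBase>()>> creators_;
};
```

In this picture, a name used in the configuration but never registered cannot be turned into a layer instance, which is why the registered string and the user_type in job.conf have to agree.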