SINGA supports different parallelism options for distributed training. Users just need to configure it in the job configuration.
Both NetProto and LayerProto have a field partition_dim to control the parallelism option:
partition_dim field in NetProto will be applied to all layers, unless a layer has its own partition_dim field set.
If we want data parallelism for the whole model, just leave partition_dim as default (which is 0), or configure the job.conf like:
neuralnet { partition_dim: 0 layer { name: ... type: ... } ... }
With the hybrid parallelism, we can have layers either partitioned on data dimension or feature dimension. For example, if we want a specific layer partitioned on feature dimension, just configure like:
neuralnet { partition_dim: 0 layer { name: "layer1_partition_on_data_dimension" type: ... } layer { name: "layer2_partition_on_feature_dimension" type: ... partition_dim: 1 } ... }
To support hybrid parallelism, after singa read users’ model and paration configuration, a set of connection layers are automatically added between layers when needed:
BridgeSrcLayer & BridgeDstLayer are added when two connected layers are not in the same machine. They are paired and are responsible for sending data/gradient to the other side during each iteration.
ConcateLayer is added when there are multiple source layers. It combines their feature blobs along a given dimension.
SliceLayer is added when there are mutliple dest layers, each of which only needs a subset(slice) of this layers’ feature blob.
SplitLayer is added when there are multiple dest layers, each of which needs the whole feature blob.
Following is the logic used in our code to add connection layers:
Add Slice, Concate, Split Layers for Hybrid Partition All cases are as follows: src_pdim | dst_pdim | connection_type | Action 0 | 0 | OneToOne | Direct Connection 1 | 1 | OneToOne | Direct Connection 0 | 0 | OneToAll | Direct Connection 1 | 0 | OneToOne | Slice -> Concate 0 | 1 | OneToOne | Slice -> Concate 1 | 0 | OneToAll | Slice -> Concate 0 | 1 | OneToAll | Split -> Concate 1 | 1 | OneToAll | Split -> Concate Logic: dst_pdim = 1 && OneToAll ? (YES) Split -> Concate (NO) src_pdim = dst_pdim ? (YES) Direct Connection (NO) Slice -> Concate