Hybrid Parallelism


User Guide

SINGA supports different parallelism options for distributed training. Users just need to configure it in the job configuration.

Both NetProto and LayerProto have a field partition_dim to control the parallelism option:

  • partition_dim=0: neuralnet/layer is partitioned on data dimension, i.e., each worker processes a subset of data records.
  • partition_dim=1: neuralnet/layer is partitioned on feature dimension, i.e., each worker maintains a subset of feature parameters.

partition_dim field in NetProto will be applied to all layers, unless a layer has its own partition_dim field set.

If we want data parallelism for the whole model, just leave partition_dim as default (which is 0), or configure the job.conf like:

neuralnet {
  partition_dim: 0
  layer {
    name: ... 
    type: ...
  }
  ...
}

With the hybrid parallelism, we can have layers either partitioned on data dimension or feature dimension. For example, if we want a specific layer partitioned on feature dimension, just configure like:

neuralnet {
  partition_dim: 0
  layer {
    name: "layer1_partition_on_data_dimension"
    type: ...
  }
  layer {
    name: "layer2_partition_on_feature_dimension"
    type: ...
    partition_dim: 1
  }
  ...
}

Developer Guide

To support hybrid parallelism, after singa read users’ model and paration configuration, a set of connection layers are automatically added between layers when needed:

  • BridgeSrcLayer & BridgeDstLayer are added when two connected layers are not in the same machine. They are paired and are responsible for sending data/gradient to the other side during each iteration.
  • ConcateLayer is added when there are multiple source layers. It combines their feature blobs along a given dimension.
  • SliceLayer is added when there are mutliple dest layers, each of which only needs a subset(slice) of this layers’ feature blob.
  • SplitLayer is added when there are multiple dest layers, each of which needs the whole feature blob.

Following is the logic used in our code to add connection layers:

Add Slice, Concate, Split Layers for Hybrid Partition

All cases are as follows:
src_pdim | dst_pdim | connection_type | Action
    0    |     0    |     OneToOne    | Direct Connection
    1    |     1    |     OneToOne    | Direct Connection
    0    |     0    |     OneToAll    | Direct Connection
    1    |     0    |     OneToOne    | Slice -> Concate
    0    |     1    |     OneToOne    | Slice -> Concate
    1    |     0    |     OneToAll    | Slice -> Concate
    0    |     1    |     OneToAll    | Split -> Concate
    1    |     1    |     OneToAll    | Split -> Concate

Logic:
dst_pdim = 1 && OneToAll ?
  (YES) Split -> Concate
  (NO)  src_pdim = dst_pdim ?
          (YES) Direct Connection
          (NO)  Slice -> Concate