# RBM Example

---

This example uses SINGA to train 4 RBM models and one auto-encoder model over the
[MNIST dataset](http://yann.lecun.com/exdb/mnist/). The auto-encoder model is trained
to reduce the dimensionality of the MNIST image feature. The RBM models are trained
to initialize parameters of the auto-encoder model. This example application is
from [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf).

## Running instructions

Running scripts are provided in *SINGA_ROOT/examples/rbm* folder.

The MNIST dataset has 70,000 handwritten digit images. The
[data preparation](data.html) page
has details on converting this dataset into SINGA recognizable format. Users can
simply run the following commands to download and convert the dataset.

    # at SINGA_ROOT/examples/mnist/
    $ cp Makefile.example Makefile
    $ make download
    $ make create

The training is separated into two phases, namely pre-training and fine-tuning.
The pre-training phase trains 4 RBMs in sequence,

    # at SINGA_ROOT/
    $ ./bin/singa-run.sh -conf examples/rbm/rbm1.conf
    $ ./bin/singa-run.sh -conf examples/rbm/rbm2.conf
    $ ./bin/singa-run.sh -conf examples/rbm/rbm3.conf
    $ ./bin/singa-run.sh -conf examples/rbm/rbm4.conf

The fine-tuning phase trains the auto-encoder by,

    $ ./bin/singa-run.sh -conf examples/rbm/autoencoder.conf


## Training details

### RBM1

<img src="../_static/images/example-rbm1.png" align="center" width="200px"/>
<span><strong>Figure 1 - RBM1.</strong></span>

The neural net structure for training RBM1 is shown in Figure 1.
The data layer and parser layer provides features for training RBM1.
The visible layer (connected with parser layer) of RBM1 accepts the image feature
(784 dimension). The hidden layer is set to have 1000 neurons (units).
These two layers are configured as,

    layer{
      name: "RBMVis"
      type: kRBMVis
      srclayers:"mnist"
      srclayers:"RBMHid"
      rbm_conf{
        hdim: 1000
      }
      param{
        name: "w1"
        init{
          type: kGaussian
          mean: 0.0
          std: 0.1
        }
      }
      param{
        name: "b11"
        init{
          type: kConstant
          value: 0.0
        }
      }
    }

    layer{
      name: "RBMHid"
      type: kRBMHid
      srclayers:"RBMVis"
      rbm_conf{
        hdim: 1000
      }
      param{
        name: "w1_"
        share_from: "w1"
      }
      param{
        name: "b12"
        init{
          type: kConstant
          value: 0.0
        }
      }
    }


For RBM, the weight matrix is shared by the visible and hidden layers. For instance,
`w1` is shared by `vis` and `hid` layers shown in Figure 1. In SINGA, we can configure
the `share_from` field to enable [parameter sharing](param.html)
as shown above for the param `w1` and `w1_`.

[Contrastive Divergence](train-one-batch.html#contrastive-divergence)
is configured as the algorithm for [TrainOneBatch](train-one-batch.html).
Following Hinton's paper, we configure the [updating protocol](updater.html)
as follows,

    # Updater Configuration
    updater{
      type: kSGD
      momentum: 0.2
      weight_decay: 0.0002
      learning_rate{
        base_lr: 0.1
        type: kFixed
      }
    }

Since the parameters of RBM0 will be used to initialize the auto-encoder, we should
configure the `workspace` field to specify a path for the checkpoint folder.
For example, if we configure it as,

    cluster {
      workspace: "examples/rbm/rbm1/"
    }

Then SINGA will [checkpoint the parameters](checkpoint.html) into *examples/rbm/rbm1/*.

### RBM1
<img src="../_static/images/example-rbm2.png" align="center" width="200px"/>
<span><strong>Figure 2 - RBM2.</strong></span>

Figure 2 shows the net structure of training RBM2.
The visible units of RBM2 accept the output from the Sigmoid1 layer. The Inner1 layer
is a  `InnerProductLayer` whose parameters are set to the `w1` and `b12` learned
from RBM1.
The neural net configuration is (with layers for data layer and parser layer omitted).

    layer{
      name: "Inner1"
      type: kInnerProduct
      srclayers:"mnist"
      innerproduct_conf{
        num_output: 1000
      }
      param{ name: "w1" }
      param{ name: "b12"}
    }

    layer{
      name: "Sigmoid1"
      type: kSigmoid
      srclayers:"Inner1"
    }

    layer{
      name: "RBMVis"
      type: kRBMVis
      srclayers:"Sigmoid1"
      srclayers:"RBMHid"
      rbm_conf{
        hdim: 500
      }
      param{
        name: "w2"
        ...
      }
      param{
        name: "b21"
        ...
      }
    }

    layer{
      name: "RBMHid"
      type: kRBMHid
      srclayers:"RBMVis"
      rbm_conf{
        hdim: 500
      }
      param{
        name: "w2_"
        share_from: "w2"
      }
      param{
        name: "b22"
        ...
      }
    }

To load w0 and b02 from RBM0's checkpoint file, we configure the `checkpoint_path` as,

    checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
    cluster{
      workspace: "examples/rbm/rbm2"
    }

The workspace is changed for checkpointing `w2`, `b21` and `b22` into
*examples/rbm/rbm2/*.

### RBM3

<img src="../_static/images/example-rbm3.png" align="center" width="200px"/>
<span><strong>Figure 3 - RBM3.</strong></span>

Figure 3 shows the net structure of training RBM3. In this model, a layer with
250 units is added as the hidden layer of RBM3. The visible units of RBM3
accepts output from Sigmoid2 layer. Parameters of Inner1 and Innner2 are set to
`w1,b12,w2,b22` which can be load from the checkpoint file of RBM2,
i.e., "examples/rbm/rbm2/".

### RBM4


<img src="../_static/images/example-rbm4.png" align="center" width="200px"/>
<span><strong>Figure 4 - RBM4.</strong></span>

Figure 4 shows the net structure of training RBM4. It is similar to Figure 3,
but according to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf), the hidden units of the
top RBM (RBM4) have stochastic real-valued states drawn from a unit variance
Gaussian whose mean is determined by the input from the RBM's logistic visible
units. So we add a `gaussian` field in the RBMHid layer to control the
sampling distribution (Gaussian or Bernoulli). In addition, this
RBM has a much smaller learning rate (0.001).  The neural net configuration for
the RBM4 and the updating protocol is (with layers for data layer and parser
layer omitted),

    # Updater Configuration
    updater{
      type: kSGD
      momentum: 0.9
      weight_decay: 0.0002
      learning_rate{
        base_lr: 0.001
        type: kFixed
      }
    }

    layer{
      name: "RBMVis"
      type: kRBMVis
      srclayers:"Sigmoid3"
      srclayers:"RBMHid"
      rbm_conf{
        hdim: 30
      }
      param{
        name: "w4"
        ...
      }
      param{
        name: "b41"
        ...
      }
    }

    layer{
      name: "RBMHid"
      type: kRBMHid
      srclayers:"RBMVis"
      rbm_conf{
        hdim: 30
        gaussian: true
      }
      param{
        name: "w4_"
        share_from: "w4"
      }
      param{
        name: "b42"
        ...
      }
    }

### Auto-encoder
In the fine-tuning stage, the 4 RBMs are "unfolded" to form encoder and decoder
networks that are initialized using the parameters from the previous 4 RBMs.

<img src="../_static/images/example-autoencoder.png" align="center" width="500px"/>
<span><strong>Figure 5 - Auto-Encoders.</strong></span>


Figure 5 shows the neural net structure for training the auto-encoder.
[Back propagation (kBP)] (train-one-batch.html) is
configured as the algorithm for `TrainOneBatch`. We use the same cluster
configuration as RBM models. For updater, we use [AdaGrad](updater.html#adagradupdater) algorithm with
fixed learning rate.

    ### Updater Configuration
    updater{
      type: kAdaGrad
      learning_rate{
      base_lr: 0.01
      type: kFixed
      }
    }


According to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf),
we configure a EuclideanLoss layer to compute the reconstruction error. The neural net
configuration is (with some of the middle layers omitted),

    layer{ name: "data" }
    layer{ name:"mnist" }
    layer{
      name: "Inner1"
      param{ name: "w1" }
      param{ name: "b12" }
    }
    layer{ name: "Sigmoid1" }
    ...
    layer{
      name: "Inner8"
      innerproduct_conf{
        num_output: 784
        transpose: true
      }
      param{
        name: "w8"
        share_from: "w1"
      }
      param{ name: "b11" }
    }
    layer{ name: "Sigmoid8" }

    # Euclidean Loss Layer Configuration
    layer{
      name: "loss"
      type:kEuclideanLoss
      srclayers:"Sigmoid8"
      srclayers:"mnist"
    }

To load pre-trained parameters from the 4 RBMs' checkpoint file we configure `checkpoint_path` as

    ### Checkpoint Configuration
    checkpoint_path: "examples/rbm/checkpoint/rbm1/checkpoint/step6000-worker0"
    checkpoint_path: "examples/rbm/checkpoint/rbm2/checkpoint/step6000-worker0"
    checkpoint_path: "examples/rbm/checkpoint/rbm3/checkpoint/step6000-worker0"
    checkpoint_path: "examples/rbm/checkpoint/rbm4/checkpoint/step6000-worker0"


## Visualization Results

<div>
<img src="../_static/images/rbm-weight.PNG" align="center" width="300px"/>

<img src="../_static/images/rbm-feature.PNG" align="center" width="300px"/>
<br/>
<span><strong>Figure 6 - Bottom RBM weight matrix.</strong></span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;

<span><strong>Figure 7 - Top layer features.</strong></span>
</div>

Figure 6 visualizes sample columns of the weight matrix of RBM1, We can see the
Gabor-like filters are learned. Figure 7 depicts the features extracted from
the top-layer of the auto-encoder, wherein one point represents one image.
Different colors represent different digits. We can see that most images are
well clustered according to the ground truth.