# RBM Example --- This example uses SINGA to train 4 RBM models and one auto-encoder model over the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). The auto-encoder model is trained to reduce the dimensionality of the MNIST image feature. The RBM models are trained to initialize parameters of the auto-encoder model. This example application is from [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf). ## Running instructions Running scripts are provided in *SINGA_ROOT/examples/rbm* folder. The MNIST dataset has 70,000 handwritten digit images. The [data preparation](data.html) page has details on converting this dataset into SINGA recognizable format. Users can simply run the following commands to download and convert the dataset. # at SINGA_ROOT/examples/mnist/ $ cp Makefile.example Makefile $ make download $ make create The training is separated into two phases, namely pre-training and fine-tuning. The pre-training phase trains 4 RBMs in sequence, # at SINGA_ROOT/ $ ./bin/singa-run.sh -conf examples/rbm/rbm1.conf $ ./bin/singa-run.sh -conf examples/rbm/rbm2.conf $ ./bin/singa-run.sh -conf examples/rbm/rbm3.conf $ ./bin/singa-run.sh -conf examples/rbm/rbm4.conf The fine-tuning phase trains the auto-encoder by, $ ./bin/singa-run.sh -conf examples/rbm/autoencoder.conf ## Training details ### RBM1 Figure 1 - RBM1. The neural net structure for training RBM1 is shown in Figure 1. The data layer and parser layer provides features for training RBM1. The visible layer (connected with parser layer) of RBM1 accepts the image feature (784 dimension). The hidden layer is set to have 1000 neurons (units). These two layers are configured as, layer{ name: "RBMVis" type: kRBMVis srclayers:"mnist" srclayers:"RBMHid" rbm_conf{ hdim: 1000 } param{ name: "w1" init{ type: kGaussian mean: 0.0 std: 0.1 } } param{ name: "b11" init{ type: kConstant value: 0.0 } } } layer{ name: "RBMHid" type: kRBMHid srclayers:"RBMVis" rbm_conf{ hdim: 1000 } param{ name: "w1_" share_from: "w1" } param{ name: "b12" init{ type: kConstant value: 0.0 } } } For RBM, the weight matrix is shared by the visible and hidden layers. For instance, `w1` is shared by `vis` and `hid` layers shown in Figure 1. In SINGA, we can configure the `share_from` field to enable [parameter sharing](param.html) as shown above for the param `w1` and `w1_`. [Contrastive Divergence](train-one-batch.html#contrastive-divergence) is configured as the algorithm for [TrainOneBatch](train-one-batch.html). Following Hinton's paper, we configure the [updating protocol](updater.html) as follows, # Updater Configuration updater{ type: kSGD momentum: 0.2 weight_decay: 0.0002 learning_rate{ base_lr: 0.1 type: kFixed } } Since the parameters of RBM0 will be used to initialize the auto-encoder, we should configure the `workspace` field to specify a path for the checkpoint folder. For example, if we configure it as, cluster { workspace: "examples/rbm/rbm1/" } Then SINGA will [checkpoint the parameters](checkpoint.html) into *examples/rbm/rbm1/*. ### RBM1 Figure 2 - RBM2. Figure 2 shows the net structure of training RBM2. The visible units of RBM2 accept the output from the Sigmoid1 layer. The Inner1 layer is a `InnerProductLayer` whose parameters are set to the `w1` and `b12` learned from RBM1. The neural net configuration is (with layers for data layer and parser layer omitted). layer{ name: "Inner1" type: kInnerProduct srclayers:"mnist" innerproduct_conf{ num_output: 1000 } param{ name: "w1" } param{ name: "b12"} } layer{ name: "Sigmoid1" type: kSigmoid srclayers:"Inner1" } layer{ name: "RBMVis" type: kRBMVis srclayers:"Sigmoid1" srclayers:"RBMHid" rbm_conf{ hdim: 500 } param{ name: "w2" ... } param{ name: "b21" ... } } layer{ name: "RBMHid" type: kRBMHid srclayers:"RBMVis" rbm_conf{ hdim: 500 } param{ name: "w2_" share_from: "w2" } param{ name: "b22" ... } } To load w0 and b02 from RBM0's checkpoint file, we configure the `checkpoint_path` as, checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0" cluster{ workspace: "examples/rbm/rbm2" } The workspace is changed for checkpointing `w2`, `b21` and `b22` into *examples/rbm/rbm2/*. ### RBM3 Figure 3 - RBM3. Figure 3 shows the net structure of training RBM3. In this model, a layer with 250 units is added as the hidden layer of RBM3. The visible units of RBM3 accepts output from Sigmoid2 layer. Parameters of Inner1 and Innner2 are set to `w1,b12,w2,b22` which can be load from the checkpoint file of RBM2, i.e., "examples/rbm/rbm2/". ### RBM4 Figure 4 - RBM4. Figure 4 shows the net structure of training RBM4. It is similar to Figure 3, but according to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf), the hidden units of the top RBM (RBM4) have stochastic real-valued states drawn from a unit variance Gaussian whose mean is determined by the input from the RBM's logistic visible units. So we add a `gaussian` field in the RBMHid layer to control the sampling distribution (Gaussian or Bernoulli). In addition, this RBM has a much smaller learning rate (0.001). The neural net configuration for the RBM4 and the updating protocol is (with layers for data layer and parser layer omitted), # Updater Configuration updater{ type: kSGD momentum: 0.9 weight_decay: 0.0002 learning_rate{ base_lr: 0.001 type: kFixed } } layer{ name: "RBMVis" type: kRBMVis srclayers:"Sigmoid3" srclayers:"RBMHid" rbm_conf{ hdim: 30 } param{ name: "w4" ... } param{ name: "b41" ... } } layer{ name: "RBMHid" type: kRBMHid srclayers:"RBMVis" rbm_conf{ hdim: 30 gaussian: true } param{ name: "w4_" share_from: "w4" } param{ name: "b42" ... } } ### Auto-encoder In the fine-tuning stage, the 4 RBMs are "unfolded" to form encoder and decoder networks that are initialized using the parameters from the previous 4 RBMs. Figure 5 - Auto-Encoders. Figure 5 shows the neural net structure for training the auto-encoder. [Back propagation (kBP)] (train-one-batch.html) is configured as the algorithm for `TrainOneBatch`. We use the same cluster configuration as RBM models. For updater, we use [AdaGrad](updater.html#adagradupdater) algorithm with fixed learning rate. ### Updater Configuration updater{ type: kAdaGrad learning_rate{ base_lr: 0.01 type: kFixed } } According to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf), we configure a EuclideanLoss layer to compute the reconstruction error. The neural net configuration is (with some of the middle layers omitted), layer{ name: "data" } layer{ name:"mnist" } layer{ name: "Inner1" param{ name: "w1" } param{ name: "b12" } } layer{ name: "Sigmoid1" } ... layer{ name: "Inner8" innerproduct_conf{ num_output: 784 transpose: true } param{ name: "w8" share_from: "w1" } param{ name: "b11" } } layer{ name: "Sigmoid8" } # Euclidean Loss Layer Configuration layer{ name: "loss" type:kEuclideanLoss srclayers:"Sigmoid8" srclayers:"mnist" } To load pre-trained parameters from the 4 RBMs' checkpoint file we configure `checkpoint_path` as ### Checkpoint Configuration checkpoint_path: "examples/rbm/checkpoint/rbm1/checkpoint/step6000-worker0" checkpoint_path: "examples/rbm/checkpoint/rbm2/checkpoint/step6000-worker0" checkpoint_path: "examples/rbm/checkpoint/rbm3/checkpoint/step6000-worker0" checkpoint_path: "examples/rbm/checkpoint/rbm4/checkpoint/step6000-worker0" ## Visualization Results

Figure 6 - Bottom RBM weight matrix.         Figure 7 - Top layer features.
Figure 6 visualizes sample columns of the weight matrix of RBM1, We can see the Gabor-like filters are learned. Figure 7 depicts the features extracted from the top-layer of the auto-encoder, wherein one point represents one image. Different colors represent different digits. We can see that most images are well clustered according to the ground truth.