# MLP Example

The multilayer perceptron (MLP) is a subclass of feed-forward neural networks. An MLP typically consists of multiple layers, with each layer fully connected to the next one. In this example, we will use SINGA to train a [simple MLP model proposed by Ciresan](http://arxiv.org/abs/1003.0358) for classifying handwritten digits from the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).

## Running instructions

Please refer to the [installation](installation.html) page for instructions on building SINGA, and the [quick start](quick-start.html) page for instructions on starting zookeeper.

We have provided scripts for preparing the training and test datasets in *examples/mnist/*.

    # in examples/mnist
    $ cp Makefile.example Makefile
    $ make download
    $ make create

### Training on CPU

After the datasets are prepared, we start the training by

    ./bin/singa-run.sh -conf examples/mnist/job.conf

After it is started, you should see output like

    Record job information to /tmp/singa-log/job-info/job-1-20150817-055231
    Executing : ./singa -conf /xxx/incubator-singa/examples/mnist/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 1
    E0817 07:15:09.211885 34073 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 34073)
    E0817 07:15:14.972231 34114 server.cc:36] Server (group = 0, id = 0) start
    E0817 07:15:14.972520 34115 worker.cc:134] Worker (group = 0, id = 0) start
    E0817 07:15:24.462602 34073 trainer.cc:373] Test step-0, loss : 2.341021, accuracy : 0.109100
    E0817 07:15:47.341076 34073 trainer.cc:373] Train step-0, loss : 2.357269, accuracy : 0.099000
    E0817 07:16:07.173364 34073 trainer.cc:373] Train step-10, loss : 2.222740, accuracy : 0.201800
    E0817 07:16:26.714855 34073 trainer.cc:373] Train step-20, loss : 2.091030, accuracy : 0.327200
    E0817 07:16:46.590946 34073 trainer.cc:373] Train step-30, loss : 1.969412, accuracy : 0.442100
    E0817 07:17:06.207080 34073 trainer.cc:373] Train step-40, loss : 1.865466, accuracy : 0.514800
    E0817 07:17:25.890033 34073 trainer.cc:373] Train step-50, loss : 1.773849, accuracy : 0.569100
    E0817 07:17:51.208935 34073 trainer.cc:373] Test step-60, loss : 1.613709, accuracy : 0.662100
    E0817 07:17:53.176766 34073 trainer.cc:373] Train step-60, loss : 1.659150, accuracy : 0.652600
    E0817 07:18:12.783370 34073 trainer.cc:373] Train step-70, loss : 1.574024, accuracy : 0.666000
    E0817 07:18:32.904942 34073 trainer.cc:373] Train step-80, loss : 1.529380, accuracy : 0.670500
    E0817 07:18:52.608111 34073 trainer.cc:373] Train step-90, loss : 1.443911, accuracy : 0.703500
    E0817 07:19:12.168465 34073 trainer.cc:373] Train step-100, loss : 1.387759, accuracy : 0.721000
    E0817 07:19:31.855865 34073 trainer.cc:373] Train step-110, loss : 1.335246, accuracy : 0.736500
    E0817 07:19:57.327133 34073 trainer.cc:373] Test step-120, loss : 1.216652, accuracy : 0.769900

After a number of training steps (the interval depends on the configuration), or when the job finishes, SINGA will [checkpoint](checkpoint.html) the model parameters.

### Training on GPU

To train this example model on GPU, just add a field to the job configuration file specifying the GPU device,

    # job.conf
    gpu: 0

### Training using Python script

The Python helpers that come with SINGA 0.2 make it easy to configure the job. For example, the job.conf can be replaced with a simple Python script, mnist_mlp.py, which has about 30 lines of code following the [Keras API](http://keras.io/). The job is then launched with

    ./bin/singa-run.sh -exec tool/python/examples/mnist_mlp.py
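mnist_mlp.py ships with SINGA and is written against SINGA's own Keras-like Python helpers, which are not reproduced here. Purely as an illustration of what a ~30-line, Keras-style definition of this network looks like, below is a sketch written with the standard Keras API instead; it is not SINGA code, and the optimizer and loss choices are assumptions made for the example.

    # Illustration only: a Keras-style definition of the same MLP,
    # using the standard Keras API rather than SINGA's Python helpers.
    import keras
    from keras import layers

    model = keras.Sequential()
    model.add(keras.Input(shape=(784,)))               # flattened 28x28 MNIST images
    model.add(layers.Dense(2500, activation='tanh'))
    for units in (2000, 1500, 1000, 500):
        model.add(layers.Dense(units, activation='tanh'))
    model.add(layers.Dense(10, activation='softmax'))  # 10 digit classes

    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.001),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    # model.fit(x_train, y_train, batch_size=64) would then start training.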
## Details

To train a model in SINGA, you need to prepare the datasets and a job configuration that specifies the neural net structure, the training algorithm (BP or CD), the SGD update algorithm (e.g., AdaGrad), the number of training/test steps, etc.

### Data preparation

Before using SINGA, you need to write a program that pre-processes your dataset into a format that SINGA can read. Please refer to the [Data Preparation](data.html) page for details on preparing this MNIST dataset.
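For this example, the `make download` and `make create` targets above already perform the conversion, and the resulting store format is documented on the [Data Preparation](data.html) page. Purely for illustration, the sketch below shows the first half of such a pre-processing program: parsing the raw MNIST IDX files with NumPy. The file names are the standard ones from the MNIST site, and the helper functions are hypothetical, not part of SINGA.

    # Illustrative sketch: parse the raw MNIST IDX files (not SINGA code).
    import numpy as np

    def load_mnist_images(path):
        # IDX image format: 4 big-endian int32 values (magic=2051, count,
        # rows, cols) followed by one unsigned byte per pixel.
        with open(path, 'rb') as f:
            magic, num, rows, cols = np.frombuffer(f.read(16), dtype='>i4')
            assert magic == 2051, 'not an MNIST image file'
            pixels = np.frombuffer(f.read(), dtype=np.uint8)
        return pixels.reshape(num, rows * cols)

    def load_mnist_labels(path):
        # IDX label format: 2 big-endian int32 values (magic=2049, count)
        # followed by one byte per label.
        with open(path, 'rb') as f:
            magic, num = np.frombuffer(f.read(8), dtype='>i4')
            assert magic == 2049, 'not an MNIST label file'
            labels = np.frombuffer(f.read(), dtype=np.uint8)
        return labels

    # e.g. images = load_mnist_images('train-images-idx3-ubyte')
    #      labels = load_mnist_labels('train-labels-idx1-ubyte')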

### Neural net

Figure 1 - Net structure of the MLP example.
Figure 1 shows the structure of the simple MLP model, which is constructed following [Ciresan's paper](http://arxiv.org/abs/1003.0358). Each dashed circle contains two layers which together represent one feature transformation stage. There are 6 such stages in total. The sizes of the [InnerProductLayer](layer.html#innerproductlayer)s in these stages decrease from 2500->2000->1500->1000->500->10.

Next, we follow the guides on the [neural net page](neural-net.html) and the [layer page](layer.html) to write the neural net configuration.

* We configure an input layer to read the training/testing records from a disk file.

        layer {
          name: "data"
          type: kRecordInput
          store_conf {
            backend: "kvfile"
            path: "examples/mnist/train_data.bin"
            random_skip: 5000
            batchsize: 64
            shape: 784
            std_value: 127.5
            mean_value: 127.5
          }
          exclude: kTest
        }

        layer {
          name: "data"
          type: kRecordInput
          store_conf {
            backend: "kvfile"
            path: "examples/mnist/test_data.bin"
            batchsize: 100
            shape: 784
            std_value: 127.5
            mean_value: 127.5
          }
          exclude: kTrain
        }

* All [InnerProductLayer](layer.html#innerproductlayer)s are configured similarly to fc1 below,

        layer {
          name: "fc1"
          type: kInnerProduct
          srclayers: "data"
          innerproduct_conf {
            num_output: 2500
          }
          param {
            name: "w1"
            ...
          }
          param {
            name: "b1"
            ...
          }
        }

    with the `num_output` decreasing from 2500 to 10.

* A [STanhLayer](layer.html#stanhlayer) is connected to every InnerProductLayer except the last one. It transforms the features via the scaled tanh function.

        layer {
          name: "tanh1"
          type: kSTanh
          srclayers: "fc1"
        }

* The final [Softmax loss layer](layer.html#softmaxloss) connects to the last InnerProductLayer (fc6, whose `num_output` is 10) and to the data layer, which provides the ground truth labels.

        layer {
          name: "loss"
          type: kSoftmaxLoss
          softmaxloss_conf {
            topk: 1
          }
          srclayers: "fc6"
          srclayers: "data"
        }

### Updater

The [normal SGD updater](updater.html#updater) is selected. The learning rate is multiplied by 0.997 every 60 steps; a short sketch of the resulting schedule is given at the end of this page.

    updater {
      type: kSGD
      learning_rate {
        base_lr: 0.001
        type: kStep
        step_conf {
          change_freq: 60
          gamma: 0.997
        }
      }
    }

### TrainOneBatch algorithm

The MLP model is a feed-forward model, hence the [Back-propagation algorithm](train-one-batch.html#back-propagation) is selected.

    train_one_batch {
      alg: kBP
    }

### Cluster setting

The following configuration sets up a single worker group and a single server group for training. The [Training frameworks](frameworks.html) page introduces the configurations of a couple of distributed training frameworks.

    cluster {
      nworker_groups: 1
      nserver_groups: 1
    }
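As a concrete illustration of the step schedule configured in the Updater section above, the short sketch below computes the learning rate at a given step, assuming (as described there) that the rate is multiplied by `gamma` once every `change_freq` steps; the helper function is hypothetical, not part of SINGA.

    # Illustrative sketch of the kStep schedule from the Updater section.
    base_lr, gamma, change_freq = 0.001, 0.997, 60

    def learning_rate(step):
        # The rate is multiplied by gamma once every change_freq steps.
        return base_lr * gamma ** (step // change_freq)

    print(learning_rate(0))     # 0.001
    print(learning_rate(120))   # 0.001 * 0.997**2 ~= 0.000994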