# MLP Example

The multilayer perceptron (MLP) is a subclass of feed-forward neural networks. An MLP typically consists of multiple layers, with each layer fully connected to the next one. In this example, we will use SINGA to train a [simple MLP model proposed by Ciresan](http://arxiv.org/abs/1003.0358) for classifying handwritten digits from the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).

## Running instructions

Please refer to the [installation](installation.html) page for instructions on building SINGA, and the [quick start](quick-start.html) page for instructions on starting zookeeper.

We have provided scripts for preparing the training and test datasets in *examples/mnist/*.

    # in examples/mnist
    $ cp Makefile.example Makefile
    $ make download
    $ make create

### Training on CPU

After the datasets are prepared, we start the training by

    ./bin/singa-run.sh -conf examples/mnist/job.conf

After it is started, you should see output like

    Record job information to /tmp/singa-log/job-info/job-1-20150817-055231
    Executing : ./singa -conf /xxx/incubator-singa/examples/mnist/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 1
    E0817 07:15:09.211885 34073 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 34073)
    E0817 07:15:14.972231 34114 server.cc:36] Server (group = 0, id = 0) start
    E0817 07:15:14.972520 34115 worker.cc:134] Worker (group = 0, id = 0) start
    E0817 07:15:24.462602 34073 trainer.cc:373] Test step-0, loss : 2.341021, accuracy : 0.109100
    E0817 07:15:47.341076 34073 trainer.cc:373] Train step-0, loss : 2.357269, accuracy : 0.099000
    E0817 07:16:07.173364 34073 trainer.cc:373] Train step-10, loss : 2.222740, accuracy : 0.201800
    E0817 07:16:26.714855 34073 trainer.cc:373] Train step-20, loss : 2.091030, accuracy : 0.327200
    E0817 07:16:46.590946 34073 trainer.cc:373] Train step-30, loss : 1.969412, accuracy : 0.442100
    E0817 07:17:06.207080 34073 trainer.cc:373] Train step-40, loss : 1.865466, accuracy : 0.514800
    E0817 07:17:25.890033 34073 trainer.cc:373] Train step-50, loss : 1.773849, accuracy : 0.569100
    E0817 07:17:51.208935 34073 trainer.cc:373] Test step-60, loss : 1.613709, accuracy : 0.662100
    E0817 07:17:53.176766 34073 trainer.cc:373] Train step-60, loss : 1.659150, accuracy : 0.652600
    E0817 07:18:12.783370 34073 trainer.cc:373] Train step-70, loss : 1.574024, accuracy : 0.666000
    E0817 07:18:32.904942 34073 trainer.cc:373] Train step-80, loss : 1.529380, accuracy : 0.670500
    E0817 07:18:52.608111 34073 trainer.cc:373] Train step-90, loss : 1.443911, accuracy : 0.703500
    E0817 07:19:12.168465 34073 trainer.cc:373] Train step-100, loss : 1.387759, accuracy : 0.721000
    E0817 07:19:31.855865 34073 trainer.cc:373] Train step-110, loss : 1.335246, accuracy : 0.736500
    E0817 07:19:57.327133 34073 trainer.cc:373] Test step-120, loss : 1.216652, accuracy : 0.769900

After a number of training steps (the interval depends on the configuration), or when the job finishes, SINGA will [checkpoint](checkpoint.html) the model parameters.

### Training on GPU

To train this example model on GPU, just add a field to the job configuration file specifying the GPU device,

    # job.conf
    gpu: 0

### Training using Python script

The Python helpers that come with SINGA 0.2 make it easy to configure the job. For example, the job.conf can be replaced with a simple Python script, mnist_mlp.py, which has about 30 lines of code following the [Keras API](http://keras.io/). The job is then launched with

    ./bin/singa-run.sh -exec tool/python/examples/mnist_mlp.py
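mnist_mlp.py ships with SINGA and is written against SINGA's own Keras-like Python helpers, which are not reproduced here. Purely as an illustration of what a ~30-line, Keras-style definition of this network looks like, below is a sketch written with the standard Keras API instead; it is not SINGA code, and the optimizer and loss choices are assumptions made for the example.

    # Illustration only: a Keras-style definition of the same MLP,
    # using the standard Keras API rather than SINGA's Python helpers.
    import keras
    from keras import layers

    model = keras.Sequential()
    model.add(keras.Input(shape=(784,)))               # flattened 28x28 MNIST images
    model.add(layers.Dense(2500, activation='tanh'))
    for units in (2000, 1500, 1000, 500):
        model.add(layers.Dense(units, activation='tanh'))
    model.add(layers.Dense(10, activation='softmax'))  # 10 digit classes

    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.001),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    # model.fit(x_train, y_train, batch_size=64) would then start training.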
## Details

To train a model in SINGA, you need to prepare the datasets and a job configuration that specifies the neural net structure, the training algorithm (BP or CD), the SGD update algorithm (e.g., AdaGrad), the number of training/test steps, etc.

### Data preparation

Before using SINGA, you need to write a program that pre-processes your dataset into a format that SINGA can read. Please refer to the [Data Preparation](data.html) page for details on preparing this MNIST dataset.
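For this example, the `make download` and `make create` targets above already perform the conversion, and the resulting store format is documented on the [Data Preparation](data.html) page. Purely for illustration, the sketch below shows the first half of such a pre-processing program: parsing the raw MNIST IDX files with NumPy. The file names are the standard ones from the MNIST site, and the helper functions are hypothetical, not part of SINGA.

    # Illustrative sketch: parse the raw MNIST IDX files (not SINGA code).
    import numpy as np

    def load_mnist_images(path):
        # IDX image format: 4 big-endian int32 values (magic=2051, count,
        # rows, cols) followed by one unsigned byte per pixel.
        with open(path, 'rb') as f:
            magic, num, rows, cols = np.frombuffer(f.read(16), dtype='>i4')
            assert magic == 2051, 'not an MNIST image file'
            pixels = np.frombuffer(f.read(), dtype=np.uint8)
        return pixels.reshape(num, rows * cols)

    def load_mnist_labels(path):
        # IDX label format: 2 big-endian int32 values (magic=2049, count)
        # followed by one byte per label.
        with open(path, 'rb') as f:
            magic, num = np.frombuffer(f.read(8), dtype='>i4')
            assert magic == 2049, 'not an MNIST label file'
            labels = np.frombuffer(f.read(), dtype=np.uint8)
        return labels

    # e.g. images = load_mnist_images('train-images-idx3-ubyte')
    #      labels = load_mnist_labels('train-labels-idx1-ubyte')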

### Neural net

Figure 1 - Net structure of the MLP example.
Figure 1 shows the structure of the simple MLP model, which is constructed following [Ciresan's paper](http://arxiv.org/abs/1003.0358). Each dashed circle contains two layers which together represent one feature transformation stage. There are 6 such stages in total. The sizes of the [InnerProductLayer](layer.html#innerproductlayer)s in these stages decrease from 2500->2000->1500->1000->500->10.

Next, we follow the guides on the [neural net page](neural-net.html) and the [layer page](layer.html) to write the neural net configuration.

* We configure an input layer to read the training/testing records from a disk file.

        layer {
          name: "data"
          type: kRecordInput
          store_conf {
            backend: "kvfile"
            path: "examples/mnist/train_data.bin"
            random_skip: 5000
            batchsize: 64
            shape: 784
            std_value: 127.5
            mean_value: 127.5
          }
          exclude: kTest
        }

        layer {
          name: "data"
          type: kRecordInput
          store_conf {
            backend: "kvfile"
            path: "examples/mnist/test_data.bin"
            batchsize: 100
            shape: 784
            std_value: 127.5
            mean_value: 127.5
          }
          exclude: kTrain
        }

* All [InnerProductLayer](layer.html#innerproductlayer)s are configured similarly to fc1 below,

        layer {
          name: "fc1"
          type: kInnerProduct
          srclayers: "data"
          innerproduct_conf {
            num_output: 2500
          }
          param {
            name: "w1"
            ...
          }
          param {
            name: "b1"
            ...
          }
        }

    with the `num_output` decreasing from 2500 to 10.

* A [STanhLayer](layer.html#stanhlayer) is connected to every InnerProductLayer except the last one. It transforms the features via the scaled tanh function.

        layer {
          name: "tanh1"
          type: kSTanh
          srclayers: "fc1"
        }

* The final [Softmax loss layer](layer.html#softmaxloss) connects to the last InnerProductLayer (fc6, whose `num_output` is 10) and to the data layer, which provides the ground truth labels.

        layer {
          name: "loss"
          type: kSoftmaxLoss
          softmaxloss_conf {
            topk: 1
          }
          srclayers: "fc6"
          srclayers: "data"
        }

### Updater

The [normal SGD updater](updater.html#updater) is selected. The learning rate is multiplied by 0.997 every 60 steps; a short sketch of the resulting schedule is given at the end of this page.

    updater {
      type: kSGD
      learning_rate {
        base_lr: 0.001
        type: kStep
        step_conf {
          change_freq: 60
          gamma: 0.997
        }
      }
    }

### TrainOneBatch algorithm

The MLP model is a feed-forward model, hence the [Back-propagation algorithm](train-one-batch.html#back-propagation) is selected.

    train_one_batch {
      alg: kBP
    }

### Cluster setting

The following configuration sets up a single worker group and a single server group for training. The [Training frameworks](frameworks.html) page introduces the configurations of a couple of distributed training frameworks.

    cluster {
      nworker_groups: 1
      nserver_groups: 1
    }
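As a concrete illustration of the step schedule configured in the Updater section above, the short sketch below computes the learning rate at a given step, assuming (as described there) that the rate is multiplied by `gamma` once every `change_freq` steps; the helper function is hypothetical, not part of SINGA.

    # Illustrative sketch of the kStep schedule from the Updater section.
    base_lr, gamma, change_freq = 0.001, 0.997, 60

    def learning_rate(step):
        # The rate is multiplied by gamma once every change_freq steps.
        return base_lr * gamma ** (step // change_freq)

    print(learning_rate(0))     # 0.001
    print(learning_rate(120))   # 0.001 * 0.997**2 ~= 0.000994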