# Python Binding

---

Python binding provides APIs for configuring a training job in the style of
[Keras](http://keras.io/), including the configuration of the neural net, the training
algorithm, etc. It replaces the configuration file (e.g., *job.conf*) in
protobuf format, which is typically long and error-prone to prepare. We will add
Python functions to interact with the layer and neural net
objects (see [here](python_interactive_training.html)), which will enable users to train and debug their models
interactively.

The Python-related code is laid out as follows:

    SINGAROOT/tool/python
    |-- pb2 (has job_pb2.py)
    |-- singa
        |-- model.py
        |-- layer.py
        |-- parameter.py
        |-- initialization.py
        |-- utils
            |-- utility.py
            |-- message.py
    |-- examples
        |-- cifar10_cnn.py, mnist_mlp.py, mnist_rbm1.py, mnist_ae.py, etc.
        |-- datasets
            |-- cifar10.py
            |-- mnist.py

## Compiling and running instructions

To use the Python APIs, add the following arguments when configuring and compiling
SINGA:

    ./configure --enable-python --with-python=PYTHON_DIR
    make

where `PYTHON_DIR` is the directory that contains *Python.h*.


The training program is launched by

    bin/singa-run.sh -exec <user_main.py>

where *user_main.py* creates the JobProto object and passes it to `Driver::Train` to
start training.

For example,

    cd SINGAROOT
    bin/singa-run.sh -exec tool/python/examples/cifar10_cnn.py


## Examples


### MLP Example

This example uses the Python APIs to configure and train an MLP model over the MNIST
dataset. The configuration is the same as that in *SINGAROOT/examples/mnist/job.conf*.

```
X_train, X_test, workspace = mnist.load_data()

m = Sequential('mlp', sys.argv)

m.add(Dense(2500, init='uniform', activation='stanh'))
m.add(Dense(2000, init='uniform', activation='stanh'))
m.add(Dense(1500, init='uniform', activation='stanh'))
m.add(Dense(1000, init='uniform', activation='stanh'))
m.add(Dense(500,  init='uniform', activation='stanh'))
m.add(Dense(10, init='uniform', activation='softmax'))

sgd = SGD(lr=0.001, lr_type='step')
topo = Cluster(workspace)
m.compile(loss='categorical_crossentropy', optimizer=sgd, cluster=topo)
m.fit(X_train, nb_epoch=1000, with_test=True)
result = m.evaluate(X_test, batch_size=100, test_steps=10, test_freq=60)
```

### CNN Example

This example uses the Python APIs to configure and train a CNN model over the CIFAR-10
dataset. The configuration is the same as that in *SINGAROOT/examples/cifar10/job.conf*.


```
X_train, X_test, workspace = cifar10.load_data()

m = Sequential('cnn', sys.argv)

m.add(Convolution2D(32, 5, 1, 2, w_std=0.0001, b_lr=2))
m.add(MaxPooling2D(pool_size=(3,3), stride=2))
m.add(Activation('relu'))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))

m.add(Convolution2D(32, 5, 1, 2, b_lr=2))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))

m.add(Convolution2D(64, 5, 1, 2))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))

m.add(Dense(10, w_wd=250, b_lr=2, b_wd=0, activation='softmax'))

sgd = SGD(decay=0.004, lr_type='manual', step=(0,60000,65000), step_lr=(0.001,0.0001,0.00001))
topo = Cluster(workspace)
m.compile(loss='categorical_crossentropy', optimizer=sgd, cluster=topo)
m.fit(X_train, nb_epoch=1000, with_test=True)
result = m.evaluate(X_test, 1000, test_steps=30, test_freq=300)
```
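With `lr_type='manual'`, the `step` tuple in the CNN example marks the steps at which the learning rate changes, and `step_lr` gives the rate used from each of those steps onward. The mapping can be sketched in plain Python, assuming rate `step_lr[i]` applies from `step[i]` up to (but not including) `step[i+1]`; `manual_lr` is a hypothetical helper, not part of the binding.

```python
import bisect


def manual_lr(step, steps=(0, 60000, 65000), step_lrs=(0.001, 0.0001, 0.00001)):
    """Hypothetical helper: learning rate at `step` for a manual (fixed-step) schedule.

    Assumes `steps` is sorted ascending and step_lrs[i] is used for
    steps[i] <= step < steps[i + 1] (the last rate is used thereafter).
    """
    if len(steps) != len(step_lrs):
        raise ValueError("steps and step_lrs must have the same length")
    # Index of the last boundary that is <= step.
    i = bisect.bisect_right(steps, step) - 1
    if i < 0:
        raise ValueError("step is before the first boundary")
    return step_lrs[i]


for s in (0, 59999, 60000, 65000, 70000):
    print(s, manual_lr(s))
```

Using `bisect` keeps the lookup logarithmic in the number of boundaries, which only matters for long schedules but costs nothing here.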


### RBM Example

This example uses the Python APIs to configure and train an RBM model over the MNIST
dataset. The configuration is the same as that in `SINGAROOT/examples/rbm*.conf`.

```
rbmid = 3
X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid)
m = Energy('rbm'+str(rbmid), sys.argv)

out_dim = [1000, 500, 250]
m.add(RBM(out_dim, w_std=0.1, b_wd=0))

sgd = SGD(lr=0.1, decay=0.0002, momentum=0.8)
topo = Cluster(workspace)
m.compile(optimizer=sgd, cluster=topo)
m.fit(X_train, alg='cd', nb_epoch=6000)
```
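In the RBM example, `out_dim` lists the hidden-layer sizes of a stack of RBMs, one RBM per entry. Assuming each RBM takes the hidden units of the previous one as its input and MNIST images are flattened to 784 values, the input/hidden sizes of the stack can be sketched as follows; `rbm_stack_dims` is a hypothetical helper, not part of the binding.

```python
def rbm_stack_dims(in_dim, out_dims):
    """Hypothetical helper: (input units, hidden units) of each RBM in a stack."""
    dims = []
    for out_dim in out_dims:
        dims.append((in_dim, out_dim))
        in_dim = out_dim  # the next RBM sees this RBM's hidden units as its input
    return dims


print(rbm_stack_dims(784, [1000, 500, 250]))
# [(784, 1000), (1000, 500), (500, 250)]
```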

### AutoEncoder Example

This example uses the Python APIs to configure and train an autoencoder model over
the MNIST dataset. The configuration is the same as that in
*SINGAROOT/examples/autoencoder.conf*.


```
rbmid = 4
X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid+1)
m = Sequential('autoencoder', sys.argv)

hid_dim = [1000, 500, 250, 30]
m.add(Autoencoder(hid_dim, out_dim=784, activation='sigmoid', param_share=True))

agd = AdaGrad(lr=0.01)
topo = Cluster(workspace)
m.compile(loss='mean_squared_error', optimizer=agd, cluster=topo)
m.fit(X_train, alg='bp', nb_epoch=12200)
```
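The autoencoder example lists only the encoder sizes in `hid_dim`. Assuming the decoder mirrors the encoder in reverse order back to `out_dim=784`, and that `param_share=True` makes each decoder layer reuse the (transposed) weights of its mirrored encoder layer, the layer sizes can be sketched as follows; `autoencoder_dims` is a hypothetical helper, not part of the binding.

```python
def autoencoder_dims(in_dim, hid_dims):
    """Hypothetical helper: (input, output) sizes of encoder and mirrored decoder layers."""
    sizes = [in_dim] + list(hid_dims)
    encoder = list(zip(sizes[:-1], sizes[1:]))
    # The decoder runs the encoder backwards; with weight sharing (assumed),
    # decoder layer k would reuse the transposed weights of encoder layer
    # len(encoder) - 1 - k, so no new weight matrices are needed.
    decoder = [(out, inp) for (inp, out) in reversed(encoder)]
    return encoder, decoder


enc, dec = autoencoder_dims(784, [1000, 500, 250, 30])
print(enc)  # [(784, 1000), (1000, 500), (500, 250), (250, 30)]
print(dec)  # [(30, 250), (250, 500), (500, 1000), (1000, 784)]
```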

### To run SINGA on GPU

To run SINGA on GPUs, set the `device` argument of `fit()` or `evaluate()` to a list
of GPU IDs. The number of GPUs must equal the number of workers configured in the
cluster topology.


```
gpu_id = [0]
m.fit(X_train, nb_epoch=100, with_test=True, device=gpu_id)
```
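The constraint above can be checked before launching. A minimal sketch, assuming the number of workers is known from the cluster configuration and that worker *i* is given the *i*-th GPU ID; `assign_gpus` is hypothetical and not part of the binding.

```python
def assign_gpus(gpu_ids, nworkers):
    """Hypothetical pre-flight check: map worker index -> GPU id, one GPU per worker."""
    gpu_ids = list(gpu_ids)
    if len(gpu_ids) != nworkers:
        raise ValueError(
            f"{len(gpu_ids)} GPU id(s) given for {nworkers} worker(s); the counts must match"
        )
    return dict(enumerate(gpu_ids))


print(assign_gpus([0], nworkers=1))  # {0: 0}
```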

### TIPS

The hidden layers of the MLP can be added in a loop:

```
for n in [2500, 2000, 1500, 1000, 500]:
  m.add(Dense(n, init='uniform', activation='tanh'))
m.add(Dense(10, init='uniform', activation='softmax'))
```

An activation layer can be added separately from the `Dense` layer:

```
m.add(Dense(2500, init='uniform'))
m.add(Activation('tanh'))
```

The hyper-parameters of the weight and bias can be specified explicitly with `Parameter` objects:

```
par = Parameter(init='uniform', scale=0.05)
m.add(Dense(2500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(2000, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(1500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(1000, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(10, w_param=par, b_param=par, activation='softmax'))
```


The same approach applies to the CNN layers:

```
parw = Parameter(init='gaussian', std=0.0001)
parb = Parameter(init='constant', value=0)
m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb, b_lr=2))
m.add(MaxPooling2D(pool_size=(3,3), stride=2))
m.add(Activation('relu'))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))

parw.update(std=0.01)   # use std=0.01 for the weights of the layers added below
m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))

m.add(Convolution2D(64, 5, 1, 2, w_param=parw, b_param=parb, b_lr=1))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))

m.add(Dense(10, w_param=parw, w_wd=250, b_param=parb, b_lr=2, b_wd=0, activation='softmax'))
```


Data layers can be added implicitly by `fit()` and `evaluate()`,

```
X_train, X_test, workspace = mnist.load_data()  # parameter values are set in load_data()
m.fit(X_train, ...)                             # Data layer for training is added
m.evaluate(X_test, ...)                         # Data layer for testing is added
```
or explicitly with `add()`,

```
X_train, X_test, workspace = mnist.load_data()  # parameter values are set in load_data()
m.add(X_train)                                  # explicitly add Data layer for training
m.add(X_test)                                   # explicitly add Data layer for testing
```


Alternatively, the Data layers can be configured directly from a `Store`:

```
store = Store(path='train.bin', batch_size=64, ...)         # parameter values are set explicitly
m.add(Data(load='recordinput', phase='train', conf=store))  # Data layer for training is added
store = Store(path='test.bin', batch_size=100, ...)         # parameter values are set explicitly
m.add(Data(load='recordinput', phase='test', conf=store))   # Data layer for testing is added
```


### Ways to run SINGA

(1) Run SINGA for training

```
m.fit(X_train, nb_epoch=1000)
```

(2) Run SINGA for training and validation

```
m.fit(X_train, validate_data=X_valid, nb_epoch=1000)
```

(3) Run SINGA for testing while training

```
m.fit(X_train, nb_epoch=1000, with_test=True)
result = m.evaluate(X_test, batch_size=100, test_steps=100)
```

(4) Run SINGA for testing only, assuming a checkpoint has been saved by a previous training run

```
result = m.evaluate(X_test, batch_size=100, checkpoint_path=workspace+'/checkpoint/step100-worker0')
```
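For case (4), `checkpoint_path` has to name an existing checkpoint. Assuming checkpoints are saved under `workspace/checkpoint` with names of the form `step<N>-worker<W>`, as in the path above, the latest one for a worker can be picked with a small helper; `latest_checkpoint` and `latest_checkpoint_path` are hypothetical and not part of the binding.

```python
import os
import re

# Assumed naming scheme, taken from the example path: step<N>-worker<W>
_CKPT_NAME = re.compile(r"^step(\d+)-worker(\d+)$")


def latest_checkpoint(names, worker=0):
    """Hypothetical helper: the checkpoint name with the highest step for `worker`."""
    best_step, best_name = -1, None
    for name in names:
        m = _CKPT_NAME.match(name)
        if m and int(m.group(2)) == worker and int(m.group(1)) > best_step:
            best_step, best_name = int(m.group(1)), name
    return best_name


def latest_checkpoint_path(workspace, worker=0):
    """Hypothetical helper: full path of the latest checkpoint, or None if there is none."""
    ckpt_dir = os.path.join(workspace, "checkpoint")
    name = latest_checkpoint(os.listdir(ckpt_dir), worker)
    return None if name is None else os.path.join(ckpt_dir, name)
```

It could then be used as `m.evaluate(X_test, batch_size=100, checkpoint_path=latest_checkpoint_path(workspace))`, after checking that the returned path is not `None`. Steps are compared as integers, so `step900` correctly precedes `step1000`.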


## Implementation Details

### Layer classes (subclasses of Layer)

* Data
* Dense
* Activation
* Convolution2D
* MaxPooling2D
* AvgPooling2D
* LRN2D
* Dropout
* RBM
* Autoencoder

### Model class

The Model class holds `jobconf` (a JobProto) and `layers` (a list of layers). It has two subclasses: the Sequential model and the Energy model.

Methods of the Model class:

* add
	* add a Layer to the Model

* compile
	* set Updater (i.e., optimizer) and Cluster (i.e., topology) components

* fit
	* set the training data and parameter values for training
		* (optional) set the validation data and parameter values
	* set the Train_one_batch component
	* set the `with_test` field to run SINGA on test data while training
	* [TODO] receive train/validation results, e.g., accuracy, loss, ppl, etc.

* evaluate
	* set the testing data and parameter values for testing
	* set the `checkpoint_path` field to run SINGA for testing only
	* [TODO] receive test results, e.g., accuracy, loss, ppl, etc.

### Results

`fit()` and `evaluate()` return the training/test results as a dictionary:

* [key]: step number
* [value]: a list of dictionaries with the keys
	* 'acc' for accuracy
	* 'loss' for loss
	* 'ppl' for perplexity
	* 'se' for squared error
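A results dictionary of this shape can be turned into a per-step curve for one metric. A minimal sketch, assuming each step maps to a list of metric dictionaries and averaging the metric over the dictionaries that contain it; `metric_by_step` is hypothetical, and the input below is placeholder data, not real SINGA output.

```python
def metric_by_step(results, key="acc"):
    """Hypothetical helper: average `key` over each step's list of metric dicts."""
    curve = {}
    for step in sorted(results):
        values = [d[key] for d in results[step] if key in d]
        if values:  # skip steps that did not report this metric
            curve[step] = sum(values) / len(values)
    return curve


# Placeholder values for illustration only, not real SINGA output.
dummy = {
    100: [{"acc": 0.25, "loss": 2.0}],
    200: [{"acc": 0.5, "loss": 1.5}, {"acc": 0.75, "loss": 1.0}],
}
print(metric_by_step(dummy))          # {100: 0.25, 200: 0.625}
print(metric_by_step(dummy, "loss"))  # {100: 2.0, 200: 1.25}
```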


### Parameter class

Users can set the update hyper-parameters and the initialization of each parameter. The available fields are:

* Parameter (fields in Param proto)
	* lr = (float) // learning rate multiplier, used to scale the learning rate when updating parameters.
	* wd = (float) // weight decay multiplier, used to scale the weight decay when updating parameters.

* Parameter initialization (fields in ParamGen proto)
	* init = (string) // one of the types, 'uniform', 'constant', 'gaussian'
	* high = (float)  // for 'uniform'
	* low = (float)   // for 'uniform'
	* value = (float) // for 'constant'
	* mean = (float)  // for 'gaussian'
	* std = (float)   // for 'gaussian'

* Weight (`w_param`) is 'gaussian' with mean=0, std=0.01 by default

* Bias (`b_param`) is 'constant' with value=0 by default

* To set a single field of the weight or bias directly in a layer constructor
	* for the weight, prefix the field name with `w_` (e.g., `w_std`)
	* for the bias, prefix the field name with `b_` (e.g., `b_lr`)
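The `w_`/`b_` prefix convention can be pictured as routing a layer's keyword arguments into separate weight and bias settings on top of the defaults listed above. A minimal sketch of that routing, not the binding's actual code; `split_param_kwargs`, `WEIGHT_DEFAULTS`, and `BIAS_DEFAULTS` are hypothetical names.

```python
# Defaults as described above: weight is gaussian(mean=0, std=0.01), bias is constant 0.
WEIGHT_DEFAULTS = {"init": "gaussian", "mean": 0.0, "std": 0.01}
BIAS_DEFAULTS = {"init": "constant", "value": 0.0}


def split_param_kwargs(**kwargs):
    """Hypothetical helper: route w_*/b_* keyword arguments to weight/bias fields."""
    weight, bias, rest = dict(WEIGHT_DEFAULTS), dict(BIAS_DEFAULTS), {}
    for key, val in kwargs.items():
        if key in ("w_param", "b_param"):
            rest[key] = val  # whole Parameter objects, not single fields
        elif key.startswith("w_"):
            weight[key[2:]] = val  # e.g. w_std -> weight["std"]
        elif key.startswith("b_"):
            bias[key[2:]] = val  # e.g. b_lr -> bias["lr"]
        else:
            rest[key] = val  # ordinary layer arguments
    return weight, bias, rest


w, b, rest = split_param_kwargs(w_std=0.1, w_lr=2, b_lr=2, b_wd=0, activation="softmax")
print(w)     # {'init': 'gaussian', 'mean': 0.0, 'std': 0.1, 'lr': 2}
print(b)     # {'init': 'constant', 'value': 0.0, 'lr': 2, 'wd': 0}
print(rest)  # {'activation': 'softmax'}
```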

Parameter values can be set in several ways:

```
parw = Parameter(lr=2, wd=10, init='gaussian', std=0.1)
parb = Parameter(lr=1, wd=0, init='constant', value=0)
m.add(Convolution2D(10, w_param=parw, b_param=parb, ...))
```

```
m.add(Dense(10, w_mean=1, w_std=0.1, w_lr=2, w_wd=10, ...))
```

```
parw = Parameter(init='constant', value=0)
m.add(Dense(10, w_param=parw, w_lr=1, w_wd=1, b_value=1, ...))
```

### Other classes

* Store
* Algorithm
* Updater
* SGD
* AdaGrad
* Cluster