Python Binding¶
The Python binding provides APIs for configuring a training job in the style of Keras, including the configuration of the neural net, training algorithm, etc. It replaces the configuration file (e.g., job.conf) in protobuf format, which is typically long and error-prone to prepare. We will add Python functions to interact with the layer and neural net objects (see here), which would enable users to train and debug their models interactively.
Here is the layout of the Python-related code,
SINGAROOT/tool/python
|-- pb2 (has job_pb2.py)
|-- singa
|   |-- model.py
|   |-- layer.py
|   |-- parameter.py
|   |-- initialization.py
|   |-- utils
|   |   |-- utility.py
|   |   |-- message.py
|-- examples
|   |-- cifar10_cnn.py, mnist_mlp.py, mnist_rbm1.py, mnist_ae.py, etc.
|   |-- datasets
|   |   |-- cifar10.py
|   |   |-- mnist.py
Compiling and running instructions¶
In order to use the Python APIs, users need to add the following arguments when compiling SINGA,
./configure --enable-python --with-python=PYTHON_DIR
make
where PYTHON_DIR is the directory containing Python.h.
The training program is launched by
bin/singa-run.sh -exec <user_main.py>
where user_main.py creates the JobProto object and passes it to Driver::Train to start the training.
For example,
cd SINGAROOT
bin/singa-run.sh -exec tool/python/examples/cifar10_cnn.py
Examples¶
MLP Example¶
This example uses the Python APIs to configure and train an MLP model over the MNIST dataset. The configuration content is the same as that written in SINGAROOT/examples/mnist/job.conf.
X_train, X_test, workspace = mnist.load_data()
m = Sequential('mlp', sys.argv)
m.add(Dense(2500, init='uniform', activation='stanh'))
m.add(Dense(2000, init='uniform', activation='stanh'))
m.add(Dense(1500, init='uniform', activation='stanh'))
m.add(Dense(1000, init='uniform', activation='stanh'))
m.add(Dense(500, init='uniform', activation='stanh'))
m.add(Dense(10, init='uniform', activation='softmax'))
sgd = SGD(lr=0.001, lr_type='step')
topo = Cluster(workspace)
m.compile(loss='categorical_crossentropy', optimizer=sgd, cluster=topo)
m.fit(X_train, nb_epoch=1000, with_test=True)
result = m.evaluate(X_test, batch_size=100, test_steps=10, test_freq=60)
CNN Example¶
This example uses the Python APIs to configure and train a CNN model over the Cifar10 dataset. The configuration content is the same as that written in SINGAROOT/examples/cifar10/job.conf.
X_train, X_test, workspace = cifar10.load_data()
m = Sequential('cnn', sys.argv)
m.add(Convolution2D(32, 5, 1, 2, w_std=0.0001, b_lr=2))
m.add(MaxPooling2D(pool_size=(3,3), stride=2))
m.add(Activation('relu'))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))
m.add(Convolution2D(32, 5, 1, 2, b_lr=2))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))
m.add(Convolution2D(64, 5, 1, 2))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))
m.add(Dense(10, w_wd=250, b_lr=2, b_wd=0, activation='softmax'))
sgd = SGD(decay=0.004, lr_type='manual', step=(0,60000,65000), step_lr=(0.001,0.0001,0.00001))
topo = Cluster(workspace)
m.compile(updater=sgd, cluster=topo)
m.fit(X_train, nb_epoch=1000, with_test=True)
result = m.evaluate(X_test, 1000, test_steps=30, test_freq=300)
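The manual learning-rate schedule above (lr_type='manual' with step and step_lr) can be read as a step function: at training step s, the learning rate is the step_lr entry for the largest boundary in step that does not exceed s. A minimal sketch of that interpretation, assuming this reading of the fields (the manual_lr helper is ours, not part of SINGA):

```python
# Sketch of a 'manual' step/step_lr schedule: the learning rate at step s
# is step_lr[i] for the largest step[i] <= s (assumed interpretation).
def manual_lr(s, step=(0, 60000, 65000), step_lr=(0.001, 0.0001, 0.00001)):
    lr = step_lr[0]
    for boundary, rate in zip(step, step_lr):
        if s >= boundary:
            lr = rate
    return lr

print(manual_lr(100))    # 0.001 (before the first boundary at 60000)
print(manual_lr(60000))  # 0.0001
print(manual_lr(70000))  # 1e-05
```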
RBM Example¶
This example uses the Python APIs to configure and train an RBM model over the MNIST dataset. The configuration content is the same as that written in SINGAROOT/examples/rbm.conf.
rbmid = 3
X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid)
m = Energy('rbm'+str(rbmid), sys.argv)
out_dim = [1000, 500, 250]
m.add(RBM(out_dim, w_std=0.1, b_wd=0))
sgd = SGD(lr=0.1, decay=0.0002, momentum=0.8)
topo = Cluster(workspace)
m.compile(optimizer=sgd, cluster=topo)
m.fit(X_train, alg='cd', nb_epoch=6000)
AutoEncoder Example¶
This example uses the Python APIs to configure and train an autoencoder model over the MNIST dataset. The configuration content is the same as that written in SINGAROOT/examples/autoencoder.conf.
rbmid = 4
X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid+1)
m = Sequential('autoencoder', sys.argv)
hid_dim = [1000, 500, 250, 30]
m.add(Autoencoder(hid_dim, out_dim=784, activation='sigmoid', param_share=True))
agd = AdaGrad(lr=0.01)
topo = Cluster(workspace)
m.compile(loss='mean_squared_error', optimizer=agd, cluster=topo)
m.fit(X_train, alg='bp', nb_epoch=12200)
To run SINGA on GPU¶
Users need to assign a list of GPU IDs to the device field in fit() or evaluate(). The number of GPUs must be equal to the number of workers configured in the cluster topology.
gpu_id = [0]
m.fit(X_train, nb_epoch=100, with_test=True, device=gpu_id)
TIPS¶
Hidden layers for MLP can be configured as
for n in [2500, 2000, 1500, 1000, 500]:
    m.add(Dense(n, init='uniform', activation='tanh'))
m.add(Dense(10, init='uniform', activation='softmax'))
Activation layer can be specified separately
m.add(Dense(2500, init='uniform'))
m.add(Activation('tanh'))
Users can explicitly specify hyper-parameters of weight and bias
par = Parameter(init='uniform', scale=0.05)
m.add(Dense(2500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(2000, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(1500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(1000, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(10, w_param=par, b_param=par, activation='softmax'))
parw = Parameter(init='gauss', std=0.0001)
parb = Parameter(init='const', value=0)
m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb, b_lr=2))
m.add(MaxPooling2D(pool_size=(3,3), stride=2))
m.add(Activation('relu'))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))
parw.update(std=0.01)
m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))
m.add(Convolution2D(64, 5, 1, 2, w_param=parw, b_param=parb, b_lr=1))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))
m.add(Dense(10, w_param=parw, w_wd=250, b_param=parb, b_lr=2, b_wd=0, activation='softmax'))
Data can be added in this way,
X_train, X_test = mnist.load_data()  # parameter values are set in load_data()
m.fit(X_train, ...)       # Data layer for training is added
m.evaluate(X_test, ...)   # Data layer for testing is added
or this way,
X_train, X_test = mnist.load_data()  # parameter values are set in load_data()
m.add(X_train)  # explicitly add Data layer
m.add(X_test)   # explicitly add Data layer
store = Store(path='train.bin', batch_size=64, ...)         # parameter values are set explicitly
m.add(Data(load='recordinput', phase='train', conf=store))  # Data layer is added
store = Store(path='test.bin', batch_size=100, ...)         # parameter values are set explicitly
m.add(Data(load='recordinput', phase='test', conf=store))   # Data layer is added
Cases to run SINGA¶
(1) Run SINGA for training
m.fit(X_train, nb_epoch=1000)
(2) Run SINGA for training and validation
m.fit(X_train, validate_data=X_valid, nb_epoch=1000)
(3) Run SINGA for test while training
m.fit(X_train, nb_epoch=1000, with_test=True)
result = m.evaluate(X_test, batch_size=100, test_steps=100)
(4) Run SINGA for test only. Assume a checkpoint exists after training
result = m.evaluate(X_test, batch_size=100, checkpoint_path=workspace+'/checkpoint/step100-worker0')
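Checkpoint files in the example above follow a step<N>-worker<M> naming pattern. A small sketch of picking the most recent checkpoint among several, assuming that naming convention (the latest_checkpoint helper is illustrative, not part of the SINGA API):

```python
import re

# Given checkpoint file names like 'step100-worker0', pick the one with the
# largest step number (assumed naming convention from the example path).
def latest_checkpoint(names):
    def step_of(name):
        m = re.match(r'step(\d+)-worker\d+', name)
        return int(m.group(1)) if m else -1
    return max(names, key=step_of)

print(latest_checkpoint(['step100-worker0', 'step60000-worker0', 'step1000-worker0']))
# step60000-worker0
```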
Implementation Details¶
Layer class (inherited)¶
- Data
- Dense
- Activation
- Convolution2D
- MaxPooling2D
- AvgPooling2D
- LRN2D
- Dropout
- RBM
- Autoencoder
Model class¶
The Model class has a jobconf (JobProto) and layers (a list of layers).
Methods in the Model class
- add
  - add a Layer into the Model
  - 2 subclasses: Sequential model and Energy model
- compile
  - set the Updater (i.e., optimizer) and Cluster (i.e., topology) components
- fit
  - set training data and parameter values for the training
  - (optional) set validation data and parameter values
  - set the Train_one_batch component
  - specify the with_test field if a user wants to run SINGA with test data simultaneously
  - [TODO] receive train/validation results, e.g., accuracy, loss, ppl, etc.
- evaluate
  - set testing data and parameter values for the testing
  - specify the checkpoint_path field if a user wants to run SINGA only for testing
  - [TODO] receive test results, e.g., accuracy, loss, ppl, etc.
Results¶
fit() and evaluate() return train/test results, a dictionary containing
- [key]: step number
- [value]: a list of dictionaries
- ‘acc’ for accuracy
- ‘loss’ for loss
- ‘ppl’ for ppl
- ‘se’ for squared error
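Assuming the structure described above (step number mapped to a list of metric dictionaries), a result can be consumed with ordinary dictionary operations. A sketch with made-up sample values:

```python
# Hypothetical result in the structure described above:
# step number -> list of metric dictionaries.
result = {
    100: [{'acc': 0.62, 'loss': 1.21}],
    200: [{'acc': 0.71, 'loss': 0.95}],
    300: [{'acc': 0.76, 'loss': 0.81}],
}

# Extract an accuracy curve ordered by step number.
curve = [(step, metrics[0]['acc']) for step, metrics in sorted(result.items())]
print(curve)  # [(100, 0.62), (200, 0.71), (300, 0.76)]
```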
Parameter class¶
Users need to set parameter and initial values. For example,
- Parameter (fields in Param proto)
- lr = (float) // learning rate multiplier, used to scale the learning rate when updating parameters.
- wd = (float) // weight decay multiplier, used to scale the weight decay when updating parameters.
- Parameter initialization (fields in ParamGen proto)
- init = (string) // one of the types, ‘uniform’, ‘constant’, ‘gaussian’
- high = (float) // for ‘uniform’
- low = (float) // for ‘uniform’
- value = (float) // for ‘constant’
- mean = (float) // for ‘gaussian’
- std = (float) // for ‘gaussian’
- Weight (w_param) is ‘gaussian’ with mean=0, std=0.01 by default
- Bias (b_param) is ‘constant’ with value=0 by default
- How to update the parameter fields
  - for updating Weight, put w_ in front of the field name
  - for updating Bias, put b_ in front of the field name
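The three initialization types above map naturally onto standard sampling routines. A sketch of the semantics implied by the fields, using Python's random module (this mirrors the field names but is not SINGA's actual initializer code):

```python
import random

# Illustrative mapping from the init fields above to sampling routines:
# 'uniform' uses low/high, 'constant' uses value, 'gaussian' uses mean/std.
def init_value(init, high=1.0, low=-1.0, value=0.0, mean=0.0, std=0.01):
    if init == 'uniform':
        return random.uniform(low, high)
    if init == 'constant':
        return value
    if init == 'gaussian':
        return random.gauss(mean, std)
    raise ValueError('unknown init type: %s' % init)

print(init_value('constant', value=0.5))  # 0.5
v = init_value('uniform', low=-0.05, high=0.05)
assert -0.05 <= v <= 0.05
```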
Several ways to set Parameter values
parw = Parameter(lr=2, wd=10, init='gaussian', std=0.1)
parb = Parameter(lr=1, wd=0, init='constant', value=0)
m.add(Convolution2D(10, w_param=parw, b_param=parb, ...))
m.add(Dense(10, w_mean=1, w_std=0.1, w_lr=2, w_wd=10, ...))
parw = Parameter(init='constant', value=0)
m.add(Dense(10, w_param=parw, w_lr=1, w_wd=1, b_value=1, ...))
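The w_/b_ prefix convention used in these calls can be pictured as splitting keyword arguments into weight settings, bias settings, and everything else. A minimal sketch of that convention (split_param_kwargs is a hypothetical helper, not SINGA code):

```python
# Sketch of the w_/b_ prefix convention: keyword arguments prefixed with
# 'w_' configure the weight Parameter and 'b_' the bias Parameter.
def split_param_kwargs(**kwargs):
    w, b, rest = {}, {}, {}
    for key, val in kwargs.items():
        if key.startswith('w_'):
            w[key[2:]] = val
        elif key.startswith('b_'):
            b[key[2:]] = val
        else:
            rest[key] = val
    return w, b, rest

w, b, rest = split_param_kwargs(w_mean=1, w_std=0.1, w_lr=2,
                                b_lr=1, b_wd=0, activation='softmax')
print(w)     # {'mean': 1, 'std': 0.1, 'lr': 2}
print(b)     # {'lr': 1, 'wd': 0}
print(rest)  # {'activation': 'softmax'}
```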
Other classes¶
- Store
- Algorithm
- Updater
- SGD
- AdaGrad
- Cluster