The Python binding provides APIs for configuring a training job in the style of Keras, including the neural net, the training algorithm, etc. It replaces the configuration file (e.g., job.conf) in protobuf format, which is typically long and error-prone to prepare. In later versions, we will add Python functions to interact with the layer and neural net objects, which will enable users to train and debug their models interactively.
Here is the layout of the Python-related code:

    SINGAROOT/tool/python
    |-- pb2 (has job_pb2.py)
    |-- singa
        |-- model.py
        |-- layer.py
        |-- parameter.py
        |-- initialization.py
        |-- utils
            |-- utility.py
            |-- message.py
    |-- examples
        |-- cifar10_cnn.py, mnist_mlp.py, mnist_rbm1.py, mnist_ae.py, etc.
        |-- datasets
            |-- cifar10.py
            |-- mnist.py
To use the Python APIs, users need to add the following arguments when compiling SINGA:

    ./configure --enable-python --with-python=PYTHON_DIR
    make

where PYTHON_DIR is the directory that contains Python.h.
The training program is launched by

    bin/singa-run.sh -exec <user_main.py>

where user_main.py creates the JobProto object and passes it to Driver::Train to start the training.
For example,
    cd SINGAROOT
    bin/singa-run.sh -exec tool/python/examples/cifar10_cnn.py
This example uses the Python APIs to configure and train an MLP model over the MNIST dataset. The configuration content is the same as that written in SINGAROOT/examples/mnist/job.conf.

    import sys, os
    # The import paths below are assumed from the layout of tool/python.
    sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
    from singa.model import *
    from examples.datasets import mnist

    X_train, X_test, workspace = mnist.load_data()

    m = Sequential('mlp', sys.argv)

    m.add(Dense(2500, init='uniform', activation='tanh'))
    m.add(Dense(2000, init='uniform', activation='tanh'))
    m.add(Dense(1500, init='uniform', activation='tanh'))
    m.add(Dense(1000, init='uniform', activation='tanh'))
    m.add(Dense(500, init='uniform', activation='tanh'))
    m.add(Dense(10, init='uniform', activation='softmax'))

    sgd = SGD(lr=0.001, lr_type='step')
    topo = Cluster(workspace)
    m.compile(loss='categorical_crossentropy', optimizer=sgd, cluster=topo)
    m.fit(X_train, nb_epoch=1000, with_test=True)
    result = m.evaluate(X_test, batch_size=100, test_steps=10, test_freq=60)
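To get a feel for the size of the MLP configured above, its weights and biases can be counted by hand. The sketch below is plain Python and does not call SINGA. It assumes the input is a flattened 28x28 MNIST image (784 values), and `count_dense_params` is a hypothetical helper name used only for illustration.

```python
def count_dense_params(input_dim, layer_sizes):
    """Return (weights, biases) for a stack of fully connected layers."""
    weights = 0
    biases = 0
    fan_in = input_dim
    for fan_out in layer_sizes:
        weights += fan_in * fan_out  # one weight per input/output pair
        biases += fan_out            # one bias per output unit
        fan_in = fan_out
    return weights, biases


# Layer widths come from the Dense(...) calls; 784 = 28 * 28 is an assumption.
mlp_sizes = [2500, 2000, 1500, 1000, 500, 10]
n_weights, n_biases = count_dense_params(784, mlp_sizes)
print(n_weights, n_biases, n_weights + n_biases)
```

Under these assumptions most of the parameters sit in the first three layers, which is where the choice of initialization and learning rate matters most.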
This example uses the Python APIs to configure and train a CNN model over the Cifar10 dataset. The configuration content is the same as that written in SINGAROOT/examples/cifar10/job.conf.

    X_train, X_test, workspace = cifar10.load_data()

    m = Sequential('cnn', sys.argv)

    m.add(Convolution2D(32, 5, 1, 2, w_std=0.0001, b_lr=2))
    m.add(MaxPooling2D(pool_size=(3,3), stride=2))
    m.add(Activation('relu'))
    m.add(LRN2D(3, alpha=0.00005, beta=0.75))

    m.add(Convolution2D(32, 5, 1, 2, b_lr=2))
    m.add(Activation('relu'))
    m.add(AvgPooling2D(pool_size=(3,3), stride=2))
    m.add(LRN2D(3, alpha=0.00005, beta=0.75))

    m.add(Convolution2D(64, 5, 1, 2))
    m.add(Activation('relu'))
    m.add(AvgPooling2D(pool_size=(3,3), stride=2))

    m.add(Dense(10, w_wd=250, b_lr=2, b_wd=0, activation='softmax'))

    sgd = SGD(decay=0.004, lr_type='manual', step=(0,60000,65000), step_lr=(0.001,0.0001,0.00001))
    topo = Cluster(workspace)
    m.compile(updater=sgd, cluster=topo)
    m.fit(X_train, nb_epoch=1000, with_test=True)
    result = m.evaluate(X_test, 1000, test_steps=30, test_freq=300)
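The `lr_type='manual'` setting pairs a tuple of step boundaries with a tuple of learning rates. One plausible reading, sketched below in plain Python without SINGA, is a piecewise-constant schedule in which `step_lr[i]` applies from `step[i]` until the next boundary. That interpretation and the helper name `manual_lr` are assumptions, not taken from the SINGA source.

```python
from bisect import bisect_right


def manual_lr(current_step, boundaries, rates):
    """Piecewise-constant learning rate: rates[i] is used from boundaries[i] on."""
    if len(boundaries) != len(rates):
        raise ValueError("boundaries and rates must have the same length")
    if current_step < boundaries[0]:
        raise ValueError("current_step is before the first boundary")
    # bisect_right counts the boundaries <= current_step; the last one applies.
    index = bisect_right(boundaries, current_step) - 1
    return rates[index]


# Values copied from the SGD(...) call in the CNN example.
step = (0, 60000, 65000)
step_lr = (0.001, 0.0001, 0.00001)
for s in (0, 59999, 60000, 64999, 65000, 100000):
    print(s, manual_lr(s, step, step_lr))
```

Keeping the two tuples the same length makes the schedule easy to check by eye: each boundary lines up with the rate that starts there.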
This example uses the Python APIs to configure and train an RBM model over the MNIST dataset. The configuration content is the same as that written in SINGAROOT/examples/rbm.conf*.

    rbmid = 3
    X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid)
    m = Energy('rbm'+str(rbmid), sys.argv)

    out_dim = [1000, 500, 250]
    m.add(RBM(out_dim, w_std=0.1, b_wd=0))

    sgd = SGD(lr=0.1, decay=0.0002, momentum=0.8)
    topo = Cluster(workspace)
    m.compile(optimizer=sgd, cluster=topo)
    m.fit(X_train, alg='cd', nb_epoch=6000)
This example uses the Python APIs to configure and train an autoencoder model over the MNIST dataset. The configuration content is the same as that written in SINGAROOT/examples/autoencoder.conf.

    rbmid = 4
    X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid+1)
    m = Sequential('autoencoder', sys.argv)

    hid_dim = [1000, 500, 250, 30]
    m.add(Autoencoder(hid_dim, out_dim=784, activation='sigmoid', param_share=True))

    agd = AdaGrad(lr=0.01)
    topo = Cluster(workspace)
    m.compile(loss='mean_squared_error', optimizer=agd, cluster=topo)
    m.fit(X_train, alg='bp', nb_epoch=12200)
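The `hid_dim` list describes only the encoder half of the autoencoder. Assuming the decoder mirrors the encoder, which is what sharing parameters between the two halves suggests, the full sequence of layer widths can be derived as in the following plain-Python sketch. The mirrored structure and the helper name `autoencoder_dims` are assumptions made for illustration.

```python
def autoencoder_dims(input_dim, hid_dim):
    """Return the (fan_in, fan_out) pairs of a mirrored autoencoder."""
    encoder = [input_dim] + list(hid_dim)  # 784 -> 1000 -> 500 -> 250 -> 30
    decoder = encoder[::-1]                # 30 -> 250 -> 500 -> 1000 -> 784
    widths = encoder + decoder[1:]         # do not repeat the bottleneck width
    return list(zip(widths[:-1], widths[1:]))


pairs = autoencoder_dims(784, [1000, 500, 250, 30])
for fan_in, fan_out in pairs:
    print("%4d -> %4d" % (fan_in, fan_out))
```

With four hidden widths this yields eight fully connected layers, and the last one maps back to the 784 values given as `out_dim`.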
To train on GPUs, users need to pass a list of GPU ids to the device argument of fit() or evaluate(). The number of GPUs must equal the number of workers configured in the cluster topology.

    gpu_id = [0]
    m.fit(X_train, nb_epoch=100, with_test=True, device=gpu_id)
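Because the GPU list and the worker count have to agree, it can help to check them before launching a long job. The helper below is a minimal sketch in plain Python; `check_devices` and its `nworkers` argument are hypothetical names and are not part of the SINGA API.

```python
def check_devices(gpu_ids, nworkers):
    """Return gpu_ids as a list, or raise if its length differs from nworkers."""
    gpu_ids = list(gpu_ids)
    if len(gpu_ids) != nworkers:
        raise ValueError(
            "got %d GPU id(s) for %d worker(s); the counts must match"
            % (len(gpu_ids), nworkers)
        )
    return gpu_ids


# One worker, one GPU: passes and returns the list unchanged.
gpu_id = check_devices([0], nworkers=1)
print(gpu_id)
```

The checked list could then be passed as the `device` argument in the call shown above.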
The hidden layers of the MLP can be configured in a loop:

    for n in [2500, 2000, 1500, 1000, 500]:
        m.add(Dense(n, init='uniform', activation='tanh'))
    m.add(Dense(10, init='uniform', activation='softmax'))
The activation layer can be specified separately:

    m.add(Dense(2500, init='uniform'))
    m.add(Activation('tanh'))
Users can explicitly specify the hyper-parameters of the weight and bias:

    par = Parameter(init='uniform', scale=0.05)
    m.add(Dense(2500, w_param=par, b_param=par, activation='tanh'))
    m.add(Dense(2000, w_param=par, b_param=par, activation='tanh'))
    m.add(Dense(1500, w_param=par, b_param=par, activation='tanh'))
    m.add(Dense(1000, w_param=par, b_param=par, activation='tanh'))
    m.add(Dense(500, w_param=par, b_param=par, activation='tanh'))
    m.add(Dense(10, w_param=par, b_param=par, activation='softmax'))
For the CNN, the weight and bias can use separate Parameter objects:

    parw = Parameter(init='gauss', std=0.0001)
    parb = Parameter(init='const', value=0)
    m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb, b_lr=2))
    m.add(MaxPooling2D(pool_size=(3,3), stride=2))
    m.add(Activation('relu'))
    m.add(LRN2D(3, alpha=0.00005, beta=0.75))

    # Reuse parw with a larger standard deviation for the remaining layers.
    parw.update(std=0.01)
    m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb))
    m.add(Activation('relu'))
    m.add(AvgPooling2D(pool_size=(3,3), stride=2))
    m.add(LRN2D(3, alpha=0.00005, beta=0.75))

    m.add(Convolution2D(64, 5, 1, 2, w_param=parw, b_param=parb, b_lr=1))
    m.add(Activation('relu'))
    m.add(AvgPooling2D(pool_size=(3,3), stride=2))

    m.add(Dense(10, w_param=parw, w_wd=250, b_param=parb, b_lr=2, b_wd=0, activation='softmax'))
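The convolution and pooling settings determine how large the feature maps are when they reach the final Dense layer. The sketch below traces one spatial dimension of a 32x32 CIFAR-10 image through the three convolution/pooling stages. It assumes the four positional arguments of the convolution layer are number of filters, kernel size, stride and padding, and it treats the pooling rounding mode as a parameter because conventions differ between frameworks; which one SINGA applies is not verified here. All function names are hypothetical.

```python
import math


def conv_out(size, kernel, stride, pad):
    """Spatial output size of a convolution (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride, ceil_mode=False):
    """Spatial output size of an unpadded pooling layer."""
    span = (size - kernel) / stride
    steps = math.ceil(span) if ceil_mode else math.floor(span)
    return steps + 1


def trace(size, ceil_mode):
    """Follow one spatial dimension through three conv + pool stages."""
    sizes = [size]
    for _ in range(3):
        size = conv_out(size, kernel=5, stride=1, pad=2)  # size is preserved
        size = pool_out(size, kernel=3, stride=2, ceil_mode=ceil_mode)
        sizes.append(size)
    return sizes


print("floor rounding:", trace(32, ceil_mode=False))
print("ceil rounding: ", trace(32, ceil_mode=True))
```

The activation and LRN layers are left out because they do not change the spatial size. Multiplying the final width and height by the 64 filters of the last convolution gives the input dimension of the Dense layer under each rounding convention.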
Data can be added in this way,
    X_train, X_test = mnist.load_data()  # parameter values are set in load_data()
    m.fit(X_train, ...)                  # Data layer for training is added
    m.evaluate(X_test, ...)              # Data layer for testing is added
or this way,
    X_train, X_test = mnist.load_data()  # parameter values are set in load_data()
    m.add(X_train)                       # explicitly add Data layer
    m.add(X_test)                        # explicitly add Data layer
or this way,

    store = Store(path='train.bin', batch_size=64, ...)        # parameter values are set explicitly
    m.add(Data(load='recordinput', phase='train', conf=store)) # Data layer is added
    store = Store(path='test.bin', batch_size=100, ...)        # parameter values are set explicitly
    m.add(Data(load='recordinput', phase='test', conf=store))  # Data layer is added
(1) Run SINGA for training

    m.fit(X_train, nb_epoch=1000)

(2) Run SINGA for training and validation

    m.fit(X_train, validate_data=X_valid, nb_epoch=1000)

(3) Run SINGA for test while training

    m.fit(X_train, nb_epoch=1000, with_test=True)
    result = m.evaluate(X_test, batch_size=100, test_steps=100)

(4) Run SINGA for test only. Assume a checkpoint exists after training.

    result = m.evaluate(X_test, batch_size=100, checkpoint_path=workspace+'/checkpoint/step100-worker0')
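The checkpoint path in case (4) is built by string concatenation. A small helper keeps the pieces explicit, as in the sketch below. It assumes that checkpoint names follow the `step<N>-worker<K>` pattern seen in that single example; the helper name `checkpoint_path` and the workspace value `examples/mnist` are hypothetical.

```python
import posixpath


def checkpoint_path(workspace, step, worker=0):
    """Build '<workspace>/checkpoint/step<step>-worker<worker>'."""
    name = "step%d-worker%d" % (step, worker)
    return posixpath.join(workspace, "checkpoint", name)


# 'examples/mnist' stands in for the workspace returned by load_data().
path = checkpoint_path("examples/mnist", step=100)
print(path)
```

The result could be passed as the `checkpoint_path` argument of evaluate() in place of the concatenated string.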
The Model class has two fields: jobconf (a JobProto) and layers (a list of layers).
Methods in Model class
add
compile
fit
evaluate
fit() and evaluate() return train/test results, a dictionary containing
Users need to set parameter and initial values. For example,
Parameter (fields in Param proto)
Parameter initialization (fields in ParamGen proto)
Weight (w_param) is ‘gaussian’ with mean=0, std=0.01 by default

Bias (b_param) is ‘constant’ with value=0 by default
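The two defaults above can be pictured with a few lines of standard-library Python. This is only an illustration of what Gaussian and constant initialization produce, not SINGA's implementation; `init_weight` and `init_bias` are hypothetical names.

```python
import random


def init_weight(rows, cols, mean=0.0, std=0.01, seed=None):
    """Fill a rows x cols matrix with samples from N(mean, std^2)."""
    rng = random.Random(seed)
    return [[rng.gauss(mean, std) for _ in range(cols)] for _ in range(rows)]


def init_bias(size, value=0.0):
    """Fill a bias vector with a constant value."""
    return [value] * size


w = init_weight(100, 50, seed=0)  # defaults: mean=0, std=0.01
b = init_bias(50)                 # default: value=0
flat = [x for row in w for x in row]
print(len(w), len(w[0]), sum(flat) / len(flat), b[:3])
```

Small random weights break the symmetry between units, while a zero bias leaves the initial pre-activations centered around zero.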
How to update the parameter fields
Several ways to set Parameter values
    parw = Parameter(lr=2, wd=10, init='gaussian', std=0.1)
    parb = Parameter(lr=1, wd=0, init='constant', value=0)
    m.add(Convolution2D(10, w_param=parw, b_param=parb, ...))

    m.add(Dense(10, w_mean=1, w_std=0.1, w_lr=2, w_wd=10, ...))

    parw = Parameter(init='constant', mean=0)
    m.add(Dense(10, w_param=parw, w_lr=1, w_wd=1, b_value=1, ...))
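The keyword shortcuts in these examples follow a naming pattern: a `w_` prefix targets the weight and a `b_` prefix targets the bias. The sketch below shows one way such keywords could be grouped. It illustrates the naming convention only and is not how SINGA processes them; `split_param_kwargs` is a hypothetical helper.

```python
def split_param_kwargs(**kwargs):
    """Group w_*/b_* shortcut keywords into weight and bias settings."""
    weight, bias, other = {}, {}, {}
    for key, value in kwargs.items():
        if key in ("w_param", "b_param"):
            other[key] = value       # whole Parameter objects are kept as-is
        elif key.startswith("w_"):
            weight[key[2:]] = value  # e.g. w_lr -> weight["lr"]
        elif key.startswith("b_"):
            bias[key[2:]] = value    # e.g. b_value -> bias["value"]
        else:
            other[key] = value
    return weight, bias, other


w_cfg, b_cfg, rest = split_param_kwargs(
    w_mean=1, w_std=0.1, w_lr=2, w_wd=10, b_value=1, activation="tanh"
)
print(w_cfg, b_cfg, rest)
```

Grouping by prefix makes it easy to see at a glance which settings apply to the weight and which to the bias when both styles are mixed in one call.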