Apache SINGA
A distributed deep learning platform.
singa::Worker Class Reference (abstract)

The Worker class, which runs the training algorithm.

#include <worker.h>

Inheritance diagram for singa::Worker: derived classes singa::BPWorker and singa::CDWorker.

Public Member Functions

virtual void Init (int thread_id, int grp_id, int id)
 
void Setup (const JobProto &job, shared_ptr< NeuralNet > train_net, shared_ptr< NeuralNet > valid_net, shared_ptr< NeuralNet > test_net)
 Setup members.
 
void Run ()
 Main function of Worker.
 
void InitLocalParams ()
 Initialize all local params (i.e., params from layers resident in this worker).
 
void Checkpoint (int step, shared_ptr< NeuralNet > net)
 Checkpoint all params owned by the worker from the first group onto disk.
 
void Test (int nsteps, Phase phase, shared_ptr< NeuralNet > net)
 Test the performance of the learned model on the validation or test dataset.
 
virtual void TrainOneBatch (int step, Metric *perf)=0
 Train one mini-batch.
 
virtual void TestOneBatch (int step, Phase phase, shared_ptr< NeuralNet > net, Metric *perf)=0
 Test/validate one mini-batch.
 
void Report (const string &prefix, const Metric &perf)
 Report performance to the stub.
 
int Put (Param *param, int step)
 Put Param to server.
 
int Get (Param *param, int step)
 Get the Param with a specific version from the servers. If the current version >= the requested version, return immediately.
 
int Update (Param *param, int step)
 Update Param.
 
int Collect (Param *param, int step)
 Block until the param has been updated after the update request was sent.
 
int CollectAll (shared_ptr< NeuralNet > net, int step)
 Call Collect for every param of net.
 
void ReceiveBlobs (bool data, bool grad, BridgeLayer *layer, shared_ptr< NeuralNet > net)
 Receive blobs from other workers due to model partitions.
 
void SendBlobs (bool data, bool grad, BridgeLayer *layer, shared_ptr< NeuralNet > net)
 Send blobs to other workers due to model partitions.
 
bool DisplayNow (int step) const
 Check whether it is time to display training info, e.g., loss and precision.
 
bool DisplayDebugInfo (int step) const
 Check whether it is time to display debug info.
 
bool StopNow (int step) const
 Check whether it is time to stop.
 
bool CheckpointNow (int step) const
 Check whether it is time to do a checkpoint.
 
bool TestNow (int step) const
 Check whether it is time to do a test.
 
bool ValidateNow (int step) const
 Check whether it is time to do validation.
 
int grp_id () const
 
int id () const
 worker ID within the worker group.
 

Static Public Member Functions

static Worker * Create (const JobProto &proto)
 

Protected Attributes

int thread_id_
 
int grp_id_
 
int id_
 
int step_
 
JobProto job_conf_
 
shared_ptr< NeuralNet > train_net_
 
shared_ptr< NeuralNet > test_net_
 
shared_ptr< NeuralNet > validation_net_
 
Dealer * layer_dealer_
 
Dealer * dealer_
 

Detailed Description

The Worker class which runs the training algorithm.

The first worker group will initialize parameters of the Net, and put them into the distributed memory/table. The virtual function TrainOneBatch and TestOneBatch implement the training and test algorithm for one mini-batch data.

Child workers override these two functions to implement their training algorithms, e.g., BPWorker, CDWorker and BPTTWorker implement the BP, CD and BPTT algorithms, respectively.
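As a rough illustration of this division of labor, the sketch below shows a subclass filling in the two pure virtual hooks. WorkerBase, SketchBPWorker and the counting Metric are hypothetical, simplified stand-ins rather than SINGA's real classes (the real TestOneBatch also receives a Phase and a shared_ptr<NeuralNet>), and Forward/Backward are placeholders for the BP passes.

```cpp
#include <cassert>

// Hypothetical stand-in for singa::Metric; here it only counts batches.
struct Metric {
  int batches = 0;
};

// Simplified, hypothetical mirror of the Worker contract.
class WorkerBase {
 public:
  virtual ~WorkerBase() = default;
  virtual void TrainOneBatch(int step, Metric* perf) = 0;
  virtual void TestOneBatch(int step, Metric* perf) = 0;
};

// A BP-style subclass: training runs a forward pass followed by a backward
// pass, while testing only needs the forward pass.
class SketchBPWorker : public WorkerBase {
 public:
  void TrainOneBatch(int step, Metric* perf) override {
    assert(perf != nullptr);
    Forward(step);
    Backward(step);
    ++perf->batches;
  }
  void TestOneBatch(int step, Metric* perf) override {
    assert(perf != nullptr);
    Forward(step);  // evaluation needs no gradients
    ++perf->batches;
  }
  int forward_calls = 0;
  int backward_calls = 0;

 private:
  // Placeholders that only count calls instead of touching a real net.
  void Forward(int /*step*/) { ++forward_calls; }
  void Backward(int /*step*/) { ++backward_calls; }
};
```

Calling code only ever sees the base-class interface, so the driver loop in Run() can stay algorithm-agnostic.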

Member Function Documentation

void singa::Worker::Checkpoint (int step, shared_ptr< NeuralNet > net)

Checkpoint all params owned by the worker from the first group onto disk.

The serialization is done using BlobProtos, which include the name, version and values of each Param. Different workers generate different checkpoint files. The file path is <workspace>/checkpoint-<jobname>-step<step>-worker<worker_id>.bin

Parameters
step: training step of this worker
net: the training net whose params will be dumped
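The checkpoint naming scheme can be captured in a small helper. CheckpointPath and its parameter names are hypothetical; only the path pattern comes from the description of Checkpoint.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper building
// <workspace>/checkpoint-<jobname>-step<step>-worker<worker_id>.bin
std::string CheckpointPath(const std::string& workspace,
                           const std::string& jobname, int step,
                           int worker_id) {
  assert(step >= 0 && worker_id >= 0);
  return workspace + "/checkpoint-" + jobname + "-step" +
         std::to_string(step) + "-worker" + std::to_string(worker_id) +
         ".bin";
}
```

For example, CheckpointPath("/ws", "mnist", 100, 2) yields /ws/checkpoint-mnist-step100-worker2.bin. Because the worker ID is part of the name, workers of the first group can write concurrently without clobbering each other's files.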
int singa::Worker::Collect (Param *param, int step)

Block until the param has been updated after the update request was sent.

Parameters
param
step: not used
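One way to realize this blocking behavior is a condition variable keyed on the param's version. This is a minimal sketch assuming a hypothetical ParamSlot whose version is bumped when the servers' response arrives; it is not necessarily how SINGA implements Collect, and, matching the documented signature, no step argument takes part in the wait.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical parameter slot whose version is bumped each time the
// servers' response to an update request arrives.
class ParamSlot {
 public:
  // Called when the update request is sent; returns the version to wait past.
  int MarkUpdateSent() {
    std::lock_guard<std::mutex> lock(mu_);
    return version_;
  }
  // Called by the receiving thread once the updated values are installed.
  void OnUpdated() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      ++version_;
    }
    cv_.notify_all();
  }
  // Collect-style wait: blocks until the version has moved past sent_version.
  int Collect(int sent_version) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return version_ > sent_version; });
    return version_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int version_ = 0;
};

// Usage: the calling thread blocks in Collect until a responder delivers.
int DemoCollect() {
  ParamSlot slot;
  const int sent = slot.MarkUpdateSent();
  std::thread responder([&slot] { slot.OnUpdated(); });
  const int version = slot.Collect(sent);
  responder.join();
  return version;
}
```

The predicate form of wait() guards against spurious wakeups and also returns at once if the update landed before Collect was called.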
int singa::Worker::Get (Param *param, int step)

Get the Param with a specific version from the servers. If the current version >= the requested version, return immediately.

Otherwise, send a get request to the stub, which forwards it to the servers.

Parameters
param
step: requested param version
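The version check described for Get can be sketched as follows. LocalParam, GetSketch and the outbox vector (a stand-in for the message sent to the stub) are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical local copy of a param; `version` is the version of the
// values currently held by this worker.
struct LocalParam {
  int version = -1;
};

// Returns true if a get request had to be sent. `outbox` stands in for the
// stub connection and records the versions requested from the servers.
bool GetSketch(const LocalParam& param, int requested_version,
               std::vector<int>* outbox) {
  assert(outbox != nullptr);
  if (param.version >= requested_version) {
    return false;  // already fresh enough, nothing to fetch
  }
  outbox->push_back(requested_version);  // the stub forwards this to servers
  return true;
}
```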
int singa::Worker::grp_id () const [inline]
Returns: group ID
virtual void singa::Worker::Init (int thread_id, int grp_id, int id) [virtual]
Parameters
thread_id: local thread index within the process
grp_id: global worker group ID
id: worker ID within the group

Reimplemented in singa::BPWorker.

void singa::Worker::InitLocalParams ( )

Init all local params (i.e., params from layers resident in this worker).

If the param is owned by the worker, the worker initializes it and puts it to the servers. Otherwise, it calls Get() to get the param. Get() may not actually send a get request, because the param's owner is in the same process: once the owner initializes the param, its version is visible to all workers sharing it. If training starts from scratch, the params are initialized using random distributions, e.g., a Gaussian distribution. After that, the worker may train for a couple of steps to warm up the params before putting them to the servers (the warmup setting of JobProto controls this).

If the owner param is available from a checkpoint file, its values are parsed from the checkpoint file instead of being randomly initialized. Params that do not have checkpoints are randomly initialized.
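The decision logic of InitLocalParams can be sketched roughly as follows. The sketch assumes a hypothetical SketchParam with an ownership flag and a pre-sized value vector, a checkpoint represented as a name-to-values map, and a zero-mean Gaussian for random initialization. Warmup and the actual Put() and Get() calls are omitted; the function only reports which params would be put to the servers.

```cpp
#include <cassert>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical, simplified param; `values` must already have its final size.
struct SketchParam {
  std::string name;
  bool owned_by_this_worker = false;
  std::vector<float> values;
};

// Owned params are restored from `checkpoint` when an entry exists and are
// otherwise drawn from N(0, stddev^2). Params owned by another worker are
// skipped (they would be fetched via Get()). Returns the names of the params
// that would be Put() to the servers.
std::vector<std::string> InitLocalParamsSketch(
    std::vector<SketchParam>* params,
    const std::map<std::string, std::vector<float>>& checkpoint,
    float stddev, unsigned seed) {
  assert(params != nullptr && stddev > 0.0f);
  std::mt19937 gen(seed);
  std::normal_distribution<float> gauss(0.0f, stddev);
  std::vector<std::string> to_put;
  for (SketchParam& p : *params) {
    if (!p.owned_by_this_worker) continue;  // a sharer: rely on Get()
    auto it = checkpoint.find(p.name);
    if (it != checkpoint.end()) {
      p.values = it->second;  // restore from the checkpoint file
    } else {
      for (float& v : p.values) v = gauss(gen);  // training from scratch
    }
    to_put.push_back(p.name);
  }
  return to_put;
}
```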

int singa::Worker::Put (Param *param, int step)

Put Param to server.

Parameters
param
step: used as the current param version for the put request
void singa::Worker::Report (const string &prefix, const Metric &perf)

Report performance to the stub.

Parameters
prefix: display prefix, e.g., 'Train', 'Test'
perf
void singa::Worker::Run ( )

Main function of Worker.

Train the neural net step by step; test and validation are done periodically.
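A Run()-style loop might be organized along these lines. The Schedule struct and its fields are hypothetical stand-ins for settings that SINGA reads from JobProto, and the loop only records the actions it would take instead of performing them. Testing is placed before the training step, consistent with the note on TrainOneBatch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical schedule; the real values come from the job configuration.
struct Schedule {
  int train_steps;
  int test_freq;        // 0 disables testing
  int checkpoint_freq;  // 0 disables checkpointing
};

// Records what a Run()-style loop would do at each step.
std::vector<std::string> RunSketch(const Schedule& s) {
  assert(s.train_steps >= 0);
  std::vector<std::string> log;
  for (int step = 0; step < s.train_steps; ++step) {
    if (s.test_freq > 0 && step % s.test_freq == 0)
      log.push_back("test@" + std::to_string(step));
    if (s.checkpoint_freq > 0 && step > 0 && step % s.checkpoint_freq == 0)
      log.push_back("ckpt@" + std::to_string(step));
    log.push_back("train@" + std::to_string(step));  // TrainOneBatch
  }
  return log;
}
```

With Schedule{4, 2, 3}, the recorded sequence is test@0, train@0, train@1, test@2, train@2, ckpt@3, train@3.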

void singa::Worker::Test (int nsteps, Phase phase, shared_ptr< NeuralNet > net)

Test the performance of the learned model on the validation or test dataset.

Test is done by the first group.

Parameters
net: the neural network
bool singa::Worker::TestNow (int step) const [inline]

Check whether it is time to do a test.

Parameters
step: the number of times ::Train() has been called
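Checks such as TestNow commonly reduce to a start offset plus a frequency. This sketch assumes hypothetical test_after and test_freq settings and restricts testing to worker group 0, since Test() is documented as being done by the first group; the actual SINGA condition may include further terms.

```cpp
#include <cassert>

// Hypothetical helper for *Now-style checks: fire every `freq` steps
// starting at step `after`; a non-positive freq disables the check.
bool DueAt(int step, int after, int freq) {
  assert(step >= 0);
  if (freq <= 0 || step < after) return false;
  return (step - after) % freq == 0;
}

// TestNow-style check: only the first worker group (grp_id 0) runs tests.
bool TestNowSketch(int step, int grp_id, int test_after, int test_freq) {
  return grp_id == 0 && DueAt(step, test_after, test_freq);
}
```

The same DueAt helper could serve validation, checkpoint and display checks with their own offsets and frequencies.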
virtual void singa::Worker::TrainOneBatch (int step, Metric *perf) [pure virtual]

Train one mini-batch.

Test/Validation is done before training.

Implemented in singa::CDWorker, and singa::BPWorker.

int singa::Worker::Update (Param *param, int step)

Update Param.

Parameters
param
step: training step used for updating (e.g., for deciding the learning rate)
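To illustrate how a step can decide the learning rate, the sketch below pairs a step-decay schedule with a plain SGD step. Both the schedule and its constants are assumptions chosen for illustration, not SINGA's configured updater; the text above only states that the step is used for updating.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical step-decay schedule: the rate is multiplied by `gamma`
// every `change_freq` steps, lr = base_lr * gamma^floor(step / change_freq).
float StepDecayLR(int step, float base_lr, float gamma, int change_freq) {
  assert(step >= 0 && change_freq > 0);
  return base_lr * std::pow(gamma, static_cast<float>(step / change_freq));
}

// Plain SGD step, data -= lr(step) * grad, as an updater consuming the
// update request might apply it. The schedule constants are illustrative.
void ApplySgdStep(int step, const std::vector<float>& grad,
                  std::vector<float>* data) {
  assert(data != nullptr && data->size() == grad.size());
  const float lr = StepDecayLR(step, 0.1f, 0.5f, 100);
  for (std::size_t i = 0; i < grad.size(); ++i) (*data)[i] -= lr * grad[i];
}
```

Here the rate is 0.1 for steps 0 to 99, 0.05 for steps 100 to 199, and so on.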
bool singa::Worker::ValidateNow (int step) const [inline]

Check whether it is time to do validation.

Parameters
step: the number of times ::Train() has been called

The documentation for this class was generated from the following file: