The Worker class which runs the training algorithm.

#include <worker.h>

Public Member Functions
virtual void	Init (int thread_id, int grp_id, int id)

void	Setup (const JobProto &job, shared_ptr< NeuralNet > train_net, shared_ptr< NeuralNet > valid_net, shared_ptr< NeuralNet > test_net)
	Setup members.

void	Run ()
	Main function of Worker.

void	InitLocalParams ()
	Init all local params (i.e., params from layers resident in this worker).

void	Checkpoint (int step, shared_ptr< NeuralNet > net)
	Checkpoint all params owned by the worker from the first group onto disk.

void	Test (int nsteps, Phase phase, shared_ptr< NeuralNet > net)
	Test the performance of the learned model on the validation or test dataset.

virtual void	TrainOneBatch (int step, Metric *perf)=0
	Train one mini-batch.

virtual void	TestOneBatch (int step, Phase phase, shared_ptr< NeuralNet > net, Metric *perf)=0
	Test/validate one mini-batch.

void	Report (const string &prefix, const Metric &perf)
	Report performance to the stub.

int	Put (Param *param, int step)
	Put Param to server.

int	Get (Param *param, int step)
	Get the Param with a specific version from the server. If the current version >= the requested version, then return.

int	Update (Param *param, int step)
	Update Param.

int	Collect (Param *param, int step)
	Block until the param has been updated following the update request.

int	CollectAll (shared_ptr< NeuralNet > net, int step)
	Call Collect for every param of net.

void	ReceiveBlobs (bool data, bool grad, BridgeLayer *layer, shared_ptr< NeuralNet > net)
	Receive blobs from other workers due to model partitions.

void	SendBlobs (bool data, bool grad, BridgeLayer *layer, shared_ptr< NeuralNet > net)
	Send blobs to other workers due to model partitions.

bool	DisplayNow (int step) const
	Check whether it is time to display training info, e.g., loss and precision.

bool	DisplayDebugInfo (int step) const
	Check whether it is time to display debug info.

bool	StopNow (int step) const
	Check whether it is time to stop.

bool	CheckpointNow (int step) const
	Check whether it is time to do a checkpoint.

bool	TestNow (int step) const
	Check whether it is time to do a test.

bool	ValidateNow (int step) const
	Check whether it is time to do validation.

int	grp_id () const

int	id () const
	worker ID within the worker group.

Static Public Member Functions
static Worker *	Create (const JobProto &proto)

Protected Attributes
int	thread_id_

int	grp_id_

int	id_

int	step_

JobProto	job_conf_

shared_ptr< NeuralNet >	train_net_

shared_ptr< NeuralNet >	test_net_

shared_ptr< NeuralNet >	validation_net_

Dealer *	layer_dealer_

Dealer *	dealer_

Detailed Description

The Worker class which runs the training algorithm.

The first worker group initializes the parameters of the net and puts them into the distributed memory/table. The virtual functions TrainOneBatch and TestOneBatch implement the training and test algorithms for one mini-batch of data.

Child workers override these two functions to implement their training algorithms; e.g., BPWorker, CDWorker, and BPTTWorker implement the BP, CD, and BPTT algorithms, respectively.
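The override pattern described above can be sketched as follows. This is a minimal illustration, not SINGA code: WorkerSketch, ToyBPWorker, the reduced Metric struct, and the Phase enumerator names are hypothetical stand-ins, and the shared_ptr< NeuralNet > argument of the real TestOneBatch is omitted so the sketch compiles on its own.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for singa::Metric: it only records which hook ran.
struct Metric {
  std::vector<std::string> log;
};

// Assumed enumerator names; the real Phase enum is defined by SINGA.
enum Phase { kTrain, kValidation, kTest };

// Hypothetical reduction of singa::Worker to its two pure virtual hooks.
class WorkerSketch {
 public:
  virtual ~WorkerSketch() = default;
  virtual void TrainOneBatch(int step, Metric* perf) = 0;
  virtual void TestOneBatch(int step, Phase phase, Metric* perf) = 0;
};

// A child worker supplies the per-mini-batch algorithm by overriding both hooks.
class ToyBPWorker : public WorkerSketch {
 public:
  void TrainOneBatch(int step, Metric* perf) override {
    assert(perf != nullptr);
    // A real BP worker would run a forward and a backward pass here.
    perf->log.push_back("forward+backward@" + std::to_string(step));
  }

  void TestOneBatch(int step, Phase phase, Metric* perf) override {
    assert(perf != nullptr);
    (void)phase;  // unused in this sketch
    // Testing only needs the forward pass.
    perf->log.push_back("forward-only@" + std::to_string(step));
  }
};
```

Because both hooks are pure virtual, the base class cannot be instantiated; a caller works through a pointer or reference to the base type and gets whichever algorithm the concrete worker implements.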

Member Function Documentation

void singa::Worker::Checkpoint	(	int	step,
		shared_ptr< NeuralNet >	net
	)

Checkpoint all params owned by the worker from the first group onto disk.

Serialization uses BlobProtos, which include the name, version, and values of each Param. Each worker generates its own checkpoint file. The file path is <workspace>/checkpoint-<jobname>-step<step>-worker<worker_id>.bin

Parameters

step	training step of this worker
net	the training net whose params will be dumped.
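As an illustration of the file path pattern above, the following sketch assembles the path from its parts. CheckpointPath is a hypothetical helper written for this example; how SINGA builds the string internally is not shown in this documentation.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds
// <workspace>/checkpoint-<jobname>-step<step>-worker<worker_id>.bin
std::string CheckpointPath(const std::string& workspace,
                           const std::string& job_name,
                           int step, int worker_id) {
  assert(step >= 0 && worker_id >= 0);
  return workspace + "/checkpoint-" + job_name +
         "-step" + std::to_string(step) +
         "-worker" + std::to_string(worker_id) + ".bin";
}
```

Since the worker ID is part of the file name, two workers checkpointing at the same step write to different files.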

int singa::Worker::Collect	(	Param *	param,
		int	step
	)

Block until the param has been updated following the update request.

Parameters

param
step	not used

int singa::Worker::Get	(	Param *	param,
		int	step
	)

Get the Param with a specific version from the server. If the current version >= the requested version, then return.

Otherwise, send a get request to the stub, which forwards it to the servers.

Parameters

param
step	requested param version
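The version check described for Get can be sketched as below. ParamSketch, StubSketch, and GetSketch are hypothetical names, and the return value used here (the number of requests sent) is an assumption made for the example; the documentation does not state what the real Get returns.

```cpp
#include <cassert>

// Hypothetical stand-in for singa::Param: only the version matters here.
struct ParamSketch {
  int version = -1;
};

// Hypothetical stand-in for the stub: counts the get requests it receives.
struct StubSketch {
  int requests = 0;
  void SendGet(ParamSketch* /*param*/, int /*version*/) { ++requests; }
};

// Returns 0 if the local copy is already fresh enough, 1 if a get request
// was sent (assumed convention, for illustration only).
int GetSketch(ParamSketch* param, int step, StubSketch* stub) {
  assert(param != nullptr && stub != nullptr);
  if (param->version >= step) {
    return 0;  // current version >= requested version: nothing to do
  }
  stub->SendGet(param, step);  // the stub would forward this to the servers
  return 1;
}
```

With a local version of 5, requesting version 3 or 5 sends nothing, while requesting version 6 sends one request.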

int singa::Worker::grp_id ( ) const

inline

Returns: group ID

virtual void singa::Worker::Init	(	int	thread_id,
		int	grp_id,
		int	id
	)

virtual

Parameters

thread_id	local thread index within the process
grp_id	global worker group ID
id	worker ID within the group

Reimplemented in singa::BPWorker.

void singa::Worker::InitLocalParams ( )

Init all local params (i.e., params from layers resident in this worker).

If the param is owned by the worker, then initialize it and put it to the servers. Otherwise, call Get() to get the param. Get() may not send a get request if the param's owner is in the same process: once the owner initializes the param, its version is visible to all sharers. If training starts from scratch, the params are initialized using random distributions, e.g., a Gaussian distribution. After that, the worker may train for a couple of steps to warm up the params before putting them to the servers (the warmup field of JobProto controls this).

If the owner param is available from a checkpoint file, then its values are parsed from the checkpoint file instead of being randomly initialized. Params that do not have checkpoints are randomly initialized.
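The per-param decision described above can be summarized in a small sketch. InitAction and ChooseInitAction are hypothetical names introduced for this example; the sketch covers only the choice of action and leaves out warmup and the actual put/get traffic.

```cpp
#include <cassert>

// Hypothetical labels for the three outcomes described in the text.
enum class InitAction {
  kLoadFromCheckpoint,  // owner, and the param is in a checkpoint file
  kRandomInit,          // owner, and no checkpoint is available
  kGetFromOwner         // not the owner: call Get() instead of initializing
};

// Picks the action for one param resident in this worker.
InitAction ChooseInitAction(bool owned_by_this_worker, bool in_checkpoint) {
  if (!owned_by_this_worker) {
    return InitAction::kGetFromOwner;
  }
  return in_checkpoint ? InitAction::kLoadFromCheckpoint
                       : InitAction::kRandomInit;
}

// Usage: the three cases from the description.
inline void InitActionExamples() {
  assert(ChooseInitAction(true, true) == InitAction::kLoadFromCheckpoint);
  assert(ChooseInitAction(true, false) == InitAction::kRandomInit);
  assert(ChooseInitAction(false, true) == InitAction::kGetFromOwner);
}
```

In this sketch a non-owner never reads the checkpoint itself; it relies on the owner to make the initialized version available.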

int singa::Worker::Put	(	Param *	param,
		int	step
	)

Put Param to server.

Parameters

param
step	used as current param version for the put request

void singa::Worker::Report	(	const string &	prefix,
		const Metric &	perf
	)

Report performance to the stub.

Parameters

prefix	display prefix, e.g., 'Train', 'Test'
perf

void singa::Worker::Run ( )

Main function of Worker.

Train the neural net step by step; test/validation is done periodically.
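A rough sketch of such a loop is given below, with validation and test placed before the training step of each iteration. ScheduleSketch and RunSketch are hypothetical, and the modulo-based frequency checks are an assumption: the conditions actually used by StopNow, TestNow, and ValidateNow are not documented here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical schedule; the field names are assumptions, not JobProto fields.
struct ScheduleSketch {
  int train_steps;    // total number of training steps
  int test_freq;      // test every test_freq steps (0 disables)
  int validate_freq;  // validate every validate_freq steps (0 disables)

  bool StopNow(int step) const { return step >= train_steps; }
  bool TestNow(int step) const {
    return test_freq > 0 && step % test_freq == 0;
  }
  bool ValidateNow(int step) const {
    return validate_freq > 0 && step % validate_freq == 0;
  }
};

// Returns the sequence of events one worker would go through.
std::vector<std::string> RunSketch(const ScheduleSketch& s) {
  assert(s.train_steps >= 0);
  std::vector<std::string> events;
  for (int step = 0; !s.StopNow(step); ++step) {
    // Validation and test come before training in each iteration.
    if (s.ValidateNow(step)) events.push_back("validate@" + std::to_string(step));
    if (s.TestNow(step)) events.push_back("test@" + std::to_string(step));
    events.push_back("train@" + std::to_string(step));
  }
  return events;
}
```

With four training steps and a test frequency of two, the sketch tests at steps 0 and 2 and trains at every step.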

void singa::Worker::Test	(	int	nsteps,
		Phase	phase,
		shared_ptr< NeuralNet >	net
	)

Test the performance of the learned model on the validation or test dataset.

Test is done by the first group.

Parameters

net	neural network

bool singa::Worker::TestNow ( int step ) const

inline

Check whether it is time to do a test.

Parameters

step	the number of times ::Train() has been called.

virtual void singa::Worker::TrainOneBatch	(	int	step,
		Metric *	perf
	)

pure virtual

Train one mini-batch.

Test/Validation is done before training.

Implemented in singa::CDWorker, and singa::BPWorker.

int singa::Worker::Update	(	Param *	param,
		int	step
	)

Update Param.

Parameters

param
step	training step used for updating (e.g., deciding learning rate)

bool singa::Worker::ValidateNow ( int step ) const

inline

Check whether it is time to do validation.

Parameters

step	the number of times ::Train() has been called.

The documentation for this class was generated from the following file:

/home/wangwei/program/asf/release-0.1/apache-singa-incubating-0.1.0-RC1/include/trainer/worker.h
