Updater¶
Every server in SINGA has an Updater instance that updates parameters based on gradients. On this page, the Basic user guide describes how to configure an updater, and the Advanced user guide presents details on how to implement a new updater and a new learning rate changing method.
Basic user guide¶
There are many different parameter updating protocols (i.e., subclasses of Updater). They share some configuration fields:

- type, an integer for identifying an updater;
- learning_rate, configuration for the LRGenerator which controls the learning rate;
- weight_decay, the coefficient for L2 regularization;
- momentum.

If you are not familiar with the above terms, you can get their meanings in this page provided by Karpathy.
Configuration of built-in updater classes¶
Updater¶
The base Updater implements the vanilla SGD algorithm. Its configuration type is kSGD. Users need to configure at least the learning_rate field; momentum and weight_decay are optional fields.

updater {
  type: kSGD
  momentum: float
  weight_decay: float
  learning_rate {
    ...
  }
}
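For instance, a filled-in kSGD configuration could look like the following; the numeric values are purely illustrative, not recommendations.

updater {
  type: kSGD
  momentum: 0.9         # illustrative value
  weight_decay: 0.0005  # illustrative value
  learning_rate {
    type: kFixed
    base_lr: 0.01       # illustrative value
  }
}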
AdaGradUpdater¶
It inherits the base Updater to implement the AdaGrad algorithm. Its type is kAdaGrad. AdaGradUpdater is configured similarly to Updater, except that momentum is not used.
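Based on the description above, a configuration skeleton would look like the following (a sketch that simply mirrors the kSGD skeleton with momentum removed):

updater {
  type: kAdaGrad
  weight_decay: float
  learning_rate {
    ...
  }
}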
NesterovUpdater¶
It inherits the base Updater to implement the Nesterov (section 3.5) updating protocol. Its type is kNesterov. learning_rate and momentum must be configured; weight_decay is an optional configuration field.
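A configuration skeleton following the same pattern would be (a sketch based on the fields listed above):

updater {
  type: kNesterov
  momentum: float
  weight_decay: float  # optional
  learning_rate {
    ...
  }
}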
RMSPropUpdater¶
It inherits the base Updater to implement the RMSProp algorithm proposed by Hinton (slide 29). Its type is kRMSProp.
updater {
  type: kRMSProp
  rmsprop_conf {
    rho: float # [0,1]
  }
}
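For example, a filled-in configuration might look like the one below; the values are illustrative, and the learning_rate block is configured in the same way as for the other updaters.

updater {
  type: kRMSProp
  rmsprop_conf {
    rho: 0.9          # illustrative value
  }
  learning_rate {
    type: kFixed
    base_lr: 0.001    # illustrative value
  }
}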
AdaDeltaUpdater¶
It inherits the base Updater to implement the AdaDelta updating algorithm. Its type is kAdaDelta.
updater {
  type: kAdaDelta
  adadelta_conf {
    rho: float # [0,1]
  }
}
Configuration of learning rate¶
The learning_rate field is configured as,

learning_rate {
  type: ChangeMethod
  base_lr: float  # base/initial learning rate
  ...             # fields for a specific changing method
}

The common fields include type and base_lr. SINGA provides the following ChangeMethods.
kFixed¶
The base_lr is used for all steps.
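For example (the value is illustrative):

learning_rate {
  type: kFixed
  base_lr: 0.01
}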
kLinear¶
The updater should be configured like

learning_rate {
  base_lr: float
  linear_conf {
    freq: int
    final_lr: float
  }
}

Linear interpolation is used to change the learning rate,

lr = (1 - step / freq) * base_lr + (step / freq) * final_lr
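For instance, with base_lr = 0.1, final_lr = 0.01 and freq = 1000 (illustrative values, assuming the division is done in floating point), step 500 gives lr = 0.5 * 0.1 + 0.5 * 0.01 = 0.055.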
kExponential¶
The updater should be configured like

learning_rate {
  base_lr: float
  exponential_conf {
    freq: int
  }
}

The learning rate for step is

lr = base_lr / 2^(step / freq)
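For instance, with base_lr = 0.1 and freq = 1000 (illustrative values), the learning rate halves every 1000 steps: 0.1 at step 0, 0.05 at step 1000, 0.025 at step 2000, and so on.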
kInverseT¶
The updater should be configured like

learning_rate {
  base_lr: float
  inverset_conf {
    final_lr: float
  }
}

The learning rate for step is

lr = base_lr / (1 + step / final_lr)
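For instance, with base_lr = 0.1 and final_lr = 100 (illustrative values), the learning rate is 0.1 at step 0 and 0.05 at step 100, and it keeps decaying roughly as 1/step for large steps.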
kInverse¶
The updater should be configured like

learning_rate {
  base_lr: float
  inverse_conf {
    gamma: float
    pow: float
  }
}

The learning rate for step is

lr = base_lr * (1 + gamma * step)^(-pow)
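For instance, with base_lr = 0.1, gamma = 0.0001 and pow = 0.75 (illustrative values), the learning rate starts at 0.1 and decays smoothly; at step 10000 it is 0.1 * 2^(-0.75) ≈ 0.059.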
kStep¶
The updater should be configured like

learning_rate {
  base_lr: float
  step_conf {
    change_freq: int
    gamma: float
  }
}

The learning rate for step is

lr = base_lr * gamma^(step / change_freq)
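For instance, with base_lr = 0.1, gamma = 0.5 and change_freq = 10000 (illustrative values, assuming step / change_freq is an integer division), the learning rate is 0.1 for the first 10000 steps, 0.05 for the next 10000 steps, and so on.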
kFixedStep¶
The updater should be configured like

learning_rate {
  fixedstep_conf {
    step: int
    step_lr: float
    step: int
    step_lr: float
    ...
  }
}

Denote the i-th tuple as (step[i], step_lr[i]); then the learning rate for step is step_lr[k], where step[k] is the smallest number that is larger than step.
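For instance, a configuration with three (step, step_lr) tuples could look like the following (illustrative values); step and step_lr are repeated fields, so each consecutive pair forms one tuple.

learning_rate {
  fixedstep_conf {
    step: 0
    step_lr: 0.1
    step: 60000
    step_lr: 0.01
    step: 65000
    step_lr: 0.001
  }
}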
Advanced user guide¶
Implementing a new Updater subclass¶
The base Updater class has one virtual function,

class Updater {
 public:
  virtual void Update(int step, Param* param, float grad_scale = 1.0f) = 0;

 protected:
  UpdaterProto proto_;
  LRGenerator lr_gen_;
};
It updates the values of the param based on its gradients. The step argument is for deciding the learning rate, which may change over time (i.e., with step). grad_scale scales the original gradient values. This function is called by a server once it has received all gradients for the same Param object.

To implement a new Updater subclass, users must override the Update function.

class FooUpdater : public Updater {
  void Update(int step, Param* param, float grad_scale = 1.0f) override;
};
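As an illustration only, a plain SGD-style Update could be sketched as below. The Param accessors used here (mutable_cpu_data, cpu_grad and size) are assumptions made for the example, not the exact SINGA API; lr_gen_ is the learning rate generator inherited from the base class.

void FooUpdater::Update(int step, Param* param, float grad_scale) {
  float lr = lr_gen_.Get(step);             // learning rate for this step
  float* data = param->mutable_cpu_data();  // hypothetical accessor to parameter values
  const float* grad = param->cpu_grad();    // hypothetical accessor to gradients
  for (int i = 0; i < param->size(); ++i)   // hypothetical element count
    data[i] -= lr * grad_scale * grad[i];   // vanilla SGD step
}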
Configuration of this new updater can be declared similarly to that of a new layer,
# in user.proto
message FooUpdaterProto {
  optional int32 c = 1;
}

extend UpdaterProto {
  optional FooUpdaterProto fooupdater_conf = 101;
}
The new updater should be registered in the main function
driver.RegisterUpdater<FooUpdater>("FooUpdater");
Users can then configure the job as
# in job.conf
updater {
  user_type: "FooUpdater" # must use user_type with the same string identifier as the one used for registration
  fooupdater_conf {
    c: 20
  }
}
Implementing a new LRGenerator subclass¶
The base LRGenerator class declares one virtual function,

virtual float Get(int step);
To implement a subclass, e.g., FooLRGen, users should declare it like

class FooLRGen : public LRGenerator {
 public:
  float Get(int step) override;
};
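A toy Get implementation, shown purely for illustration (it ignores any configured fields and simply decays a fixed starting rate with step):

float FooLRGen::Get(int step) {
  // illustrative only: 0.01 at step 0, decaying towards zero as step grows
  return 0.01f / (1.0f + 0.0001f * step);
}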
Configuration of FooLRGen can be defined using a protocol message,

# in user.proto
message FooLRProto {
  ...
}

extend LRGenProto {
  optional FooLRProto foolr_conf = 101;
}
The configuration is then like,

learning_rate {
  user_type: "FooLR" # must use user_type with the same string identifier as the one used for registration
  base_lr: float
  foolr_conf {
    ...
  }
}
Users have to register this subclass in the main function,

driver.RegisterLRGenerator<FooLRGen, std::string>("FooLR");