Optimizer¶
This module includes a set of optimizers for updating model parameters.
Example usage:
from singa import optimizer
from singa import tensor
sgd = optimizer.SGD(lr=0.01, momentum=0.9, weight_decay=1e-4)
p = tensor.Tensor((3,5))
p.uniform(-1, 1)
g = tensor.Tensor((3,5))
g.gaussian(0, 0.01)
sgd.apply(1, g, p, 'param') # use the global lr=0.01 for epoch 1
sgd.apply_with_lr(2, 0.03, g, p, 'param') # use lr=0.03 for epoch 2
class singa.optimizer.Optimizer(lr=None, momentum=None, weight_decay=None, regularizer=None, constraint=None)¶
Bases: object
The base python optimizer class.
Typically, an optimizer is used as follows:
- construct the optimizer
- (optional) register each parameter with its specs.
- use the optimizer to update parameter values given parameter gradients and other optional info
Subclasses should override the apply_with_lr function to do the real parameter update.
Parameters: - lr (float) – a constant value for the learning rate
- momentum (float) – a constant value for the momentum value
- weight_decay (float) – the coefficient for the L2 regularizer, which is mutually exclusive with ‘regularizer’.
- regularizer – an instance of Regularizer or RegularizerConf; If set, regularization would be applied in apply_with_lr(). Users can also do regularization outside.
- constraint – an instance of Constraint or ConstraintConf; If set, constraint would be applied inside apply_with_lr(). Users can also apply constraint outside.
register(name, specs)¶
Register the param specs, including creating the regularizer and constraint per param object. Param-specific regularizer and constraint have higher priority than the global ones. If all parameters share the same setting for learning rate, regularizer and constraint, then there is no need to call this function.
Parameters: - name (str) – parameter name
- specs (ParamSpec) – protobuf obj, including regularizer and constraint, multipliers for learning rate and weight decay.
apply_regularizer_constraint(epoch, value, grad, name=None, step=-1)¶
Apply regularization and constraint if available.
If both a global regularizer (constraint) and a param-specific regularizer (constraint) exist, the param-specific one is used.
Returns: the updated gradient Tensor
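The priority rule above can be sketched in plain Python. This is an illustrative helper, not part of the SINGA API; the function and variable names are hypothetical:

```python
def pick(param_specific, global_default):
    # A param-specific regularizer (or constraint) wins over the
    # global one; fall back to the global setting otherwise.
    return param_specific if param_specific is not None else global_default

# e.g. 'w1' has its own regularizer registered, 'b1' does not
per_param = {'w1': 'L2(0.01)'}
reg_w1 = pick(per_param.get('w1'), 'L2(0.0001)')  # param-specific wins
reg_b1 = pick(per_param.get('b1'), 'L2(0.0001)')  # global fallback
```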
apply_with_lr(epoch, lr, grad, value, name=None, step=-1)¶
Update the parameter with the given learning rate if the gradient is not empty.
The subclass optimizer must override this function. This function does nothing if the grad is empty.
Parameters: - epoch (int) – training epoch ID
- lr (float) – learning rate
- grad (Tensor) – parameter gradient
- value (Tensor) – parameter value
- name (str) – parameter name to index parameter-specific updating rules (including regularizer and constraint)
- step (int) – iteration ID within one epoch
Returns: updated parameter value
apply(epoch, grad, value, name=None, step=-1)¶
Do the update, assuming the learning rate generator is set.
The subclass optimizer does not need to override this function.
Parameters: - epoch (int) – training epoch ID
- grad (Tensor) – parameter gradient
- value (Tensor) – parameter value
- name (str) – parameter name used to retrieve parameter-specific updating rules (including regularizer and constraint)
- step (int) – training iteration ID within one epoch
Returns: updated parameter value
class singa.optimizer.SGD(lr=None, momentum=None, weight_decay=None, regularizer=None, constraint=None)¶
Bases: singa.optimizer.Optimizer
The vanilla Stochastic Gradient Descent algorithm with momentum.
See the base Optimizer for all arguments.
apply_with_lr(epoch, lr, grad, value, name, step=-1)¶
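As a reference for what apply_with_lr computes, the standard SGD-with-momentum update can be sketched in plain Python for a scalar parameter. This illustrates the update rule only; it is not SINGA's implementation, and `sgd_step` and the velocity argument `v` are names invented for this sketch:

```python
def sgd_step(value, grad, v, lr=0.01, momentum=0.9, weight_decay=0.0):
    # apply L2 weight decay to the gradient, then a momentum update
    g = grad + weight_decay * value
    v = momentum * v - lr * g   # velocity accumulates past gradients
    return value + v, v         # updated value and new velocity

# one plain-SGD step (momentum disabled)
p, vel = sgd_step(1.0, 0.5, 0.0, lr=0.1, momentum=0.0)
```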
class singa.optimizer.Nesterov(lr=None, momentum=0.9, weight_decay=None, regularizer=None, constraint=None)¶
Bases: singa.optimizer.Optimizer
The SGD with Nesterov momentum.
The SGD with Nesterov momentum.
See the base Optimizer for all arguments.
apply_with_lr(epoch, lr, grad, value, name, step=-1)¶
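One common formulation of the Nesterov momentum update can be sketched in plain Python for a scalar parameter. This is an illustration of the general technique, not SINGA's kernel, which may use an equivalent but differently arranged form; `nesterov_step` is a hypothetical name:

```python
def nesterov_step(value, grad, v, lr=0.01, momentum=0.9):
    v_prev = v
    v = momentum * v - lr * grad
    # look-ahead correction: step using the updated velocity
    value = value - momentum * v_prev + (1 + momentum) * v
    return value, v

p, vel = nesterov_step(1.0, 0.5, 0.0, lr=0.1, momentum=0.9)
```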
class singa.optimizer.RMSProp(rho=0.9, epsilon=1e-08, lr=None, weight_decay=None, regularizer=None, constraint=None)¶
Bases: singa.optimizer.Optimizer
RMSProp optimizer.
RMSProp optimizer.
See the base Optimizer for all constructor args.
Parameters: - rho (float) – decay rate for the running average of squared gradients, within [0, 1]
- epsilon (float) – small value for preventing numeric error
apply_with_lr(epoch, lr, grad, value, name, step=-1)¶
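The RMSProp update rule can be sketched in plain Python for a scalar parameter. This is one common formulation (epsilon added outside the square root; published variants differ), shown as an illustration rather than SINGA's implementation; `rmsprop_step` and the running-average state `s` are names invented here:

```python
import math

def rmsprop_step(value, grad, s, lr=0.01, rho=0.9, epsilon=1e-8):
    # s is an exponential running average of squared gradients;
    # rho is its decay rate
    s = rho * s + (1 - rho) * grad * grad
    value = value - lr * grad / (math.sqrt(s) + epsilon)
    return value, s

p, s = rmsprop_step(1.0, 0.5, 0.0)
```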
class singa.optimizer.AdaGrad(epsilon=1e-08, lr=None, weight_decay=None, lr_gen=None, regularizer=None, constraint=None)¶
Bases: singa.optimizer.Optimizer
AdaGrad optimizer.
AdaGrad optimizer.
See the base Optimizer for all constructor args.
Parameters: epsilon (float) – small number for preventing numeric error
apply_with_lr(epoch, lr, grad, value, name, step=-1)¶
class singa.optimizer.Adam(beta_1=0.9, beta_2=0.999, epsilon=1e-08, lr=None, weight_decay=None, regularizer=None, constraint=None)¶
Bases: singa.optimizer.Optimizer
Adam optimizer.
Adam optimizer.
See the base Optimizer for all constructor args.
Parameters: - beta_1 (float) – coefficient of momentum
- beta_2 (float) – coefficient of aggregated squared gradient
- epsilon (float) – small value for preventing numeric error
apply_with_lr(epoch, lr, grad, value, name, step)¶
Update one parameter object.
Parameters: step (int) – the accumulated training iterations, not the iteration ID
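The accumulated step count matters because Adam's bias correction depends on it. The standard Adam update can be sketched in plain Python for a scalar parameter (an illustration of the algorithm, not SINGA's implementation; `adam_step`, `m`, `v` and `t` are names invented for this sketch, with `t` playing the role of the accumulated iteration count described above):

```python
import math

def adam_step(value, grad, m, v, t, lr=0.001,
              beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    m = beta_1 * m + (1 - beta_1) * grad         # first moment (momentum)
    v = beta_2 * v + (1 - beta_2) * grad * grad  # second moment
    m_hat = m / (1 - beta_1 ** t)                # bias correction uses the
    v_hat = v / (1 - beta_2 ** t)                # accumulated step t (1-based)
    value = value - lr * m_hat / (math.sqrt(v_hat) + epsilon)
    return value, m, v

p, m, v = adam_step(1.0, 0.5, 0.0, 0.0, t=1)
```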
class singa.optimizer.Regularizer¶
Bases: object
Base Python regularizer for parameter gradients.
apply(epoch, value, grad, step=-1)¶
class singa.optimizer.CppRegularizer(conf)¶
Bases: singa.optimizer.Regularizer
Wrapper for a regularizer implemented using C++.
Parameters: conf (RegularizerConf) – protobuf message for the configuration
apply(epoch, value, grad, step=-1)¶
class singa.optimizer.L2Regularizer(coefficient)¶
Bases: singa.optimizer.Regularizer
L2 regularization.
Parameters: coefficient (float) – regularization coefficient
apply(epoch, value, grad, step=-1)¶
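The effect of L2 regularization on a gradient can be sketched in one line of plain Python (an illustration of the standard technique, not SINGA's implementation; `l2_regularize` is a hypothetical name):

```python
def l2_regularize(grad, value, coefficient):
    # L2 regularization adds coefficient * value to the gradient,
    # the derivative of the penalty 0.5 * coefficient * value**2
    return grad + coefficient * value

g = l2_regularize(0.5, 2.0, 0.01)
```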
class singa.optimizer.Constraint¶
Bases: object
Base Python constraint class for parameter gradients.
apply(epoch, value, grad, step=-1)¶
class singa.optimizer.CppConstraint(conf)¶
Bases: singa.optimizer.Constraint
Wrapper for constraints implemented using C++.
Parameters: conf (ConstraintConf) – protobuf message for the configuration
apply(epoch, value, grad, step=-1)¶
class singa.optimizer.L2Constraint(threshold=None)¶
Bases: singa.optimizer.Constraint
Rescale the gradient so that its L2 norm is <= the given threshold.
apply(epoch, value, grad, step=-1)¶
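The rescaling performed by an L2-norm constraint can be sketched in plain Python over a list of gradient components (an illustration of gradient norm clipping in general, not SINGA's implementation; `l2_constrain` is a hypothetical name):

```python
import math

def l2_constrain(grad, threshold):
    # rescale the gradient only when its L2 norm exceeds the threshold;
    # the direction is preserved, only the magnitude is clipped
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        return [g * threshold / norm for g in grad]
    return grad

clipped = l2_constrain([3.0, 4.0], threshold=1.0)  # norm 5.0 -> rescaled
```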