Optimizer
This module includes a set of optimizers for updating model parameters.
Example usage:
from singa import optimizer
from singa import tensor
sgd = optimizer.SGD(lr=0.01, momentum=0.9, weight_decay=1e-4)
p = tensor.Tensor((3,5))
p.uniform(-1, 1)
g = tensor.Tensor((3,5))
g.gaussian(0, 0.01)
sgd.apply(1, g, p, 'param') # use the global lr=0.01 for epoch 1
sgd.apply_with_lr(2, 0.03, g, p, 'param') # use lr=0.03 for epoch 2
class singa.optimizer.Optimizer(lr=None, momentum=None, weight_decay=None, regularizer=None, constraint=None)
Bases: object
The base Python optimizer class.
Typically, an optimizer is used as follows:
1. construct the optimizer;
2. (optional) register each parameter with its specs;
3. use the optimizer to update parameter values given parameter gradients and other optional info.
Subclasses should override the apply_with_lr function to do the real parameter update.
- Parameters
lr (float) – a constant value for the learning rate
momentum (float) – a constant value for the momentum
weight_decay (float) – the coefficient for the L2 regularizer; mutually exclusive with 'regularizer'
regularizer – an instance of Regularizer or RegularizerConf; if set, regularization is applied in apply_with_lr(). Users can also do regularization outside.
constraint – an instance of Constraint or ConstraintConf; if set, the constraint is applied inside apply_with_lr(). Users can also apply the constraint outside.
register(name, specs)
Register the param specs, including creating the regularizer and constraint per param object. Param-specific regularizers and constraints have higher priority than the global ones. If all parameters share the same settings for learning rate, regularizer and constraint, there is no need to call this function.
- Parameters
name (str) – parameter name
specs (ParamSpec) – protobuf object, including the regularizer and constraint, and multipliers for the learning rate and weight decay
apply_regularizer_constraint(epoch, value, grad, name=None, step=-1)
Apply regularization and constraint if available.
If both a global regularizer (constraint) and a param-specific regularizer (constraint) exist, the param-specific one is used.
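This precedence can be sketched in plain Python (hypothetical names and rules for illustration; not SINGA's implementation):

```python
def pick_rule(name, param_rules, global_rule):
    """Return the param-specific rule if one was registered via register(),
    otherwise fall back to the global one."""
    return param_rules.get(name, global_rule)

# Hypothetical regularizers: each maps (grad, value) to an adjusted gradient.
def global_l2(grad, value):
    return grad + 1e-4 * value   # global L2, coefficient 1e-4

def param_l2(grad, value):
    return grad + 1e-2 * value   # stronger L2 registered for 'conv1/weight'

param_rules = {'conv1/weight': param_l2}

# The registered param-specific rule wins; unregistered params use the global one.
assert pick_rule('conv1/weight', param_rules, global_l2) is param_l2
assert pick_rule('fc/weight', param_rules, global_l2) is global_l2
```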
apply_with_lr(epoch, lr, grad, value, name=None, step=-1)
Update the parameter with the given learning rate if the grad is not empty.
The subclass optimizer must override this function. This function does nothing if the grad is empty.
- Parameters
epoch (int) – training epoch ID
lr (float) – learning rate
grad (Tensor) – parameter gradient
value (Tensor) – parameter value
name (string) – parameter name to index parameter-specific updating rules (including regularizer and constraint)
step (int) – iteration ID within one epoch
- Returns
updated parameter value
apply(epoch, grad, value, name=None, step=-1)
Update the parameter, assuming the learning rate generator is set.
The subclass optimizer does not need to override this function.
- Parameters
epoch (int) – training epoch ID
grad (Tensor) – parameter gradient
value (Tensor) – parameter value
name (string) – parameter name to retrieve parameter-specific updating rules (including regularizer and constraint)
step (int) – training iteration ID within one epoch
- Returns
updated parameter value
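The relation between apply and apply_with_lr can be sketched as follows (a plain-Python sketch; the step-decay schedule and the helper names are assumptions for illustration, not SINGA's API):

```python
def make_step_lr(base_lr, gamma=0.1, step_size=30):
    """A simple step-decay learning-rate generator:
    lr = base_lr * gamma ** (epoch // step_size)."""
    return lambda epoch: base_lr * gamma ** (epoch // step_size)

def apply(epoch, grad, value, lr_gen, apply_with_lr):
    """Derive this epoch's learning rate from the generator, then delegate
    the actual update to apply_with_lr."""
    return apply_with_lr(epoch, lr_gen(epoch), grad, value)

lr_gen = make_step_lr(0.1)
# A trivial apply_with_lr: plain gradient descent on a scalar parameter.
new_value = apply(31, 0.5, 1.0, lr_gen, lambda e, lr, g, v: v - lr * g)
```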
class singa.optimizer.SGD(lr=None, momentum=None, weight_decay=None, regularizer=None, constraint=None)
Bases: singa.optimizer.Optimizer
The vanilla Stochastic Gradient Descent algorithm with momentum.
See the base Optimizer for all arguments.
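One common formulation of the momentum update, sketched on a scalar parameter (a minimal plain-Python sketch, not SINGA's C++ kernel; history is the momentum buffer):

```python
def sgd_update(value, grad, history, lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One SGD-with-momentum step on a scalar parameter (sketch)."""
    grad = grad + weight_decay * value        # fold L2 weight decay into the gradient
    history = momentum * history + lr * grad  # momentum buffer
    return value - history, history

value, history = 1.0, 0.0
for _ in range(3):
    value, history = sgd_update(value, grad=0.5, history=history)
```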
apply_with_lr(epoch, lr, grad, value, name, step=-1)
Update the parameter with the given learning rate if the grad is not empty; this function does nothing if the grad is empty.
- Parameters
epoch (int) – training epoch ID
lr (float) – learning rate
grad (Tensor) – parameter gradient
value (Tensor) – parameter value
name (string) – parameter name to index parameter-specific updating rules (including regularizer and constraint)
step (int) – iteration ID within one epoch
- Returns
updated parameter value
class singa.optimizer.Nesterov(lr=None, momentum=0.9, weight_decay=None, regularizer=None, constraint=None)
Bases: singa.optimizer.Optimizer
SGD with Nesterov momentum.
See the base Optimizer for all arguments.
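One common (Caffe-style) formulation of the Nesterov update, sketched on a scalar parameter (the exact form is an assumption for illustration, not a copy of SINGA's kernel):

```python
def nesterov_update(value, grad, history, lr=0.01, momentum=0.9):
    """One Nesterov-momentum step on a scalar parameter (sketch)."""
    prev = history
    history = momentum * history + lr * grad
    # Look-ahead correction: with momentum=0 this reduces to plain SGD.
    return value - ((1 + momentum) * history - momentum * prev), history

# Sanity check: momentum=0 reduces to value - lr * grad.
v, h = nesterov_update(1.0, 0.5, 0.0, lr=0.1, momentum=0.0)
```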
apply_with_lr(epoch, lr, grad, value, name, step=-1)
Update the parameter with the given learning rate if the grad is not empty; this function does nothing if the grad is empty.
- Parameters
epoch (int) – training epoch ID
lr (float) – learning rate
grad (Tensor) – parameter gradient
value (Tensor) – parameter value
name (string) – parameter name to index parameter-specific updating rules (including regularizer and constraint)
step (int) – iteration ID within one epoch
- Returns
updated parameter value
class singa.optimizer.RMSProp(rho=0.9, epsilon=1e-08, lr=None, weight_decay=None, regularizer=None, constraint=None)
Bases: singa.optimizer.Optimizer
RMSProp optimizer.
See the base Optimizer for all constructor args.
- Parameters
rho (float) – decay factor within [0, 1] for the running average of squared gradients
epsilon (float) – small value for preventing numeric error
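The standard RMSProp update, sketched on a scalar parameter (a minimal sketch, not SINGA's implementation; run_avg is the running average of squared gradients):

```python
import math

def rmsprop_update(value, grad, run_avg, lr=0.001, rho=0.9, epsilon=1e-8):
    """One RMSProp step on a scalar parameter (sketch)."""
    run_avg = rho * run_avg + (1 - rho) * grad ** 2   # running avg of grad^2
    return value - lr * grad / (math.sqrt(run_avg) + epsilon), run_avg

v, s = rmsprop_update(1.0, 0.5, 0.0)
```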
apply_with_lr(epoch, lr, grad, value, name, step=-1)
Update the parameter with the given learning rate if the grad is not empty; this function does nothing if the grad is empty.
- Parameters
epoch (int) – training epoch ID
lr (float) – learning rate
grad (Tensor) – parameter gradient
value (Tensor) – parameter value
name (string) – parameter name to index parameter-specific updating rules (including regularizer and constraint)
step (int) – iteration ID within one epoch
- Returns
updated parameter value
class singa.optimizer.AdaGrad(epsilon=1e-08, lr=None, weight_decay=None, lr_gen=None, regularizer=None, constraint=None)
Bases: singa.optimizer.Optimizer
AdaGrad optimizer.
See the base Optimizer for all constructor args.
- Parameters
epsilon (float) – small number for preventing numeric error.
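The AdaGrad update, sketched on a scalar parameter (a minimal sketch, not SINGA's implementation; accum is the running sum of squared gradients, so the effective learning rate shrinks as gradients accumulate):

```python
import math

def adagrad_update(value, grad, accum, lr=0.01, epsilon=1e-8):
    """One AdaGrad step on a scalar parameter (sketch)."""
    accum = accum + grad ** 2                          # sum of squared gradients
    return value - lr * grad / (math.sqrt(accum) + epsilon), accum

v, a = adagrad_update(1.0, 0.5, 0.0)
```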
apply_with_lr(epoch, lr, grad, value, name, step=-1)
Update the parameter with the given learning rate if the grad is not empty; this function does nothing if the grad is empty.
- Parameters
epoch (int) – training epoch ID
lr (float) – learning rate
grad (Tensor) – parameter gradient
value (Tensor) – parameter value
name (string) – parameter name to index parameter-specific updating rules (including regularizer and constraint)
step (int) – iteration ID within one epoch
- Returns
updated parameter value
class singa.optimizer.Adam(beta_1=0.9, beta_2=0.999, epsilon=1e-08, lr=None, weight_decay=None, regularizer=None, constraint=None)
Bases: singa.optimizer.Optimizer
Adam optimizer.
See the base Optimizer for all constructor args.
- Parameters
beta_1 (float) – decay coefficient for the first moment (momentum)
beta_2 (float) – decay coefficient for the aggregated squared gradient (second moment)
epsilon (float) – small value for preventing numeric error
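The standard Adam update with bias correction, sketched on a scalar parameter (a minimal sketch, not SINGA's implementation; t is the accumulated iteration count starting at 1, matching the step argument described below):

```python
import math

def adam_update(value, grad, m, v, t, lr=0.001,
                beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    """One Adam step on a scalar parameter (sketch)."""
    m = beta_1 * m + (1 - beta_1) * grad           # first-moment estimate
    v = beta_2 * v + (1 - beta_2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta_1 ** t)                  # bias correction
    v_hat = v / (1 - beta_2 ** t)
    return value - lr * m_hat / (math.sqrt(v_hat) + epsilon), m, v

p, m, s = adam_update(1.0, 0.5, 0.0, 0.0, t=1)
```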
apply_with_lr(epoch, lr, grad, value, name, step)
Update one parameter object.
- Parameters
step (int) – the accumulated number of training iterations, not the iteration ID within one epoch
class singa.optimizer.Regularizer
Bases: object
Base Python regularizer for parameter gradients.
apply(epoch, value, grad, step=-1)
class singa.optimizer.CppRegularizer(conf)
Bases: singa.optimizer.Regularizer
Wrapper for a regularizer implemented in C++.
- Parameters
conf (RegularizerConf) – protobuf message for the configuration
apply(epoch, value, grad, step=-1)
class singa.optimizer.L2Regularizer(coefficient)
Bases: singa.optimizer.Regularizer
L2 regularization.
- Parameters
coefficient (float) – regularization coefficient
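The effect on the gradient can be sketched on a scalar (a minimal sketch, not SINGA's implementation):

```python
def l2_regularize(grad, value, coefficient):
    """Add the L2 penalty's gradient, coefficient * value, to the parameter gradient."""
    return grad + coefficient * value

g = l2_regularize(0.5, 2.0, 0.1)
```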
apply(epoch, value, grad, step=-1)
class singa.optimizer.Constraint
Bases: object
Base Python constraint class for parameter gradients.
apply(epoch, value, grad, step=-1)
class singa.optimizer.CppConstraint(conf)
Bases: singa.optimizer.Constraint
Wrapper for constraints implemented in C++.
- Parameters
conf (ConstraintConf) – protobuf message for the configuration
apply(epoch, value, grad, step=-1)
class singa.optimizer.L2Constraint(threshold=None)
Bases: singa.optimizer.Constraint
Rescale the gradient so that its L2 norm is <= the given threshold.
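The rescaling can be sketched as follows (a plain-Python sketch over a list of gradient entries; not SINGA's implementation):

```python
import math

def l2_constrain(grad, threshold):
    """Rescale grad so its L2 norm does not exceed threshold; gradients
    already within the threshold are returned unchanged."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        scale = threshold / norm
        return [g * scale for g in grad]
    return grad

clipped = l2_constrain([3.0, 4.0], threshold=1.0)   # norm 5.0 -> rescaled
same = l2_constrain([0.1, 0.1], threshold=1.0)      # already within threshold
```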
apply(epoch, value, grad, step=-1)