Optimizer

This module includes a set of optimizers for updating model parameters.

Example usage:

from singa import optimizer
from singa import tensor

sgd = optimizer.SGD(lr=0.01, momentum=0.9, weight_decay=1e-4)
p = tensor.Tensor((3,5))
p.uniform(-1, 1)
g = tensor.Tensor((3,5))
g.gaussian(0, 0.01)

sgd.apply(1, g, p, 'param')  # use the global lr=0.01 for epoch 1
sgd.apply_with_lr(2, 0.03, g, p, 'param')  # override with lr=0.03 for epoch 2
class singa.optimizer.Optimizer(lr=None, momentum=None, weight_decay=None, regularizer=None, constraint=None)

Bases: object

The base Python optimizer class.

Typically, an optimizer is used as follows:

  1. construct the optimizer
  2. (optional) register each parameter with its specs.
  3. use the optimizer to update parameter values given parameter gradients and other optional info

Subclasses should override the apply_with_lr function to do the real parameter update.

Parameters:
  • lr (float) – a constant value for the learning rate
  • momentum (float) – a constant value for the momentum value
  • weight_decay (float) – the coefficient for the L2 regularizer; mutually exclusive with ‘regularizer’.
  • regularizer – an instance of Regularizer or RegularizerConf; if set, regularization is applied inside apply_with_lr(). Users can also apply regularization outside.
  • constraint – an instance of Constraint or ConstraintConf; if set, the constraint is applied inside apply_with_lr(). Users can also apply the constraint outside.
register(name, specs)

Register the parameter specs, which includes creating the regularizer and constraint for each parameter object. Parameter-specific regularizers and constraints take priority over the global ones. If all parameters share the same settings for learning rate, regularizer, and constraint, there is no need to call this function; otherwise register each parameter as sketched below.

Parameters:
  • name (str) – parameter name
  • specs (ParamSpec) – protobuf object that includes the regularizer and constraint settings, plus multipliers for the learning rate and weight decay.
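
For example, per-parameter settings can be registered before training. This is a minimal sketch; the import path of ParamSpec and the field names lr_mult, decay_mult, and regularizer.coefficient are assumptions inferred from the description above, so check model.proto in your SINGA build for the exact names:

from singa import optimizer
from singa.proto import model_pb2  # import path is an assumption; adjust to your build

sgd = optimizer.SGD(lr=0.01, momentum=0.9)

spec = model_pb2.ParamSpec()
spec.lr_mult = 2.0                   # assumed field: train this parameter with 2x the global lr
spec.decay_mult = 0.0                # assumed field: exclude it from the global weight decay
spec.regularizer.coefficient = 1e-4  # assumed field: parameter-specific L2, overrides the global one

sgd.register('dense/bias', spec)
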
apply_regularizer_constraint(epoch, value, grad, name=None, step=-1)

Apply regularization and constraint if available.

If both a global and a parameter-specific regularizer (or constraint) are set, the parameter-specific one is used.

Parameters:
  • epoch (int) – training epoch ID
  • value (Tensor) – parameter value Tensor
  • grad (Tensor) – parameter gradient Tensor
  • name (string) – to get parameter specific regularizer or constraint
  • step (int) – iteration ID within one epoch
Returns:

the updated gradient Tensor
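
A minimal sketch of calling this method directly (e.g. when doing regularization outside of apply_with_lr), using only the arguments documented above:

from singa import optimizer
from singa import tensor

opt = optimizer.SGD(lr=0.01, weight_decay=1e-4)  # global L2 regularization
p = tensor.Tensor((3, 5))
p.uniform(-1, 1)
g = tensor.Tensor((3, 5))
g.gaussian(0, 0.01)

# add the regularization term (and apply any constraint) to the raw gradient;
# the returned Tensor is what the update rule would then consume
g = opt.apply_regularizer_constraint(1, p, g, 'param')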

apply_with_lr(epoch, lr, grad, value, name=None, step=-1)

Update the parameter with the given learning rate if the gradient is not empty.

Subclass optimizers must override this function. It does nothing if the gradient is empty.

Parameters:
  • epoch (int) – training epoch ID
  • lr (float) – learning rate
  • grad (Tensor) – parameter gradient
  • value (Tensor) – parameter value
  • name (string) – parameter name used to look up parameter-specific update rules (including regularizer and constraint)
  • step (int) – iteration ID within one epoch
Returns:

updated parameter value

apply(epoch, grad, value, name=None, step=-1)

Update the parameter, assuming the learning rate (or learning rate generator) was set when constructing the optimizer.

Subclass optimizers do not need to override this function.

Parameters:
  • epoch (int) – training epoch ID
  • grad (Tensor) – parameter gradient
  • value (Tensor) – parameter value
  • name (string) – parameter name used to retrieve parameter-specific update rules (including regularizer and constraint)
  • step (int) – training iteration ID within one epoch
Returns:

updated parameter value
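
A minimal training-loop sketch built only from the calls documented above; the fixed random gradient stands in for a real forward/backward pass:

from singa import optimizer
from singa import tensor

sgd = optimizer.SGD(lr=0.01, momentum=0.9, weight_decay=1e-4)
w = tensor.Tensor((3, 5))
w.gaussian(0, 0.1)
g = tensor.Tensor((3, 5))
g.gaussian(0, 0.01)  # in a real loop the gradient is recomputed every iteration

for epoch in range(3):
    for step in range(5):  # 5 mini-batches per epoch, just for illustration
        # apply() uses the learning rate set at construction time;
        # use apply_with_lr() to override it for a single call
        w = sgd.apply(epoch, g, w, 'w', step)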

class singa.optimizer.SGD(lr=None, momentum=None, weight_decay=None, regularizer=None, constraint=None)

Bases: singa.optimizer.Optimizer

The vanilla Stochastic Gradient Descent algorithm with momentum.

See the base Optimizer for all arguments.

apply_with_lr(epoch, lr, grad, value, name, step=-1)

Update the parameter with the given learning rate if the gradient is not empty.

This overrides the base Optimizer implementation; it does nothing if the gradient is empty.

Parameters:
  • epoch (int) – training epoch ID
  • lr (float) – learning rate
  • grad (Tensor) – parameter gradient
  • value (Tensor) – parameter value
  • name (string) – parameter name used to look up parameter-specific update rules (including regularizer and constraint)
  • step (int) – iteration ID within one epoch
Returns:

updated parameter value

class singa.optimizer.Nesterov(lr=None, momentum=0.9, weight_decay=None, regularizer=None, constraint=None)

Bases: singa.optimizer.Optimizer

The SGD with Nesterov momentum.

See the base Optimizer for all arguments.
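
A minimal sketch; the calling convention matches SGD, only the momentum update uses the Nesterov lookahead:

from singa import optimizer
from singa import tensor

opt = optimizer.Nesterov(lr=0.01, momentum=0.9, weight_decay=1e-4)
p = tensor.Tensor((3, 5))
p.uniform(-1, 1)
g = tensor.Tensor((3, 5))
g.gaussian(0, 0.01)
p = opt.apply_with_lr(1, 0.01, g, p, 'param')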

apply_with_lr(epoch, lr, grad, value, name, step=-1)

Update the parameter with the given learning rate if the gradient is not empty.

This overrides the base Optimizer implementation; it does nothing if the gradient is empty.

Parameters:
  • epoch (int) – training epoch ID
  • lr (float) – learning rate
  • grad (Tensor) – parameter gradient
  • value (Tensor) – parameter value
  • name (string) – parameter name used to look up parameter-specific update rules (including regularizer and constraint)
  • step (int) – iteration ID within one epoch
Returns:

updated parameter value

class singa.optimizer.RMSProp(rho=0.9, epsilon=1e-08, lr=None, weight_decay=None, regularizer=None, constraint=None)

Bases: singa.optimizer.Optimizer

RMSProp optimizer.

See the base Optimizer for all constructor args.

Parameters:
  • rho (float) – decay constant in [0, 1] for the running average of squared gradients
  • epsilon (float) – small constant added for numerical stability
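
A minimal construction-and-update sketch using only the documented arguments:

from singa import optimizer
from singa import tensor

rmsprop = optimizer.RMSProp(rho=0.9, epsilon=1e-8, lr=0.001)
p = tensor.Tensor((3, 5))
p.uniform(-1, 1)
g = tensor.Tensor((3, 5))
g.gaussian(0, 0.01)
# 'param' indexes the per-parameter running average of squared gradients
p = rmsprop.apply_with_lr(1, 0.001, g, p, 'param')
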
apply_with_lr(epoch, lr, grad, value, name, step=-1)

Update the parameter with the given learning rate if the gradient is not empty.

This overrides the base Optimizer implementation; it does nothing if the gradient is empty.

Parameters:
  • epoch (int) – training epoch ID
  • lr (float) – learning rate
  • grad (Tensor) – parameter gradient
  • value (Tensor) – parameter value
  • name (string) – parameter name used to look up parameter-specific update rules (including regularizer and constraint)
  • step (int) – iteration ID within one epoch
Returns:

updated parameter value

class singa.optimizer.AdaGrad(epsilon=1e-08, lr=None, weight_decay=None, lr_gen=None, regularizer=None, constraint=None)

Bases: singa.optimizer.Optimizer

AdaGrad optimizer.

See the base Optimizer for all constructor args.

Parameters: epsilon (float) – small constant added for numerical stability.
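
A minimal construction-and-update sketch using only the documented arguments:

from singa import optimizer
from singa import tensor

adagrad = optimizer.AdaGrad(epsilon=1e-8, lr=0.01)
p = tensor.Tensor((3, 5))
p.uniform(-1, 1)
g = tensor.Tensor((3, 5))
g.gaussian(0, 0.01)
# 'param' indexes the per-parameter accumulated squared-gradient history
p = adagrad.apply_with_lr(1, 0.01, g, p, 'param')
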
apply_with_lr(epoch, lr, grad, value, name, step=-1)

Update the parameter with the given learning rate if the gradient is not empty.

This overrides the base Optimizer implementation; it does nothing if the gradient is empty.

Parameters:
  • epoch (int) – training epoch ID
  • lr (float) – learning rate
  • grad (Tensor) – parameter gradient
  • value (Tensor) – parameter value
  • name (string) – parameter name used to look up parameter-specific update rules (including regularizer and constraint)
  • step (int) – iteration ID within one epoch
Returns:

updated parameter value

class singa.optimizer.Adam(beta_1=0.9, beta_2=0.999, epsilon=1e-08, lr=None, weight_decay=None, regularizer=None, constraint=None)

Bases: singa.optimizer.Optimizer

Adam optimizer.

See the base Optimizer for all constructor args.

Parameters:
  • beta_1 (float) – exponential decay rate for the first moment (momentum) estimate
  • beta_2 (float) – exponential decay rate for the second moment (squared gradient) estimate
  • epsilon (float) – small constant added for numerical stability
apply_with_lr(epoch, lr, grad, value, name, step)

Update one parameter object.

Parameters: step (int) – the accumulated number of training iterations, not the within-epoch iteration ID
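
A minimal sketch showing how the accumulated iteration counter is passed to Adam; the fixed random gradient stands in for a real forward/backward pass:

from singa import optimizer
from singa import tensor

adam = optimizer.Adam(beta_1=0.9, beta_2=0.999, epsilon=1e-8, lr=0.001)
p = tensor.Tensor((3, 5))
p.uniform(-1, 1)
g = tensor.Tensor((3, 5))
g.gaussian(0, 0.01)  # in a real loop the gradient is recomputed every iteration

t = 0  # accumulated iteration count across epochs
for epoch in range(2):
    for _ in range(5):  # 5 mini-batches per epoch, just for illustration
        t += 1
        # pass the accumulated count t, not the within-epoch iteration ID
        p = adam.apply_with_lr(epoch, 0.001, g, p, 'param', t)
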
class singa.optimizer.Regularizer

Bases: object

Base Python regularizer for parameter gradients.

apply(epoch, value, grad, step=-1)
class singa.optimizer.CppRegularizer(conf)

Bases: singa.optimizer.Regularizer

Wrapper for a regularizer implemented in C++.

Parameters: conf (RegularizerConf) – protobuf message for the configuration.
apply(epoch, value, grad, step=-1)
class singa.optimizer.L2Regularizer(coefficient)

Bases: singa.optimizer.Regularizer

L2 regularization

Parameters: coefficient (float) – regularization coefficient.
apply(epoch, value, grad, step=-1)
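
The L2 penalty can be set globally through the optimizer's weight_decay argument or, as the description above implies equivalently, by passing an explicit L2Regularizer via the 'regularizer' argument (the two are mutually exclusive). A minimal sketch:

from singa import optimizer

reg = optimizer.L2Regularizer(coefficient=1e-4)
sgd = optimizer.SGD(lr=0.01, momentum=0.9, regularizer=reg)
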
class singa.optimizer.Constraint

Bases: object

Base Python constraint class for parameter gradients.

apply(epoch, value, grad, step=-1)
class singa.optimizer.CppConstraint(conf)

Bases: singa.optimizer.Constraint

Wrapper for a constraint implemented in C++.

Parameters: conf (ConstraintConf) – protobuf message for the configuration.
apply(epoch, value, grad, step=-1)
class singa.optimizer.L2Constraint(threshold=None)

Bases: singa.optimizer.Constraint

Rescale the gradient so that its L2 norm is at most a given threshold.

apply(epoch, value, grad, step=-1)
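
A minimal gradient-norm clipping sketch using only the documented arguments:

from singa import optimizer

# rescale any gradient whose L2 norm exceeds the threshold before the update is applied
clip = optimizer.L2Constraint(threshold=5.0)
sgd = optimizer.SGD(lr=0.01, momentum=0.9, constraint=clip)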