Optimizer
This module includes a set of optimizers for updating model parameters.
Example usage:
from singa import optimizer
from singa import tensor
sgd = optimizer.SGD(lr=0.01, momentum=0.9, weight_decay=1e-4)
p = tensor.Tensor((3,5))
p.uniform(-1, 1)
g = tensor.Tensor((3,5))
g.gaussian(0, 0.01)
sgd.apply(1, g, p, 'param') # use the global lr=0.01 for epoch 1
sgd.apply_with_lr(2, 0.03, g, p, 'param') # use lr=0.03 for epoch 2
class singa.optimizer.Optimizer(lr=None, momentum=None, weight_decay=None, regularizer=None, constraint=None)
Bases: object
The base Python optimizer class.
Typically, an optimizer is used as follows:
1. construct the optimizer;
2. (optional) register each parameter with its specs;
3. use the optimizer to update parameter values given parameter gradients and other optional info.
Subclasses should override the apply_with_lr function to do the real parameter update.
- Parameters
lr (float) – a constant value for the learning rate
momentum (float) – a constant value for the momentum
weight_decay (float) – the coefficient for the L2 regularizer; mutually exclusive with 'regularizer'
regularizer – an instance of Regularizer or RegularizerConf; if set, regularization is applied in apply_with_lr(). Users can also do regularization outside.
constraint – an instance of Constraint or ConstraintConf; if set, the constraint is applied inside apply_with_lr(). Users can also apply the constraint outside.
register(name, specs)
Register the param specs, including creating the regularizer and constraint per param object. Param-specific regularizers and constraints have higher priority than the global ones. If all parameters share the same settings for learning rate, regularizer and constraint, there is no need to call this function.
- Parameters
name (str) – parameter name
specs (ParamSpec) – protobuf object, including the regularizer and constraint, and multipliers for the learning rate and weight decay
apply_regularizer_constraint(epoch, value, grad, name=None, step=-1)
Apply regularization and constraint if available.
If both a global regularizer (constraint) and a param-specific regularizer (constraint) exist, the param-specific one is used.
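This precedence can be sketched in plain Python (hypothetical names and rules for illustration; not SINGA's implementation):

```python
def pick_rule(name, param_rules, global_rule):
    """Return the param-specific rule if one was registered via register(),
    otherwise fall back to the global one."""
    return param_rules.get(name, global_rule)

# Hypothetical regularizers: each maps (grad, value) to an adjusted gradient.
def global_l2(grad, value):
    return grad + 1e-4 * value   # global L2, coefficient 1e-4

def param_l2(grad, value):
    return grad + 1e-2 * value   # stronger L2 registered for 'conv1/weight'

param_rules = {'conv1/weight': param_l2}

# The registered param-specific rule wins; unregistered params use the global one.
assert pick_rule('conv1/weight', param_rules, global_l2) is param_l2
assert pick_rule('fc/weight', param_rules, global_l2) is global_l2
```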
apply_with_lr(epoch, lr, grad, value, name=None, step=-1)
Update the parameter with the given learning rate if the grad is not empty.
The subclass optimizer must override this function. This function does nothing if the grad is empty.
- Parameters
epoch (int) – training epoch ID
lr (float) – learning rate
grad (Tensor) – parameter gradient
value (Tensor) – parameter value
name (string) – parameter name to index parameter-specific updating rules (including regularizer and constraint)
step (int) – iteration ID within one epoch
- Returns
updated parameter value
apply(epoch, grad, value, name=None, step=-1)
Update the parameter, assuming the learning rate generator is set.
The subclass optimizer does not need to override this function.
- Parameters
epoch (int) – training epoch ID
grad (Tensor) – parameter gradient
value (Tensor) – parameter value
name (string) – parameter name to retrieve parameter-specific updating rules (including regularizer and constraint)
step (int) – training iteration ID within one epoch
- Returns
updated parameter value
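The relation between apply and apply_with_lr can be sketched as follows (a plain-Python sketch; the step-decay schedule and the helper names are assumptions for illustration, not SINGA's API):

```python
def make_step_lr(base_lr, gamma=0.1, step_size=30):
    """A simple step-decay learning-rate generator:
    lr = base_lr * gamma ** (epoch // step_size)."""
    return lambda epoch: base_lr * gamma ** (epoch // step_size)

def apply(epoch, grad, value, lr_gen, apply_with_lr):
    """Derive this epoch's learning rate from the generator, then delegate
    the actual update to apply_with_lr."""
    return apply_with_lr(epoch, lr_gen(epoch), grad, value)

lr_gen = make_step_lr(0.1)
# A trivial apply_with_lr: plain gradient descent on a scalar parameter.
new_value = apply(31, 0.5, 1.0, lr_gen, lambda e, lr, g, v: v - lr * g)
```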
class singa.optimizer.SGD(lr=None, momentum=None, weight_decay=None, regularizer=None, constraint=None)
Bases: singa.optimizer.Optimizer
The vanilla Stochastic Gradient Descent algorithm with momentum.
See the base Optimizer for all arguments.
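One common formulation of the momentum update, sketched on a scalar parameter (a minimal plain-Python sketch, not SINGA's C++ kernel; history is the momentum buffer):

```python
def sgd_update(value, grad, history, lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One SGD-with-momentum step on a scalar parameter (sketch)."""
    grad = grad + weight_decay * value        # fold L2 weight decay into the gradient
    history = momentum * history + lr * grad  # momentum buffer
    return value - history, history

value, history = 1.0, 0.0
for _ in range(3):
    value, history = sgd_update(value, grad=0.5, history=history)
```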
apply_with_lr(epoch, lr, grad, value, name, step=-1)
Update the parameter with the given learning rate if the grad is not empty; this function does nothing if the grad is empty.
- Parameters
epoch (int) – training epoch ID
lr (float) – learning rate
grad (Tensor) – parameter gradient
value (Tensor) – parameter value
name (string) – parameter name to index parameter-specific updating rules (including regularizer and constraint)
step (int) – iteration ID within one epoch
- Returns
updated parameter value
class singa.optimizer.Nesterov(lr=None, momentum=0.9, weight_decay=None, regularizer=None, constraint=None)
Bases: singa.optimizer.Optimizer
SGD with Nesterov momentum.
See the base Optimizer for all arguments.
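One common (Caffe-style) formulation of the Nesterov update, sketched on a scalar parameter (the exact form is an assumption for illustration, not a copy of SINGA's kernel):

```python
def nesterov_update(value, grad, history, lr=0.01, momentum=0.9):
    """One Nesterov-momentum step on a scalar parameter (sketch)."""
    prev = history
    history = momentum * history + lr * grad
    # Look-ahead correction: with momentum=0 this reduces to plain SGD.
    return value - ((1 + momentum) * history - momentum * prev), history

# Sanity check: momentum=0 reduces to value - lr * grad.
v, h = nesterov_update(1.0, 0.5, 0.0, lr=0.1, momentum=0.0)
```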
apply_with_lr(epoch, lr, grad, value, name, step=-1)
Update the parameter with the given learning rate if the grad is not empty; this function does nothing if the grad is empty.
- Parameters
epoch (int) – training epoch ID
lr (float) – learning rate
grad (Tensor) – parameter gradient
value (Tensor) – parameter value
name (string) – parameter name to index parameter-specific updating rules (including regularizer and constraint)
step (int) – iteration ID within one epoch
- Returns
updated parameter value
class singa.optimizer.RMSProp(rho=0.9, epsilon=1e-08, lr=None, weight_decay=None, regularizer=None, constraint=None)
Bases: singa.optimizer.Optimizer
RMSProp optimizer.
See the base Optimizer for all constructor args.
- Parameters
rho (float) – decay factor within [0, 1] for the running average of squared gradients
epsilon (float) – small value for preventing numeric error
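The standard RMSProp update, sketched on a scalar parameter (a minimal sketch, not SINGA's implementation; run_avg is the running average of squared gradients):

```python
import math

def rmsprop_update(value, grad, run_avg, lr=0.001, rho=0.9, epsilon=1e-8):
    """One RMSProp step on a scalar parameter (sketch)."""
    run_avg = rho * run_avg + (1 - rho) * grad ** 2   # running avg of grad^2
    return value - lr * grad / (math.sqrt(run_avg) + epsilon), run_avg

v, s = rmsprop_update(1.0, 0.5, 0.0)
```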
apply_with_lr(epoch, lr, grad, value, name, step=-1)
Update the parameter with the given learning rate if the grad is not empty; this function does nothing if the grad is empty.
- Parameters
epoch (int) – training epoch ID
lr (float) – learning rate
grad (Tensor) – parameter gradient
value (Tensor) – parameter value
name (string) – parameter name to index parameter-specific updating rules (including regularizer and constraint)
step (int) – iteration ID within one epoch
- Returns
updated parameter value
class singa.optimizer.AdaGrad(epsilon=1e-08, lr=None, weight_decay=None, lr_gen=None, regularizer=None, constraint=None)
Bases: singa.optimizer.Optimizer
AdaGrad optimizer.
See the base Optimizer for all constructor args.
- Parameters
epsilon (float) – small number for preventing numeric error.
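The AdaGrad update, sketched on a scalar parameter (a minimal sketch, not SINGA's implementation; accum is the running sum of squared gradients, so the effective learning rate shrinks as gradients accumulate):

```python
import math

def adagrad_update(value, grad, accum, lr=0.01, epsilon=1e-8):
    """One AdaGrad step on a scalar parameter (sketch)."""
    accum = accum + grad ** 2                          # sum of squared gradients
    return value - lr * grad / (math.sqrt(accum) + epsilon), accum

v, a = adagrad_update(1.0, 0.5, 0.0)
```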
apply_with_lr(epoch, lr, grad, value, name, step=-1)
Update the parameter with the given learning rate if the grad is not empty; this function does nothing if the grad is empty.
- Parameters
epoch (int) – training epoch ID
lr (float) – learning rate
grad (Tensor) – parameter gradient
value (Tensor) – parameter value
name (string) – parameter name to index parameter-specific updating rules (including regularizer and constraint)
step (int) – iteration ID within one epoch
- Returns
updated parameter value
class singa.optimizer.Adam(beta_1=0.9, beta_2=0.999, epsilon=1e-08, lr=None, weight_decay=None, regularizer=None, constraint=None)
Bases: singa.optimizer.Optimizer
Adam optimizer.
See the base Optimizer for all constructor args.
- Parameters
beta_1 (float) – decay coefficient for the first moment (momentum)
beta_2 (float) – decay coefficient for the aggregated squared gradient (second moment)
epsilon (float) – small value for preventing numeric error
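The standard Adam update with bias correction, sketched on a scalar parameter (a minimal sketch, not SINGA's implementation; t is the accumulated iteration count starting at 1, matching the step argument described below):

```python
import math

def adam_update(value, grad, m, v, t, lr=0.001,
                beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    """One Adam step on a scalar parameter (sketch)."""
    m = beta_1 * m + (1 - beta_1) * grad           # first-moment estimate
    v = beta_2 * v + (1 - beta_2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta_1 ** t)                  # bias correction
    v_hat = v / (1 - beta_2 ** t)
    return value - lr * m_hat / (math.sqrt(v_hat) + epsilon), m, v

p, m, s = adam_update(1.0, 0.5, 0.0, 0.0, t=1)
```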
apply_with_lr(epoch, lr, grad, value, name, step)
Update one parameter object.
- Parameters
step (int) – the accumulated number of training iterations, not the iteration ID within one epoch
class singa.optimizer.Regularizer
Bases: object
Base Python regularizer for parameter gradients.
apply(epoch, value, grad, step=-1)
class singa.optimizer.CppRegularizer(conf)
Bases: singa.optimizer.Regularizer
Wrapper for a regularizer implemented in C++.
- Parameters
conf (RegularizerConf) – protobuf message for the configuration
apply(epoch, value, grad, step=-1)
class singa.optimizer.L2Regularizer(coefficient)
Bases: singa.optimizer.Regularizer
L2 regularization.
- Parameters
coefficient (float) – regularization coefficient
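The effect on the gradient can be sketched on a scalar (a minimal sketch, not SINGA's implementation):

```python
def l2_regularize(grad, value, coefficient):
    """Add the L2 penalty's gradient, coefficient * value, to the parameter gradient."""
    return grad + coefficient * value

g = l2_regularize(0.5, 2.0, 0.1)
```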
apply(epoch, value, grad, step=-1)
class singa.optimizer.Constraint
Bases: object
Base Python constraint class for parameter gradients.
apply(epoch, value, grad, step=-1)
class singa.optimizer.CppConstraint(conf)
Bases: singa.optimizer.Constraint
Wrapper for constraints implemented in C++.
- Parameters
conf (ConstraintConf) – protobuf message for the configuration
apply(epoch, value, grad, step=-1)
class singa.optimizer.L2Constraint(threshold=None)
Bases: singa.optimizer.Constraint
Rescale the gradient so that its L2 norm is <= the given threshold.
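The rescaling can be sketched as follows (a plain-Python sketch over a list of gradient entries; not SINGA's implementation):

```python
import math

def l2_constrain(grad, threshold):
    """Rescale grad so its L2 norm does not exceed threshold; gradients
    already within the threshold are returned unchanged."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        scale = threshold / norm
        return [g * scale for g in grad]
    return grad

clipped = l2_constrain([3.0, 4.0], threshold=1.0)   # norm 5.0 -> rescaled
same = l2_constrain([0.1, 0.1], threshold=1.0)      # already within threshold
```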
apply(epoch, value, grad, step=-1)