# Updater

---

Every server in SINGA has an [Updater](../api/classsinga_1_1Updater.html) instance that updates parameters based on gradients. On this page, the *Basic user guide* describes the configuration of an updater, and the *Advanced user guide* presents details on how to implement a new updater and a new learning rate changing method.

## Basic user guide

There are many different parameter updating protocols (i.e., subclasses of `Updater`). They share some configuration fields like

* `type`, an integer for identifying an updater;
* `learning_rate`, configuration for the [LRGenerator](../api/classsinga_1_1LRGenerator.html) which controls the learning rate;
* `weight_decay`, the coefficient for [L2 regularization](http://deeplearning.net/tutorial/gettingstarted.html#regularization);
* `momentum`, the coefficient for [momentum](http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/).

If you are not familiar with the above terms, you can find their meanings in [this page provided by Karpathy](http://cs231n.github.io/neural-networks-3/#update).

### Configuration of built-in updater classes

#### Updater

The base `Updater` implements the [vanilla SGD algorithm](http://cs231n.github.io/neural-networks-3/#sgd). Its configuration type is `kSGD`. Users need to configure at least the `learning_rate` field; `momentum` and `weight_decay` are optional fields.

    updater {
      type: kSGD
      momentum: float
      weight_decay: float
      learning_rate {
        ...
      }
    }

#### AdaGradUpdater

It inherits the base `Updater` to implement the [AdaGrad](http://www.magicbroom.info/Papers/DuchiHaSi10.pdf) algorithm. Its type is `kAdaGrad`. `AdaGradUpdater` is configured similarly to `Updater` except that `momentum` is not used.

#### NesterovUpdater

It inherits the base `Updater` to implement the [Nesterov](http://arxiv.org/pdf/1212.0901v2.pdf) (section 3.5) updating protocol. Its type is `kNesterov`. `learning_rate` and `momentum` must be configured; `weight_decay` is an optional configuration field.

#### RMSPropUpdater

It inherits the base `Updater` to implement the [RMSProp algorithm](http://cs231n.github.io/neural-networks-3/#sgd) proposed by [Hinton](http://www.cs.toronto.edu/%7Etijmen/csc321/slides/lecture_slides_lec6.pdf) (slide 29). Its type is `kRMSProp`.

    updater {
      type: kRMSProp
      rmsprop_conf {
        rho: float # [0,1]
      }
    }

#### AdaDeltaUpdater

It inherits the base `Updater` to implement the [AdaDelta](http://arxiv.org/abs/1212.5701) updating algorithm. Its type is `kAdaDelta`.

    updater {
      type: kAdaDelta
      adadelta_conf {
        rho: float # [0,1]
      }
    }

#### Adam

It inherits the base `Updater` to implement the [Adam](http://arxiv.org/pdf/1412.6980.pdf) updating algorithm. Its type is `kAdam`. `beta1` and `beta2` are floats, 0 < `beta` < 1, generally close to 1.

    updater {
      type: kAdam
      adam_conf {
        beta1: float # [0,1]
        beta2: float # [0,1]
      }
    }

#### AdaMax

It inherits the base `Updater` to implement the [AdaMax](http://arxiv.org/pdf/1412.6980.pdf) updating algorithm. Its type is `kAdamMax`. `beta1` and `beta2` are floats, 0 < `beta` < 1, generally close to 1.

    updater {
      type: kAdamMax
      adammax_conf {
        beta1: float # [0,1]
        beta2: float # [0,1]
      }
    }

### Configuration of learning rate

The `learning_rate` field is configured as,

    learning_rate {
      type: ChangeMethod
      base_lr: float  # base/initial learning rate
      ...             # fields for a specific changing method
    }

The common fields include `type` and `base_lr`. SINGA provides the following `ChangeMethod`s.

#### kFixed

The `base_lr` is used for all steps.
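For example, a complete configuration of a vanilla SGD updater with a fixed learning rate could look like the sketch below; all fields are taken from the descriptions above, but the numeric values are illustrative placeholders, not recommendations.

    updater {
      type: kSGD
      momentum: 0.9          # optional
      weight_decay: 0.0005   # optional
      learning_rate {
        type: kFixed
        base_lr: 0.01        # used for all steps
      }
    }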
#### kLinear

The updater should be configured like

    learning_rate {
      base_lr: float
      linear_conf {
        freq: int
        final_lr: float
      }
    }

Linear interpolation is used to change the learning rate,

    lr = (1 - step / freq) * base_lr + (step / freq) * final_lr

#### kExponential

The updater should be configured like

    learning_rate {
      base_lr: float
      exponential_conf {
        freq: int
      }
    }

The learning rate for `step` is

    lr = base_lr / 2^(step / freq)

#### kInverseT

The updater should be configured like

    learning_rate {
      base_lr: float
      inverset_conf {
        final_lr: float
      }
    }

The learning rate for `step` is

    lr = base_lr / (1 + step / final_lr)

#### kInverse

The updater should be configured like

    learning_rate {
      base_lr: float
      inverse_conf {
        gamma: float
        pow: float
      }
    }

The learning rate for `step` is

    lr = base_lr * (1 + gamma * step)^(-pow)

#### kStep

The updater should be configured like

    learning_rate {
      base_lr: float
      step_conf {
        change_freq: int
        gamma: float
      }
    }

The learning rate for `step` is

    lr = base_lr * gamma^(step / change_freq)

#### kFixedStep

The updater should be configured like

    learning_rate {
      fixedstep_conf {
        step: int
        step_lr: float

        step: int
        step_lr: float

        ...
      }
    }

Denote the i-th tuple as (step[i], step_lr[i]); the learning rate for `step` is then step_lr[k], where step[k] is the largest configured step that does not exceed `step`.

## Advanced user guide

### Implementing a new Updater subclass

The base Updater class has one virtual function,

    class Updater {
     public:
      virtual void Update(int step, Param* param, float grad_scale = 1.0f) = 0;

     protected:
      UpdaterProto proto_;
      LRGenerator lr_gen_;
    };

It updates the values of the `param` based on its gradients. The `step` argument is for deciding the learning rate, which may change through time (step). `grad_scale` scales the original gradient values. This function is called by a server once it has received all gradients for the same `Param` object.

To implement a new Updater subclass, users must override the `Update` function.

    class FooUpdater : public Updater {
      void Update(int step, Param* param, float grad_scale = 1.0f) override;
    };

Configuration of this new updater can be declared similarly to that of a new layer,

    # in user.proto
    message FooUpdaterProto {
      optional int32 c = 1;
    }

    extend UpdaterProto {
      optional FooUpdaterProto fooupdater_conf = 101;
    }

The new updater should be registered in the [main function](programming-guide.html),

    driver.RegisterUpdater("FooUpdater");

Users can then configure the job as

    # in job.conf
    updater {
      user_type: "FooUpdater"  # must use user_type with the same string identifier as the one used for registration
      fooupdater_conf {
        c: 20
      }
    }

### Implementing a new LRGenerator subclass

The base `LRGenerator` is declared as,

    virtual float Get(int step);

To implement a subclass, e.g., `FooLRGen`, users should declare it like

    class FooLRGen : public LRGenerator {
     public:
      float Get(int step) override;
    };

Configuration of `FooLRGen` can be defined using a protocol message,

    # in user.proto
    message FooLRProto {
      ...
    }

    extend LRGenProto {
      optional FooLRProto foolr_conf = 101;
    }

The configuration is then like,

    learning_rate {
      user_type: "FooLR"  # must use user_type with the same string identifier as the one used for registration
      base_lr: float
      foolr_conf {
        ...
      }
    }

Users have to register this subclass in the main function,

    driver.RegisterLRGenerator("FooLR");
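To make the `FooUpdater` example above concrete, here is a minimal sketch of an `Update` implementation performing a vanilla SGD step with weight decay. The `Param` accessors used below (`mutable_cpu_data`, `cpu_grad`, `size`) and the `weight_decay` field of `UpdaterProto` are assumptions standing in for whatever your SINGA version exposes; treat this as an illustration of the control flow, not a drop-in implementation.

    // Hypothetical sketch: a vanilla SGD step with L2 weight decay.
    // mutable_cpu_data(), cpu_grad() and size() are assumed Param accessors;
    // weight_decay() is assumed to be a field of UpdaterProto.
    void FooUpdater::Update(int step, Param* param, float grad_scale) {
      float lr = lr_gen_.Get(step);             // learning rate for this step
      float wd = proto_.weight_decay();         // L2 regularization coefficient
      float* data = param->mutable_cpu_data();  // parameter values
      const float* grad = param->cpu_grad();    // received gradients
      for (int i = 0; i < param->size(); ++i) {
        // Scale the raw gradient, add the regularization term, then step.
        data[i] -= lr * (grad_scale * grad[i] + wd * data[i]);
      }
    }

Note that `lr_gen_` and `proto_` are the protected members inherited from the base `Updater` shown earlier, so a subclass only has to express the update rule itself.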
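Similarly, a hypothetical `FooLRGen::Get` would compute its schedule from `base_lr` and the fields of `FooLRProto`. The sketch below assumes `FooLRProto` declares a single `int32 freq` field (not part of the original configuration above) and reproduces the built-in `kExponential` behaviour of halving `base_lr` every `freq` steps; reading the configuration through `proto_.GetExtension(foolr_conf)` follows the usual protobuf extension pattern and is likewise an assumption about the surrounding code.

    #include <cmath>

    // Hypothetical sketch: an exponential schedule, lr = base_lr / 2^(step / freq).
    // Assumes LRGenerator stores its LRGenProto configuration in proto_ and that
    // FooLRProto declares an int32 field `freq`.
    float FooLRGen::Get(int step) {
      const FooLRProto& conf = proto_.GetExtension(foolr_conf);
      return proto_.base_lr() * std::pow(0.5f, step / conf.freq());  // integer division
    }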