Every server in SINGA has an Updater instance that updates parameters based on gradients. The Basic user guide on this page describes how to configure an updater. The Advanced user guide presents details on how to implement a new updater and a new learning rate changing method.
There are many different parameter updating protocols (i.e., subclasses of Updater). They share some configuration fields, such as learning_rate, momentum and weight_decay.
If you are not familiar with the above terms, you can find their meanings in this page provided by Karpathy.
The base Updater implements the vanilla SGD algorithm. Its configuration type is kSGD. Users need to configure at least the learning_rate field. momentum and weight_decay are optional fields.
    updater {
      type: kSGD
      momentum: float
      weight_decay: float
      learning_rate {
        ...
      }
    }
It inherits the base Updater to implement the AdaGrad algorithm. Its type is kAdaGrad. AdaGradUpdater is configured similarly to the base Updater except that momentum is not used.
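A minimal configuration sketch, following the base Updater fields described above (the values are placeholders):

    updater {
      type: kAdaGrad
      weight_decay: float
      learning_rate {
        ...
      }
    }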
It inherits the base Updater to implement the Nesterov (section 3.5) updating protocol. Its type is kNesterov. learning_rate and momentum must be configured. weight_decay is an optional configuration field.
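A minimal configuration sketch, again based on the fields listed above (the values are placeholders):

    updater {
      type: kNesterov
      momentum: float
      weight_decay: float
      learning_rate {
        ...
      }
    }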
It inherits the base Updater to implement the RMSProp algorithm proposed by Hinton (slide 29). Its type is kRMSProp.
    updater {
      type: kRMSProp
      rmsprop_conf {
        rho: float  # [0,1]
      }
    }
The learning_rate field is configured as,
    learning_rate {
      type: ChangeMethod
      base_lr: float  # base/initial learning rate
      ...             # fields of a specific changing method
    }
The common fields include type and base_lr. SINGA provides the following ChangeMethods.
For the linear changing method, the learning_rate field should be configured like
    learning_rate {
      base_lr: float
      linear_conf {
        freq: int
        final_lr: float
      }
    }
Linear interpolation is used to change the learning rate,
lr = (1 - step / freq) * base_lr + (step / freq) * final_lr
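For example, with base_lr = 0.1, final_lr = 0.01 and freq = 100 (illustrative values), step 50 gives lr = (1 - 0.5) * 0.1 + 0.5 * 0.01 = 0.055.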
For the exponential changing method, the learning_rate field should be configured like
    learning_rate {
      base_lr: float
      exponential_conf {
        freq: int
      }
    }
The learning rate for step is
lr = base_lr / 2^(step / freq)
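For example, with base_lr = 0.1 and freq = 100 (illustrative values), step 300 gives lr = 0.1 / 2^3 = 0.0125.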
For the inverse-t changing method, the learning_rate field should be configured like
    learning_rate {
      base_lr: float
      inverset_conf {
        final_lr: float
      }
    }
The learning rate for step is
lr = base_lr / (1 + step / final_lr)
For the inverse changing method, the learning_rate field should be configured like
    learning_rate {
      base_lr: float
      inverse_conf {
        gamma: float
        pow: float
      }
    }
The learning rate for step is
lr = base_lr * (1 + gamma * step)^(-pow)
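For example, with base_lr = 0.1, gamma = 0.0001 and pow = 1 (illustrative values), step 10000 gives lr = 0.1 * (1 + 1)^(-1) = 0.05.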
The base Updater class has one virtual function,
    class Updater {
     public:
      virtual void Update(int step, Param* param, float grad_scale = 1.0f) = 0;

     protected:
      UpdaterProto proto_;
      LRGenerator lr_gen_;
    };
It updates the values of the param based on its gradients. The step argument is used to decide the learning rate, which may change over time (step). grad_scale scales the original gradient values. This function is called by a server once it has received all gradients for the same Param object.
To implement a new Updater subclass, users must override the Update function.
    class FooUpdater : public Updater {
      void Update(int step, Param* param, float grad_scale = 1.0f) override;
    };
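A minimal sketch of what such an Update implementation could look like, assuming a vanilla SGD-style rule; the Param accessors used here (mutable_cpu_data(), cpu_grad(), size()) are hypothetical stand-ins rather than the actual Param API:

    void FooUpdater::Update(int step, Param* param, float grad_scale) {
      float lr = lr_gen_.Get(step);             // learning rate for this step
      float* data = param->mutable_cpu_data();  // hypothetical accessor for parameter values
      const float* grad = param->cpu_grad();    // hypothetical accessor for gradients
      for (int i = 0; i < param->size(); ++i)   // hypothetical element count
        data[i] -= lr * grad_scale * grad[i];   // SGD-style update of each value
    }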
Configuration of this new updater can be declared similar to that of a new layer,
    # in user.proto
    message FooUpdaterProto {
      optional int32 c = 1;
    }
    extend UpdaterProto {
      optional FooUpdaterProto fooupdater_conf = 101;
    }
The new updater should be registered in the main function
driver.RegisterUpdater<FooUpdater>("FooUpdater");
Users can then configure the job as
    # in job.conf
    updater {
      user_type: "FooUpdater"  # must use user_type with the same string identifier as the one used for registration
      fooupdater_conf {
        c: 20
      }
    }
The base LRGenerator is declared as,
virtual float Get(int step);
To implement a subclass, e.g., FooLRGen, users should declare it like
    class FooLRGen : public LRGenerator {
     public:
      float Get(int step) override;
    };
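A minimal sketch of a possible Get implementation; the schedule and constants are purely illustrative, and a real implementation would read them from its configuration (e.g., foolr_conf) instead of hard-coding them:

    #include <cmath>

    float FooLRGen::Get(int step) {
      // Illustrative schedule: halve an assumed base learning rate of 0.1
      // every 1000 steps.
      return 0.1f * std::pow(0.5f, step / 1000);
    }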
Configuration of FooLRGen can be defined using a protocol message,
    # in user.proto
    message FooLRProto {
      ...
    }
    extend LRGenProto {
      optional FooLRProto foolr_conf = 101;
    }
The configuration is then like,
    learning_rate {
      user_type: "FooLR"  # must use user_type with the same string identifier as the one used for registration
      base_lr: float
      foolr_conf {
        ...
      }
    }
Users have to register this subclass in the main function,
driver.RegisterLRGenerator<FooLRGen, std::string>("FooLR");