singa-incubating-0.1.0 Release Notes


SINGA is a general distributed deep learning platform for training big deep learning models over large datasets. It is designed with an intuitive programming model based on the layer abstraction. SINGA supports a wide variety of popular deep learning models.

This release includes the following features:

  • Job management

    • SINGA-3 Use Zookeeper to check stopping (finish) time of the system

    • SINGA-16 Runtime Process id Management

    • SINGA-25 Setup glog output path

    • SINGA-26 Run distributed training in a single command

    • SINGA-30 Enhance ease of use and support concurrent jobs

    • SINGA-33 Automatically launch a number of processes in the cluster

    • SINGA-34 Support external zookeeper service

    • SINGA-38 Support concurrent jobs

    • SINGA-39 Avoid ssh in scripts for single node environment

    • SINGA-43 Remove Job-related output from workspace

    • SINGA-56 No automatic launching of zookeeper service

    • SINGA-73 Refine the selection of available hosts from host list

  • Installation with GNU Autotools

    • SINGA-4 Refine thirdparty-dependency installation

    • SINGA-13 Separate intermediate files of compilation from source files

    • SINGA-17 Add root permission within thirdparty/install

    • SINGA-27 Generate python modules for proto objects

    • SINGA-53 Add lmdb compiling options

    • SINGA-62 Remove building scripts and auxiliary files

    • SINGA-67 Add singatest into build targets

  • Distributed training

    • SINGA-7 Implement shared memory Hogwild algorithm

    • SINGA-8 Implement distributed Hogwild

    • SINGA-19 Slice large Param objects for load-balance

    • SINGA-29 Update NeuralNet class to enable layer partition type customization

    • SINGA-24 Implement Downpour training framework

    • SINGA-32 Implement AllReduce training framework

    • SINGA-57 Improve Distributed Hogwild

  • Training algorithms for different model categories

    • SINGA-9 Add Support for Restricted Boltzmann Machine (RBM) model

    • SINGA-10 Add Support for Recurrent Neural Networks (RNN)

  • Checkpoint and restore

    • SINGA-12 Support Checkpoint and Restore

  • Unit test

    • SINGA-64 Add the test module for utils/common

  • Programming model

    • SINGA-36 Refactor job configuration, driver program and scripts

    • SINGA-37 Enable users to set parameter sharing in model configuration

    • SINGA-54 Refactor job configuration to move fields in ModelProto out

    • SINGA-55 Refactor main.cc and singa.h

    • SINGA-61 Support user defined classes

    • SINGA-65 Add an example of writing user-defined layers

  • Other features

    • SINGA-6 Implement thread-safe singleton

    • SINGA-18 Update API for displaying performance metric

    • SINGA-77 Integrate with Apache RAT

The following bugs were fixed during the development of this release:

  • SINGA-2 Check failed: zsock_connect

  • SINGA-5 Server terminates early when zookeeper singa folder is not initially empty

  • SINGA-15 Fix a bug in the ConnectStub function, which gets stuck when connecting layer_dealer_

  • SINGA-22 Cannot find openblas library when it is installed in default path

  • SINGA-23 Libtool version mismatch error

  • SINGA-28 Fix a bug in the topological sort of Graph

  • SINGA-42 Issue when loading checkpoints

  • SINGA-44 A bug when resetting metric values

  • SINGA-46 Fix a bug in updater.cc to scale the gradients

  • SINGA-47 Fix a bug in data layers that leads to out-of-memory when group size is too large

  • SINGA-48 Fix a bug in trainer.cc that assigns the same NeuralNet instance to workers from different groups

  • SINGA-49 Fix a bug in the HandlePutMsg function that sets param fields to invalid values

  • SINGA-66 Fix bugs in Worker::RunOneBatch function and ClusterProto

  • SINGA-79 Fix a bug in singatool that cannot parse the -conf flag

Features planned for the next release:

  • SINGA-11 Start SINGA using Mesos

  • SINGA-31 Extend Blob to support xpu (cpu or gpu)

  • SINGA-35 Add random number generators

  • SINGA-40 Support sparse Param update

  • SINGA-41 Support single node single GPU training