singa-incubating-0.1.0 Release Notes
SINGA is a general distributed deep learning platform for training big deep learning models over large datasets. It is designed with an intuitive programming model based on the layer abstraction. SINGA supports a wide variety of popular deep learning models.
This release includes the following features:
Job management
SINGA-3 Use Zookeeper to check stopping (finish) time of the system
SINGA-16 Runtime Process id Management
SINGA-25 Setup glog output path
SINGA-26 Run distributed training in a single command
SINGA-30 Enhance easy-to-use feature and support concurrent jobs
SINGA-33 Automatically launch a number of processes in the cluster
SINGA-34 Support external zookeeper service
SINGA-38 Support concurrent jobs
SINGA-39 Avoid ssh in scripts for single node environment
SINGA-43 Remove Job-related output from workspace
SINGA-56 No automatic launching of zookeeper service
SINGA-73 Refine the selection of available hosts from host list
Installation with GNU Autotools
SINGA-4 Refine thirdparty-dependency installation
SINGA-13 Separate intermediate files of compilation from source files
SINGA-17 Add root permission within thirdparty/install.
SINGA-27 Generate python modules for proto objects
SINGA-53 Add lmdb compiling options
SINGA-62 Remove building scripts and auxiliary files
SINGA-67 Add singatest into build targets
Distributed training
SINGA-7 Implement shared memory Hogwild algorithm
SINGA-8 Implement distributed Hogwild
SINGA-19 Slice large Param objects for load-balance
SINGA-29 Update NeuralNet class to enable layer partition type customization
SINGA-24 Implement Downpour training framework
SINGA-32 Implement AllReduce training framework
SINGA-57 Improve Distributed Hogwild
Training algorithms for different model categories
Checkpoint and restore
SINGA-12 Support Checkpoint and Restore
Unit test
SINGA-64 Add the test module for utils/common
Programming model
SINGA-36 Refactor job configuration, driver program and scripts
SINGA-37 Enable users to set parameter sharing in model configuration
SINGA-54 Refactor job configuration to move fields in ModelProto out
SINGA-55 Refactor main.cc and singa.h
SINGA-61 Support user defined classes
SINGA-65 Add an example of writing user-defined layers
Other features
Some bugs were fixed during the development of this release:
SINGA-2 Check failed: zsock_connect
SINGA-5 Server terminates early when the zookeeper singa folder is not initially empty
SINGA-15 Fix a bug in the ConnectStub function, which gets stuck when connecting layer_dealer_
SINGA-22 Cannot find openblas library when it is installed in default path
SINGA-23 Libtool version mismatch error.
SINGA-28 Fix a bug from topology sort of Graph
SINGA-42 Issue when loading checkpoints
SINGA-44 A bug when resetting metric values
SINGA-46 Fix a bug in updater.cc to scale the gradients
SINGA-47 Fix a bug in data layers that leads to out-of-memory when group size is too large
SINGA-48 Fix a bug in trainer.cc that assigns the same NeuralNet instance to workers from different groups
SINGA-49 Fix a bug in HandlePutMsg func that sets param fields to invalid values
SINGA-66 Fix bugs in Worker::RunOneBatch function and ClusterProto
SINGA-79 Fix a bug in singatool that cannot parse the -conf flag
Features planned for the next release