singa-incubating-0.1.0 Release Notes
SINGA is a general distributed deep learning platform for training big deep learning models over large datasets. It is designed with an intuitive programming model based on the layer abstraction. SINGA supports a wide variety of popular deep learning models.
This release includes the following features:
- Job management
  - SINGA-3 Use Zookeeper to check stopping (finish) time of the system
  - SINGA-16 Runtime Process id Management
  - SINGA-25 Setup glog output path
  - SINGA-26 Run distributed training in a single command
  - SINGA-30 Enhance easy-to-use feature and support concurrent jobs
  - SINGA-33 Automatically launch a number of processes in the cluster
  - SINGA-34 Support external zookeeper service
  - SINGA-38 Support concurrent jobs
  - SINGA-39 Avoid ssh in scripts for single node environment
  - SINGA-43 Remove Job-related output from workspace
  - SINGA-56 No automatic launching of zookeeper service
  - SINGA-73 Refine the selection of available hosts from host list
- Installation with GNU Autotools
  - SINGA-4 Refine thirdparty-dependency installation
  - SINGA-13 Separate intermediate files of compilation from source files
  - SINGA-17 Add root permission within thirdparty/install
  - SINGA-27 Generate python modules for proto objects
  - SINGA-53 Add lmdb compiling options
  - SINGA-62 Remove building scripts and auxiliary files
  - SINGA-67 Add singatest into build targets
- Distributed training
  - SINGA-7 Implement shared memory Hogwild algorithm
  - SINGA-8 Implement distributed Hogwild
  - SINGA-19 Slice large Param objects for load-balance
  - SINGA-29 Update NeuralNet class to enable layer partition type customization
  - SINGA-24 Implement Downpour training framework
  - SINGA-32 Implement AllReduce training framework
  - SINGA-57 Improve Distributed Hogwild
- Training algorithms for different model categories
  - SINGA-9 Add Support for Restricted Boltzmann Machine (RBM) model
  - SINGA-10 Add Support for Recurrent Neural Networks (RNN)
- Unit test
  - SINGA-64 Add the test module for utils/common
- Programming model
  - SINGA-36 Refactor job configuration, driver program and scripts
  - SINGA-37 Enable users to set parameter sharing in model configuration
  - SINGA-54 Refactor job configuration to move fields in ModelProto out
  - SINGA-55 Refactor main.cc and singa.h
  - SINGA-61 Support user defined classes
  - SINGA-65 Add an example of writing user-defined layers
- Other features
  - SINGA-6 Implement thread-safe singleton
  - SINGA-18 Update API for displaying performance metric
  - SINGA-77 Integrate with Apache RAT
Some bugs were fixed during the development of this release:
- SINGA-2 Check failed: zsock_connect
- SINGA-5 Server terminates early when zookeeper singa folder is not initially empty
- SINGA-15 Fix a bug in the ConnectStub function that gets stuck when connecting layerdealer
- SINGA-22 Cannot find openblas library when it is installed in default path
- SINGA-23 Libtool version mismatch error
- SINGA-28 Fix a bug in the topological sort of Graph
- SINGA-42 Issue when loading checkpoints
- SINGA-44 A bug when resetting metric values
- SINGA-46 Fix a bug in updater.cc to scale the gradients
- SINGA-47 Fix a bug in data layers that leads to out-of-memory when group size is too large
- SINGA-48 Fix a bug in trainer.cc that assigns the same NeuralNet instance to workers from different groups
- SINGA-49 Fix a bug in HandlePutMsg func that sets param fields to invalid values
- SINGA-66 Fix bugs in Worker::RunOneBatch function and ClusterProto
- SINGA-79 Fix a bug in singatool that cannot parse the -conf flag
Features planned for the next release:
- SINGA-11 Start SINGA using Mesos
- SINGA-31 Extend Blob to support xpu (cpu or gpu)
- SINGA-35 Add random number generators
- SINGA-40 Support sparse Param update
- SINGA-41 Support single node single GPU training