Apache SINGA
A distributed deep learning platform.
singa::DataShard Class Reference

Data shard stores training/validation/test tuples.

#include <data_shard.h>

Public Types

enum  { kRead =0, kCreate =1, kAppend =2 }
 

Public Member Functions

 DataShard (std::string folder, char mode, int capacity=104857600)
 Initialize the shard object.
 
bool Next (std::string *key, Message *val)
 Read the next tuple from the shard.
 
bool Next (std::string *key, std::string *val)
 Read the next tuple from the shard.
 
bool Insert (const std::string &key, const Message &tuple)
 Append one tuple to the shard.
 
bool Insert (const std::string &key, const std::string &tuple)
 Append one tuple to the shard.
 
void SeekToFirst ()
 Move the read pointer to the head of the shard file.
 
void Flush ()
 Flush buffered data to disk.
 
const int Count ()
 Iterate through all tuples to count them.
 
const std::string path ()
 

Protected Member Functions

int Next (std::string *key)
 Read the next key and prepare the buffer for reading the value.
 
int PrepareForAppend (std::string path)
 Set the disk pointer to the right position for appending, in case the previous write crashed.
 
bool PrepareNextField (int size)
 Read data from disk if the current data in the buffer is not a full field.
 

Detailed Description

Data shard stores training/validation/test tuples.

Every worker node should have a training shard (validation and test shards are optional). The shard file for training is singa::Cluster::workspace()/train/shard.dat; the shard files for validation and test follow the same pattern under their own folders.

shard.dat consists of a set of unordered tuples. Each tuple is encoded as [key_len key record_len val], where key_len and record_len are of type uint32 and give the size in bytes of the key and the record, respectively.

When a DataShard object is created, it removes the last key if the key size and record size do not match, which happens when the last tuple write crashed.
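The tuple layout described above can be sketched in a few lines. This is only an illustration, not the SINGA implementation: the helper names (AppendField, AppendTuple, ReadField, ReadTuple) are hypothetical, the shard is modelled as an in-memory std::string instead of a file, and the sketch assumes that both lengths are written as raw uint32 values in host byte order, which this reference does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Append one field as [len bytes]. Assumption: len is a raw uint32 in host
// byte order; the reference does not specify the byte order.
void AppendField(const std::string& field, std::string* out) {
  assert(out != nullptr);
  const uint32_t len = static_cast<uint32_t>(field.size());
  out->append(reinterpret_cast<const char*>(&len), sizeof(len));
  out->append(field);
}

// Append one tuple as [key_len key record_len val].
void AppendTuple(const std::string& key, const std::string& val,
                 std::string* out) {
  AppendField(key, out);
  AppendField(val, out);
}

// Read one length-prefixed field starting at *pos.
// Returns false (and leaves *pos unchanged) if the field is incomplete.
bool ReadField(const std::string& buf, std::size_t* pos, std::string* field) {
  uint32_t len = 0;
  if (*pos > buf.size() || buf.size() - *pos < sizeof(len)) return false;
  std::memcpy(&len, buf.data() + *pos, sizeof(len));
  const std::size_t start = *pos + sizeof(len);
  if (buf.size() - start < len) return false;
  field->assign(buf, start, len);
  *pos = start + len;
  return true;
}

// Decode the tuple at *pos. On failure *pos, *key and *val are untouched.
bool ReadTuple(const std::string& buf, std::size_t* pos, std::string* key,
               std::string* val) {
  std::size_t p = *pos;
  std::string k, v;
  if (!ReadField(buf, &p, &k) || !ReadField(buf, &p, &v)) return false;
  *pos = p;
  key->swap(k);
  val->swap(v);
  return true;
}
```

Calling ReadTuple in a loop until it returns false walks the buffer tuple by tuple; a false result before the end of the buffer indicates a partially written tuple.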

TODO

  1. Split one shard into multiple shards.
  2. Add threading to prefetch and parse records.

Member Enumeration Documentation

anonymous enum
Enumerator
kRead 

read only mode used in training

kCreate 

write mode used in creating shard (will overwrite previous one)

kAppend 

append mode, e.g. used when previous creating crashes

Constructor & Destructor Documentation

singa::DataShard::DataShard ( std::string  folder,
char  mode,
int  capacity = 104857600 
)

Initialize the shard object.

Parameters
folder    shard folder (path excluding shard.dat) on the worker node
mode      shard open mode: DataShard::kRead, DataShard::kCreate or DataShard::kAppend
capacity  buffer size in bytes used to batch data for every disk operation (read or write); the default is 100MB

Member Function Documentation

const int singa::DataShard::Count ( )

Iterate through all tuples to count them.

Returns
number of tuples
void singa::DataShard::Flush ( )

Flush buffered data to disk.

Used only for kCreate or kAppend.

bool singa::DataShard::Insert ( const std::string &  key,
const Message &  tuple 
)

Append one tuple to the shard.

Parameters
key    e.g., image path
val    the record to store (the tuple argument)
Returns
true on success, otherwise false, e.g., if the key was inserted before
bool singa::DataShard::Insert ( const std::string &  key,
const std::string &  tuple 
)

Append one tuple to the shard.

Parameters
key    e.g., image path
val    the record to store (the tuple argument)
Returns
true on success, otherwise false, e.g., if the key was inserted before
bool singa::DataShard::Next ( std::string *  key,
Message *  val 
)

Read the next tuple from the shard.

Parameters
key    tuple key
val    record of type Message
Returns
true if the read succeeds, otherwise false, e.g., when the tuple was not inserted completely.
bool singa::DataShard::Next ( std::string *  key,
std::string *  val 
)

Read the next tuple from the shard.

Parameters
key    tuple key
val    record of type string
Returns
true if the read succeeds, otherwise false, e.g., when the tuple was not inserted completely.
int singa::DataShard::Next ( std::string *  key)
protected

Read the next key and prepare buffer for reading value.

Parameters
key
Returns
length (i.e., bytes) of value field.
const std::string singa::DataShard::path ( )
inline
Returns
path to shard file
int singa::DataShard::PrepareForAppend ( std::string  path)
protected

Set the disk pointer to the right position for appending, in case the previous write crashed.

Parameters
path    shard path.
Returns
offset (end position) of the last successfully written record.
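One way to find such an offset is to walk the length-prefixed fields from the start and remember the position after the last tuple whose key and record are both complete. The sketch below illustrates that idea on an in-memory buffer; it is not the SINGA implementation. SkipField and LastCompleteOffset are hypothetical names, and the host-byte-order uint32 length prefix is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Skip one [len bytes] field starting at *pos.
// Returns false (and leaves *pos unchanged) if the field is truncated.
// Assumption: len is a raw uint32 in host byte order.
bool SkipField(const std::string& data, std::size_t* pos) {
  assert(pos != nullptr && *pos <= data.size());
  uint32_t len = 0;
  if (data.size() - *pos < sizeof(len)) return false;
  std::memcpy(&len, data.data() + *pos, sizeof(len));
  if (data.size() - *pos - sizeof(len) < len) return false;
  *pos += sizeof(len) + len;
  return true;
}

// Return the offset just past the last tuple whose key AND record are both
// fully present. Bytes after this offset belong to a crashed write.
std::size_t LastCompleteOffset(const std::string& data) {
  std::size_t good = 0;  // end of the last complete tuple seen so far
  std::size_t p = 0;     // scan position
  while (SkipField(data, &p) && SkipField(data, &p)) {
    good = p;
  }
  return good;
}
```

A writer opened for appending could then discard everything after the returned offset (for an in-memory buffer, `data.resize(LastCompleteOffset(data))`) before writing new tuples.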
bool singa::DataShard::PrepareNextField ( int  size)
protected

Read data from disk if the current data in the buffer is not a full field.

Parameters
size    size of the next field.
void singa::DataShard::SeekToFirst ( )

Move the read pointer to the head of the shard file.

Used for repeated reading.
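The repeated-reading pattern can be illustrated with a small in-memory stand-in. MiniShard and ReadEpochs below are hypothetical and only mimic the documented Insert/Next/SeekToFirst/Count semantics for string records; they are not the SINGA class, and details such as rejecting duplicate keys in Insert follow the "inserted before" remark in this reference rather than the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a shard holding string records.
class MiniShard {
 public:
  // Append one tuple; returns false if the key was inserted before.
  bool Insert(const std::string& key, const std::string& val) {
    for (const auto& t : tuples_) {
      if (t.first == key) return false;
    }
    tuples_.emplace_back(key, val);
    return true;
  }

  // Read the next tuple; returns false once all tuples have been read.
  bool Next(std::string* key, std::string* val) {
    if (cursor_ >= tuples_.size()) return false;
    *key = tuples_[cursor_].first;
    *val = tuples_[cursor_].second;
    ++cursor_;
    return true;
  }

  // Move the read pointer back to the first tuple.
  void SeekToFirst() { cursor_ = 0; }

  // Number of stored tuples.
  int Count() const { return static_cast<int>(tuples_.size()); }

 private:
  std::vector<std::pair<std::string, std::string>> tuples_;
  std::size_t cursor_ = 0;
};

// Read the whole shard `epochs` times, rewinding before each pass.
// Returns the total number of tuples read.
int ReadEpochs(MiniShard* shard, int epochs) {
  assert(shard != nullptr);
  int total = 0;
  std::string key, val;
  for (int e = 0; e < epochs; ++e) {
    shard->SeekToFirst();  // without this, later passes would read nothing
    while (shard->Next(&key, &val)) ++total;
  }
  return total;
}
```

With the real class the same loop shape would apply: open the shard in kRead mode, call Next until it returns false, then call SeekToFirst before the next pass.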


The documentation for this class was generated from the following file: