Apache SINGA
A distributed deep learning platform.
Data shard stores training/validation/test tuples.
#include <data_shard.h>
Public Types
enum { kRead = 0, kCreate = 1, kAppend = 2 }

Public Member Functions
DataShard(std::string folder, char mode, int capacity = 104857600)
    Initialize the shard object.
bool Next(std::string *key, Message *val)
    Read the next tuple from the shard.
bool Next(std::string *key, std::string *val)
    Read the next tuple from the shard.
bool Insert(const std::string &key, const Message &tuple)
    Append one tuple to the shard.
bool Insert(const std::string &key, const std::string &tuple)
    Append one tuple to the shard.
void SeekToFirst()
    Move the read pointer to the head of the shard file.
void Flush()
    Flush buffered data to disk.
const int Count()
    Iterate through all tuples to count them.
const std::string path()

Protected Member Functions
int Next(std::string *key)
    Read the next key and prepare the buffer for reading the value.
int PrepareForAppend(std::string path)
    Set up the disk pointer at the right position for appending, in case the previous write crashed.
bool PrepareNextField(int size)
    Read data from disk if the buffer does not yet hold a full field.
Data shard stores training/validation/test tuples.
Every worker node should have a training shard (the validation and test shards are optional). The shard file for training is singa::Cluster::workspace()/train/shard.dat, the shard file for validation is singa::Cluster::workspace()/validation/shard.dat, and the test shard follows the same pattern.
shard.dat consists of a set of unordered tuples. Each tuple is encoded as [key_len key record_len val], where key_len and record_len are of type uint32 and give the size in bytes of the key and of the record, respectively.
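The tuple layout can be illustrated with a small, self-contained sketch. The helpers AppendField, AppendTuple, ReadField and DecodeTuple are hypothetical names, not part of DataShard, and the sketch assumes the uint32 lengths are stored in the host's native byte order, which this page does not specify.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helpers for the [key_len key record_len val] layout.
// Assumption: each uint32 length is stored in native byte order.

// Append one length-prefixed field to `out`.
static void AppendField(const std::string& field, std::string* out) {
  assert(field.size() <= UINT32_MAX);  // the length must fit in a uint32
  uint32_t len = static_cast<uint32_t>(field.size());
  out->append(reinterpret_cast<const char*>(&len), sizeof(len));
  out->append(field);
}

// Encode one tuple and append it to `out`.
void AppendTuple(const std::string& key, const std::string& val,
                 std::string* out) {
  AppendField(key, out);
  AppendField(val, out);
}

// Read one length-prefixed field at *pos; return false if `buf` is too short.
static bool ReadField(const std::string& buf, size_t* pos, std::string* field) {
  uint32_t len = 0;
  if (*pos > buf.size() || buf.size() - *pos < sizeof(len)) return false;
  std::memcpy(&len, buf.data() + *pos, sizeof(len));
  size_t start = *pos + sizeof(len);
  if (buf.size() - start < len) return false;
  field->assign(buf, start, len);
  *pos = start + len;
  return true;
}

// Decode the tuple at *offset. On success, advance *offset past it and
// return true; if the tuple is incomplete, leave *offset unchanged.
bool DecodeTuple(const std::string& buf, size_t* offset, std::string* key,
                 std::string* val) {
  size_t pos = *offset;
  if (!ReadField(buf, &pos, key) || !ReadField(buf, &pos, val)) return false;
  *offset = pos;
  return true;
}
```

For a 12-byte key such as img/0001.jpg and a 3-byte record, the encoded tuple takes 4 + 12 + 4 + 3 = 23 bytes.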
When a DataShard object is created, it removes the last tuple if the stored key and record sizes do not match the bytes actually present, which indicates that the last write crashed.
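The crash check can be sketched as a scan that tracks the end of the last complete tuple; truncating the file to that length drops a partially written trailing tuple. ValidPrefixLength is a hypothetical helper that works on in-memory bytes rather than on the shard file, and native byte order for the uint32 lengths is again assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: return how many leading bytes of `data` consist of
// complete [key_len key record_len val] tuples. A trailing partial tuple,
// e.g. one left by a crashed write, is not included in the result.
// Assumption: uint32 lengths in native byte order.
size_t ValidPrefixLength(const std::string& data) {
  size_t pos = 0;  // end of the last complete tuple seen so far
  for (;;) {
    size_t p = pos;
    for (int field = 0; field < 2; ++field) {  // key field, then record field
      assert(p <= data.size());
      uint32_t len = 0;
      if (data.size() - p < sizeof(len)) return pos;  // length is cut off
      std::memcpy(&len, data.data() + p, sizeof(len));
      p += sizeof(len);
      if (data.size() - p < len) return pos;  // payload is cut off
      p += len;
    }
    pos = p;  // both fields are complete: accept this tuple
  }
}
```

A recovering writer could then shrink the file to this length before appending, for example with POSIX truncate() or C++17 std::filesystem::resize_file.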
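anonymous enum { kRead = 0, kCreate = 1, kAppend = 2 }
Shard open modes, passed as the mode argument of the constructor.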
singa::DataShard::DataShard(std::string folder, char mode, int capacity = 104857600)
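Initialize the shard object.
Parameters
    folder    shard folder (path excluding shard.dat) on the worker node
    mode      shard open mode: DataShard::kRead, DataShard::kCreate or DataShard::kAppend
    capacity  batch buffer size, i.e., the number of bytes transferred in every disk operation (read or write); the default is 104857600 bytes (100 MB)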
const int singa::DataShard::Count()
Iterate through all tuples to count them.
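Since Count makes a full pass over the shard, the cost grows with the file size. A minimal sketch over a std::istream, using a hypothetical CountTuples function and assuming native-byte-order uint32 lengths, skips the key and record bytes instead of copying them:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: count complete [key_len key record_len val] tuples by
// walking the stream from the start and skipping over the key and record
// bytes. A truncated trailing tuple is not counted.
// Assumption: uint32 lengths in native byte order.
int CountTuples(std::istream& in) {
  in.clear();   // reset eof/fail flags left by an earlier pass
  in.seekg(0);  // always count from the head of the shard
  int count = 0;
  for (;;) {
    for (int field = 0; field < 2; ++field) {  // key field, then record field
      uint32_t len = 0;
      if (!in.read(reinterpret_cast<char*>(&len), sizeof(len))) return count;
      in.ignore(len);
      if (in.gcount() != static_cast<std::streamsize>(len)) return count;
    }
    ++count;
  }
}
```

Any std::istream works here, e.g. a std::ifstream opened in binary mode on shard.dat, or a std::istringstream holding encoded tuples.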
void singa::DataShard::Flush()
Flush buffered data to disk.
Used only in kCreate or kAppend mode.
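The interplay between the capacity argument and Flush can be sketched with a hypothetical BufferedTupleWriter: encoded tuples accumulate in memory and reach the underlying stream only when the next tuple would overflow the buffer, or when Flush is called explicitly. This illustrates write buffering under the assumption of native-byte-order lengths; it is not DataShard's actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical write buffer illustrating `capacity` and Flush().
// Assumption: uint32 lengths in native byte order.
class BufferedTupleWriter {
 public:
  BufferedTupleWriter(std::ostream* out, size_t capacity)
      : out_(out), capacity_(capacity) {}

  // Buffer one [key_len key record_len val] tuple; false if it can never fit.
  bool Insert(const std::string& key, const std::string& val) {
    size_t size = 2 * sizeof(uint32_t) + key.size() + val.size();
    if (size > capacity_) return false;
    if (buf_.size() + size > capacity_) Flush();  // make room first
    AppendField(key);
    AppendField(val);
    return true;
  }

  // Write all buffered bytes to the stream in one operation and clear them.
  void Flush() {
    out_->write(buf_.data(), static_cast<std::streamsize>(buf_.size()));
    out_->flush();
    buf_.clear();
  }

  size_t buffered() const { return buf_.size(); }

 private:
  void AppendField(const std::string& field) {
    assert(field.size() <= UINT32_MAX);  // the length must fit in a uint32
    uint32_t len = static_cast<uint32_t>(field.size());
    buf_.append(reinterpret_cast<const char*>(&len), sizeof(len));
    buf_.append(field);
  }

  std::ostream* out_;
  size_t capacity_;
  std::string buf_;
};
```

Batching tuples into large writes reduces the number of disk operations; the trade-off is that tuples still in the buffer are lost if the process crashes before Flush.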
bool singa::DataShard::Insert(const std::string &key, const Message &tuple)
Append one tuple to the shard.
Parameters
    key    e.g., image path
    tuple  record of type Message
bool singa::DataShard::Insert(const std::string &key, const std::string &tuple)
Append one tuple to the shard.
Parameters
    key    e.g., image path
    tuple  record of type string
bool singa::DataShard::Next(std::string *key, Message *val)
Read the next tuple from the shard.
Parameters
    key  tuple key
    val  record of type Message
bool singa::DataShard::Next(std::string *key, std::string *val)
Read the next tuple from the shard.
Parameters
    key  tuple key
    val  record of type string
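Reading mirrors the encoding. The hypothetical NextTuple below reads one tuple directly from a std::istream and returns false at the end of the data, matching the bool result of Next. Unlike DataShard, it does not use an intermediate read buffer, and it assumes native-byte-order uint32 lengths.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Read one length-prefixed field; return false if the stream ends first.
// Assumption: the uint32 length is stored in native byte order.
static bool ReadField(std::istream& in, std::string* field) {
  uint32_t len = 0;
  if (!in.read(reinterpret_cast<char*>(&len), sizeof(len))) return false;
  field->resize(len);
  if (len == 0) return true;
  return static_cast<bool>(in.read(&(*field)[0], len));
}

// Hypothetical counterpart of Next(key, val): read the next tuple, returning
// false at the end of the data or on a truncated trailing tuple.
bool NextTuple(std::istream& in, std::string* key, std::string* val) {
  assert(key != nullptr && val != nullptr);
  return ReadField(in, key) && ReadField(in, val);
}
```

A full scan is then `while (NextTuple(in, &key, &val)) { ... }`. Because a corrupted length could request a huge allocation, a production reader would likely validate each length against the remaining file size.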
int singa::DataShard::Next(std::string *key)  [protected]
Read the next key and prepare the buffer for reading the value.
Parameters
    key  the key of the next tuple
const std::string singa::DataShard::path()  [inline]
int singa::DataShard::PrepareForAppend(std::string path)  [protected]
Set up the disk pointer at the right position for appending, in case the previous write crashed.
Parameters
    path  shard path
bool singa::DataShard::PrepareNextField(int size)  [protected]
Read data from disk if the buffer does not yet hold a full field.
Parameters
    size  size of the next field
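The buffer refill behind PrepareNextField can be sketched as follows. ReadBuffer and Take are hypothetical names, and size is a size_t here rather than an int. When fewer than size unread bytes remain, the unread tail is moved to the front of the buffer and the rest is filled from the stream, so a field split across two disk reads ends up contiguous in memory.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical read buffer illustrating PrepareNextField(size).
class ReadBuffer {
 public:
  ReadBuffer(std::istream* in, size_t capacity) : in_(in), buf_(capacity) {
    assert(capacity > 0);
  }

  // Ensure at least `size` unread bytes are buffered; false if impossible.
  bool PrepareNextField(size_t size) {
    if (size > buf_.size()) return false;  // field can never fit
    if (end_ - pos_ >= size) return true;  // already a full field
    // Move the unread tail to the front, then fill the rest from the stream.
    std::memmove(buf_.data(), buf_.data() + pos_, end_ - pos_);
    end_ -= pos_;
    pos_ = 0;
    in_->read(buf_.data() + end_,
              static_cast<std::streamsize>(buf_.size() - end_));
    end_ += static_cast<size_t>(in_->gcount());
    return end_ >= size;
  }

  // Consume `size` bytes; valid only after PrepareNextField(size) succeeded.
  const char* Take(size_t size) {
    assert(end_ - pos_ >= size);
    const char* p = buf_.data() + pos_;
    pos_ += size;
    return p;
  }

 private:
  std::istream* in_;
  std::vector<char> buf_;
  size_t pos_ = 0;  // index of the next unread byte
  size_t end_ = 0;  // one past the last valid byte
};
```

A reader would call PrepareNextField(sizeof(uint32_t)) before decoding a length and PrepareNextField(len) before taking the field itself, so in this sketch the capacity must be at least as large as the largest key or record.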
void singa::DataShard::SeekToFirst()
Move the read pointer to the head of the shard file.
Used for repeated reading.