Data

This module includes classes for loading and prefetching data batches.

Example usage:

import numpy as np
from PIL import Image

import image_tool
from singa.data import ImageBatchIter

tool = image_tool.ImageTool()

def image_transform(img_path):
    # resize, randomly crop and flip the image; get() returns the
    # augmented images as a list
    return tool.load(img_path).resize_by_range(
        (112, 128)).random_crop(
        (96, 96)).flip().get()

data = ImageBatchIter('train.txt', 3,
                      image_transform, shuffle=True, delimiter=',',
                      image_folder='images/',
                      capacity=10)
data.start()
# imgs is a numpy array for a batch of images,
# shape: batch_size, 3 (RGB), height, width
imgs, labels = data.next()

# convert numpy array back into images
for idx in range(imgs.shape[0]):
    img = Image.fromarray(imgs[idx].astype(np.uint8).transpose(1, 2, 0),
                          'RGB')
    img.save('img%d.png' % idx)
data.end()

class singa.data.ImageBatchIter(img_list_file, batch_size, image_transform, shuffle=True, delimiter=' ', image_folder=None, capacity=10)

Utility for iterating over an image dataset to get mini-batches.

Parameters
  • img_list_file (str) – name of the file containing image metadata. Each line has the form image_path_suffix delimiter meta_info, where meta_info can be a label index, a label string, etc., and must not contain the delimiter. If the meta_info of every image is just a label index, the label indices are parsed into a numpy array of length batch_size (for compatibility); otherwise, a list of meta_info is returned. If meta_info is not available, a list of None is returned.

  • batch_size (int) – number of samples in one mini-batch

  • image_transform – a function for image augmentation; it accepts the full image path and outputs a list of augmented images.

  • shuffle (boolean) – True for shuffling images in the list

  • delimiter (char) – delimiter between image_path_suffix and meta_info, e.g., space or comma

  • image_folder (str) – prefix of the image path

  • capacity (int) – the maximum number of mini-batches in the internal queue.
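
The list-file format described for img_list_file can be illustrated with a small standalone sketch. It is not the ImageBatchIter implementation: parse_image_list and to_labels are hypothetical helper names, and splitting on the last delimiter and joining image_folder with os.path.join are assumptions made for illustration. The sketch also returns plain Python lists, whereas the class is documented to return a numpy array for label indices.

```python
import os


def parse_image_list(lines, delimiter=' ', image_folder=None):
    """Hypothetical helper: split each line into (image path, meta_info)."""
    paths, metas = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if delimiter in line:
            # Assumption: meta_info never contains the delimiter, so the
            # last delimiter separates the path suffix from meta_info.
            suffix, _, meta = line.rpartition(delimiter)
        else:
            # No meta_info on this line.
            suffix, meta = line, None
        # Assumption: image_folder is prepended as a directory prefix.
        path = os.path.join(image_folder, suffix) if image_folder else suffix
        paths.append(path)
        metas.append(meta)
    return paths, metas


def to_labels(metas):
    """Hypothetical helper: return ints if every meta_info is a label
    index, otherwise return the meta_info list unchanged."""
    try:
        return [int(m) for m in metas]
    except (TypeError, ValueError):
        return metas


# Lines as they might appear in a comma-delimited list file.
example_lines = ['cat/001.jpg,0', 'dog/002.jpg,1']
paths, metas = parse_image_list(example_lines, delimiter=',',
                                image_folder='images/')
labels = to_labels(metas)
```

Lines whose meta_info is a label string, or lines with no meta_info at all, pass through to_labels unchanged, which mirrors the three return cases listed for img_list_file.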