
Each Tensor instance is a multi-dimensional array allocated on a specific Device instance. Tensor instances store variables and provide linear algebra operations over different types of hardware devices, hiding the device details from users. Note that, except for copy functions, users must make sure that all tensor operands of an operation are allocated on the same device.

Tensor implementation

SINGA has three different sets of implementations of Tensor functions, one for each type of Device.

  • ‘tensor_math_cpp.h’ implements operations using C++ (with CBLAS) for CppCPU devices.
  • ‘tensor_math_cuda.h’ implements operations using CUDA (with cuBLAS) for CudaGPU devices.
  • ‘tensor_math_opencl.h’ implements operations using OpenCL for OpenclGPU devices.

Python API

Example usage:

import numpy as np
from singa import tensor
from singa import device

# create a tensor with shape (2, 3), default CppCPU device and float32
x = tensor.Tensor((2, 3))
x.gaussian(0.0, 1.0)  # initialize x; a Tensor must be initialized before it is read

# create a tensor from a numpy array
npy = np.zeros((3, 3), dtype=np.float32)
y = tensor.from_numpy(npy)

y.uniform(-1, 1)  # sample values from the uniform distribution

z = tensor.mult(x, y)  # gemm -> z of shape (2, 3)

x += z  # element-wise addition

dev = device.get_default_device()
x.to_device(dev)  # move the data to the given device (here, the default device)

r = tensor.relu(x)

s = tensor.to_numpy(r)  # tensor -> numpy array

There are two sets of tensor functions:

  • Tensor member functions, which change the internal state of the Tensor instance.
  • Tensor module functions, which accept Tensor instances as arguments and return Tensor instances.

Every Tensor instance must be initialized before reading data from it.

class singa.tensor.AddBias(axis=0)

Add Bias to each row / column of the Tensor, depending on the parameter axis.

backward(dy)
Parameters:dy (CTensor) – data for the dL / dy, L is the loss.
Returns:a tuple for (db, dx), db is data for dL / db, dx is data for dL / dx.
forward(x, b)
  • x – matrix.
  • b – bias to be added.

Returns:the result Tensor
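The forward and backward rules described for AddBias can be sketched with NumPy arrays standing in for CTensors. This is a minimal illustration, not SINGA code: the helper names are hypothetical, and it assumes that axis=0 adds the bias to every row and axis=1 to every column.

```python
import numpy as np

def add_bias_forward(x, b, axis=0):
    # Assumption: axis=0 adds b to each row, axis=1 adds b to each column.
    return x + b if axis == 0 else x + b[:, None]

def add_bias_backward(dy, axis=0):
    # db sums dy over the dimension the bias was broadcast along; dx equals dy.
    db = dy.sum(axis=0) if axis == 0 else dy.sum(axis=1)
    return db, dy

x = np.arange(6, dtype=np.float32).reshape(2, 3)
b = np.array([10, 20, 30], dtype=np.float32)
y = add_bias_forward(x, b)
db, dx = add_bias_backward(np.ones_like(y))
```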

class singa.tensor.CrossEntropy

Calculate the CrossEntropy loss for a batch of training data.

backward(dy)
Parameters:dy (float or CTensor) – scalar, the gradient accumulated from outside of the current network, usually equal to 1.0.
Returns:data for the dL / dx, L is the loss, x is the output of the current network. Note that this is true for dy = 1.0.
Return type:dx (CTensor)
forward(x, t)
  • x (CTensor) – 1d or 2d tensor, the prediction data (output) of the current network.
  • t (CTensor) – 1d or 2d tensor, the target data for training.

Return type:loss (CTensor)
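As a rough illustration of what a batch cross-entropy forward and backward pass computes, here is a NumPy sketch. It assumes x holds predicted probabilities, t holds one-hot targets, and the loss is averaged over the batch; the function names are hypothetical and the exact normalization used by SINGA may differ.

```python
import numpy as np

def cross_entropy_forward(x, t):
    # Mean over the batch of -sum(t * log(x)); x, t have shape (batch, classes).
    return -np.sum(t * np.log(x)) / x.shape[0]

def cross_entropy_backward(x, t, dy=1.0):
    # Gradient of the averaged loss with respect to x, scaled by dy.
    return dy * (-t / x) / x.shape[0]

x = np.array([[0.5, 0.5], [0.25, 0.75]])
t = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = cross_entropy_forward(x, t)
dx = cross_entropy_backward(x, t)
```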

class singa.tensor.Dummy(tensor, name=None)

Dummy operation which serves as a placeholder for autograd.

Parameters:name (string) – set it for debugging

class singa.tensor.Matmul

For matrix multiplication

backward(dy)
Parameters:dy (CTensor) – data for the dL / dy, L is the loss
Returns:a tuple for (dx, dw)
forward(x, w)

Do forward propagation.

Store x (or w) if w (or x) requires gradient.

  • x (CTensor) – matrix
  • w (CTensor) – matrix

Returns:a CTensor for the result
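The reason forward() stores x (or w) is that each input's gradient depends on the other input. A minimal NumPy sketch of the standard matrix-multiplication gradients, with hypothetical function names and arrays in place of CTensors:

```python
import numpy as np

def matmul_forward(x, w):
    return x @ w

def matmul_backward(dy, x, w):
    # dL/dx needs w, and dL/dw needs x, hence the caching in forward().
    dx = dy @ w.T
    dw = x.T @ dy
    return dx, dw

x = np.array([[1.0, 2.0], [3.0, 4.0]])
w = np.array([[1.0, 0.0], [0.0, 2.0]])
y = matmul_forward(x, w)
dx, dw = matmul_backward(np.ones_like(y), x, w)
```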

class singa.tensor.Operation

An operation includes the forward and backward function of tensor calculation.

To add a specific operation Xxxx, subclass Operation and implement forward() and backward(). Then implement a function xxxx which creates a Xxxx instance and calls __call__ to do the forward pass. The autograd engine is then able to do backward propagation by calling the backward() of Xxxx automatically. Note that the tensors are CTensors, not Python Tensors: the arguments of forward() and backward() should only include CTensor args.

backward(*dys)

Backward propagation.

Parameters:dys – input args consisting of only CTensors.
Returns:CTensor instance(s)
forward(*xs)

Forward propagation.

Parameters:xs – input args consisting of only CTensors.
Returns:CTensor instance(s)

class singa.tensor.ReLU
backward(dy)
Parameters:dy (CTensor) – dL / dy
Returns:dL / dx = dy if x >= 0; otherwise 0
Return type:dx (CTensor)
forward(x)
Parameters:x (CTensor) – input tensor
Returns:a new CTensor whose element y = x if x >= 0; otherwise 0

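The forward/backward pairing of an operation such as ReLU can be sketched as a small class that caches its input in forward() and reuses it in backward(). ReLUSketch is a hypothetical stand-in that works on NumPy arrays; it does not subclass SINGA's Operation and only mirrors the rules stated above.

```python
import numpy as np

class ReLUSketch:
    """Hypothetical stand-in operating on NumPy arrays instead of CTensors."""

    def forward(self, x):
        self.x = x  # cached because backward() needs the sign of the input
        return np.where(x >= 0, x, 0.0)

    def backward(self, dy):
        # Pass the gradient through where x >= 0, block it elsewhere.
        return np.where(self.x >= 0, dy, 0.0)

op = ReLUSketch()
x = np.array([-1.0, 0.0, 2.0])
y = op.forward(x)
dx = op.backward(np.array([5.0, 5.0, 5.0]))
```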
class singa.tensor.SoftMax(axis=0)

Apply SoftMax for each row of the Tensor or each column of the Tensor according to the parameter axis.

backward(dy)
Parameters:dy (CTensor) – data for the dL / dy, L is the loss
Returns:data for the dL / dx, L is the loss, x is the input of the current Operation
Return type:dx (CTensor)
forward(x)
Parameters:x (data) – the input 1d or 2d tensor
Returns:the result Tensor

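For reference, a numerically stable softmax and its gradient can be sketched in NumPy as follows. The function names are hypothetical, and the axis argument follows NumPy's convention (axis=-1 normalizes each row of a 2d array), which may not match the meaning of axis in SoftMax above.

```python
import numpy as np

def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)  # avoid overflow in exp
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def softmax_backward(dy, y, axis=-1):
    # dx = y * (dy - sum(dy * y)) along the normalized axis
    return y * (dy - (dy * y).sum(axis=axis, keepdims=True))

x = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
y = softmax(x)
dx = softmax_backward(np.ones_like(y), y)
```

With a constant upstream gradient, dx is zero, because shifting all softmax inputs by the same amount does not change the output.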
class singa.tensor.Tensor(shape=(), device=None, dtype=0, data=None, requires_grad=True, stores_grad=False, creator=None)

Python Tensor, which wraps a swig converted Tensor from CPP Tensor.

  • shape (tuple<int>) – a tuple of integers for the tensor shape. If shape is not specified, the created tensor is called a dummy tensor.
  • device – a swig device. If None, the default host device is used.
  • dtype – data type. Currently, most operations only accept float32.
  • data – a numpy array or swig tensor.
  • requires_grad – boolean indicator for computing the gradient.
  • stores_grad – boolean indicator for storing and returning the gradient. The gradients of some intermediate tensors can be released during the backward propagation. A tensor may require grad but not store grad; but if a tensor stores grad, then it must require grad.

T()

Shallow copy that negates the transpose field.

Returns:a new Tensor which shares the underlying data memory (shallow copy) but is marked as a transposed version of this tensor.

add_column(v)

Add a tensor to each column of this tensor.

Parameters:v (Tensor) – a Tensor to be added as a column to this tensor.
add_row(v)

Add a tensor to each row of this tensor.

Parameters:v (Tensor) – a Tensor to be added as a row to this tensor.

bernoulli(p)

Sample 0/1 for each element according to the given probability.

Parameters:p (float) – with probability p, each element is sampled as 1.
clone()
Returns:a new Tensor which does deep copy of this tensor
copy()

Shallow copy; calls the copy constructor of singa::Tensor.

copy_data(t)

Copy data from another Tensor instance.

Parameters:t (Tensor) – source Tensor.
copy_from_numpy(np_array, offset=0)

Copy the data from the numpy array.

  • np_array – source numpy array
  • offset (int) – destination offset

deepcopy()

Same as clone().

Returns:a new Tensor
div_column(v)

Divide each column of this tensor by v.

Parameters:v (Tensor) – 1d tensor of the same length as a column of self.
div_row(v)

Divide each row of this tensor by v.

Parameters:v (Tensor) – 1d tensor of the same length as a row of self.
gaussian(mean, std)

Generate a value for each element following a Gaussian distribution.

  • mean (float) – mean of the distribution
  • std (float) – standard deviation of the distribution

is_empty()
Returns:True if the tensor is empty according to its shape
is_transpose()
Returns:True if the internal data is transposed; otherwise False.
l1()
Returns:the L1 norm.
l2()
Returns:the L2 norm.
memsize()
Returns:the number of Bytes allocated for this tensor.

mult_column(v)

Multiply each column of this tensor by v element-wise.

Parameters:v (Tensor) – 1d tensor of the same length as a column of self.
mult_row(v)

Multiply each row of this tensor by v element-wise.

Parameters:v (Tensor) – 1d tensor of the same length as a row of self.
ndim()
Returns:the number of dimensions of the tensor.

reset_like(t)

Reset the shape, dtype and device to those of the given tensor.

Parameters:t (Tensor) – the tensor whose shape, dtype and device are used.
reshape(shape)

Change the tensor shape.

Parameters:shape (list<int>) – new shape, which should have the same volume as the original shape.
set_value(x)

Set all elements of the tensor to the given value.

Parameters:x (float) – the value to set.
size()
Returns:the number of elements of the tensor.

to_device(device)

Move the tensor data onto a given device.

Parameters:device – a swig Device converted from CudaGPU or CppCPU or OpenclGPU
to_host()

Move the tensor data onto the default host CppCPU device.

uniform(low, high)

Generate a value for each element following a uniform distribution.

  • low (float) – the lower bound
  • high (float) – the upper bound

singa.tensor.abs(t)
Parameters:t (Tensor) – input Tensor
Returns:a new Tensor whose element y = abs(x), x is an element of t

singa.tensor.add(lhs, rhs, ret=None)

Element-wise addition.

  • lhs (Tensor) – left hand side operand
  • rhs (Tensor) – right hand side operand
  • ret (Tensor, optional) – if not None, the result is stored in it; otherwise, a new Tensor would be created for the result.

Returns:the result Tensor

singa.tensor.add_column(alpha, v, beta, M)

Add v to each column of M.

Denote each column of M as m, m = alpha * v + beta * m

  • alpha (float) –
  • v (Tensor) –
  • beta (float) –
  • M (Tensor) – 2d tensor


singa.tensor.add_row(alpha, v, beta, M)

Add v to each row of M.

Denote each row of M as m, m = alpha * v + beta * m

  • alpha (float) –
  • v (Tensor) –
  • beta (float) –
  • M (Tensor) – 2d tensor
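The update rule m = alpha * v + beta * m for add_column and add_row can be checked with a small NumPy sketch. The functions below are hypothetical stand-ins that update a NumPy array M in place; they only illustrate the arithmetic and the broadcasting direction.

```python
import numpy as np

def add_column(alpha, v, beta, M):
    # v has one entry per row of M and is added to every column.
    M[:] = alpha * v[:, None] + beta * M

def add_row(alpha, v, beta, M):
    # v has one entry per column of M and is added to every row.
    M[:] = alpha * v[None, :] + beta * M

M = np.ones((2, 3))
add_column(2.0, np.array([1.0, 2.0]), 0.5, M)

N = np.ones((2, 3))
add_row(1.0, np.array([1.0, 2.0, 3.0]), 1.0, N)
```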


singa.tensor.average(t, axis=None)
  • t (Tensor) – input Tensor
  • axis (int, optional) – if None, average all elements; otherwise average along the given dimension. 0 for averaging each column; 1 for averaging each row.

Returns:a float value if axis is None; otherwise, a new Tensor for the result.
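The axis convention (0 averages each column, 1 averages each row) matches NumPy's, so a NumPy stand-in makes the three cases concrete. The average function below is a hypothetical sketch, not the SINGA implementation.

```python
import numpy as np

def average(t, axis=None):
    if axis is None:
        return float(t.mean())  # a plain float over all elements
    return t.mean(axis=axis)    # one value per column (axis=0) or row (axis=1)

t = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
all_avg = average(t)
col_avg = average(t, axis=0)
row_avg = average(t, axis=1)
```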

singa.tensor.axpy(alpha, x, y)

Element-wise operation for y += alpha * x.



singa.tensor.bernoulli(p, t)

Generate a binary value for each element of t.

  • p (float) – each element is 1 with probability p; and 0 with 1 - p
  • t (Tensor) – the results are put into t


singa.tensor.copy_data_to_from(dst, src, size, dst_offset=0, src_offset=0)

Copy the data between two Tensor instances which could be on different devices.

  • dst (Tensor) – destination Tensor
  • src (Tensor) – source Tensor
  • size (int) – number of elements to copy
  • dst_offset (int) – offset in terms of elements to the start of dst
  • src_offset (int) – offset in terms of elements to the start of src
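Because size and both offsets are counted in elements, the copy can be pictured as a slice assignment between the flattened tensors. The NumPy sketch below uses a hypothetical stand-in with the same argument order and ignores devices entirely.

```python
import numpy as np

def copy_data_to_from(dst, src, size, dst_offset=0, src_offset=0):
    # Offsets and size refer to positions in the flattened (row-major) data.
    dst.flat[dst_offset:dst_offset + size] = src.flat[src_offset:src_offset + size]

dst = np.zeros((2, 3))
src = np.arange(6.0).reshape(3, 2)
copy_data_to_from(dst, src, 3, dst_offset=2, src_offset=1)
```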

singa.tensor.copy_from_numpy(data, np_array)

Copy the data from the numpy array.

singa.tensor.ctensor2numpy(x)

To be used in the SoftMax Operation. Convert a singa_tensor to a numpy_tensor.

singa.tensor.div(lhs, rhs, ret=None)

Element-wise division.

  • lhs (Tensor) – left hand side operand
  • rhs (Tensor) – right hand side operand
  • ret (Tensor, optional) – if not None, the result is stored in it; otherwise, a new Tensor would be created for the result.

Returns:the result Tensor

singa.tensor.einsum(ops, *args)

Do the matrix-to-matrix einsum calculation according to the operands.

Warning: this function currently supports the einsum calculation of two matrices only.

TODO list to finish the function in C++ (just like the numpy functions):
  • sum(A, axis=None)
  • repeat(A, repeats)
  • transpose(A, axes=None)

  • ops (string) – the subscripts for the summation, such as 'ki,kj->kij'. All 26 lowercase letters can be used.
  • args (list of array_like) – the tensors for the operation; only two tensors are supported.

Returns:a SINGA Tensor holding the output matrix of the einsum calculation

The best way to understand this function is to try the examples below. In each of them, A_ stands for an array holding the values 0 to 11.

A_ = [0,1,2,3,4,5,6,7,8,9,10,11]
A = A_.reshape(4,3)
B = A_.reshape(3,4)

Here the einsum calculation is the same as a normal 'mult':

Res = einsum('ij,jk->ik',A,B)

>>> [[ 20  23  26  29]
     [ 56  68  80  92]
     [ 92 113 134 155]
     [128 158 188 218]]

A_ = [0,1,2,3,4,5,6,7,8,9,10,11]
A = A_.reshape(4,3)
B = A_.reshape(4,3)

Here the einsum calculation is the same as a normal 'eltwise_mult':

Res = einsum('ki,ki->ki',A,B)

>>> [[  0   1   4]
     [  9  16  25]
     [ 36  49  64]
     [ 81 100 121]]

A_ = [0,1,2,3,4,5,6,7,8,9,10,11]
A = A_.reshape(4,3)

Res = einsum('ki,kj->kij',A,A)

>>> [[[  0   0   0]
      [  0   1   2]
      [  0   2   4]]
     [[  9  12  15]
      [ 12  16  20]
      [ 15  20  25]]
     [[ 36  42  48]
      [ 42  49  56]
      [ 48  56  64]]
     [[ 81  90  99]
      [ 90 100 110]
      [ 99 110 121]]]

A_ = [0,1,2,3,4,5,6,7,8,9,10,11]
A = A_.reshape(3,2,2)

Res = einsum('kia,kja->kij',A,A)

>>> [[[  1   3]
      [  3  13]]
     [[ 41  59]
      [ 59  85]]
     [[145 179]
      [179 221]]]

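The four einsum examples can be reproduced with numpy.einsum, which accepts the same subscript strings. This sketch uses NumPy arrays rather than SINGA Tensors, so it only confirms the expected values, not the behaviour of singa.tensor.einsum itself.

```python
import numpy as np

A_ = np.arange(12)
A = A_.reshape(4, 3)
B = A_.reshape(3, 4)

matmul = np.einsum('ij,jk->ik', A, B)       # same as matrix multiplication
eltwise = np.einsum('ki,ki->ki', A, A)      # same as element-wise multiplication
outer = np.einsum('ki,kj->kij', A, A)       # per-row outer product

C = A_.reshape(3, 2, 2)
batched = np.einsum('kia,kja->kij', C, C)   # per-slice C[k] @ C[k].T
```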
singa.tensor.eltwise_mult(lhs, rhs, ret=None)

Element-wise multiplication.

  • lhs (Tensor) – left hand side operand
  • rhs (Tensor) – right hand side operand
  • ret (Tensor, optional) – if not None, the result is stored in it; otherwise, a new Tensor would be created for the result.

Returns:the result Tensor

singa.tensor.exp(t)
Parameters:t (Tensor) – input Tensor
Returns:a new Tensor whose element y = exp(x), x is an element of t

singa.tensor.from_numpy(np_array)

Create a Tensor instance with the shape, dtype and values from the numpy array.

Parameters:np_array – the numpy array.
Returns:a Tensor instance allocated on the default CppCPU device.

singa.tensor.gaussian(mean, std, t)

Generate values following a Gaussian distribution.

  • mean (float) – the mean of the Gaussian distribution.
  • std (float) – the standard deviation of the Gaussian distribution.
  • t (Tensor) – the results are put into t


singa.tensor.ge(t, x)

Element-wise comparison for t >= x.

  • t (Tensor) – left hand side operand
  • x (Tensor or float) – right hand side operand

Returns:a Tensor with each element being t[i] >= x ? 1.0f:0.0f, or t[i] >= x[i] ? 1.0f:0.0f

singa.tensor.gt(t, x)

Element-wise comparison for t > x.

  • t (Tensor) – left hand side operand
  • x (Tensor or float) – right hand side operand

Returns:a Tensor with each element being t[i] > x ? 1.0f:0.0f, or t[i] > x[i] ? 1.0f:0.0f

singa.tensor.le(t, x)

Element-wise comparison for t <= x.

  • t (Tensor) – left hand side operand
  • x (Tensor or float) – right hand side operand

Returns:a Tensor with each element being t[i] <= x ? 1.0f:0.0f, or t[i] <= x[i] ? 1.0f:0.0f

singa.tensor.log(t)
Parameters:t (Tensor) – input Tensor
Returns:a new Tensor whose element y = log(x), x is an element of t

singa.tensor.lt(t, x)

Element-wise comparison for t < x.

  • t (Tensor) – left hand side operand
  • x (Tensor or float) – right hand side operand

Returns:a Tensor with each element being t[i] < x ? 1.0f:0.0f, or t[i] < x[i] ? 1.0f:0.0f
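The comparison functions ge, gt, le and lt return a float mask of 1.0 and 0.0 rather than booleans, with x either a scalar or a tensor of matching shape. A NumPy sketch of that behaviour, using hypothetical stand-in functions with the same names:

```python
import numpy as np

def ge(t, x):
    return (t >= x).astype(np.float32)

def gt(t, x):
    return (t > x).astype(np.float32)

def le(t, x):
    return (t <= x).astype(np.float32)

def lt(t, x):
    return (t < x).astype(np.float32)

t = np.array([1.0, 2.0, 3.0], dtype=np.float32)
mask_scalar = ge(t, 2.0)                                        # x is a float
mask_tensor = lt(t, np.array([2.0, 2.0, 2.0], dtype=np.float32))  # x is a tensor
```

Such masks are convenient for gradients of piecewise functions, for example multiplying dy by a mask of the positive inputs.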

singa.tensor.mult(A, B, C=None, alpha=1.0, beta=0.0)

Do matrix-matrix or matrix-vector multiplication.

This function returns C = alpha * A * B + beta * C

  • A (Tensor) – 2d Tensor
  • B (Tensor) – If B is a 1d Tensor, GEMV would be invoked for matrix-vector multiplication; otherwise GEMM would be invoked.
  • C (Tensor, optional) – for storing the result; If None, a new Tensor would be created.
  • alpha (float) – scaling factor for A * B
  • beta (float) – scaling factor for C

Returns:the result Tensor
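The formula C = alpha * A * B + beta * C covers both the matrix-vector and matrix-matrix cases. The following NumPy sketch is a hypothetical stand-in with the same argument order; it assumes that, when C is given, the result overwrites C in place.

```python
import numpy as np

def mult(A, B, C=None, alpha=1.0, beta=0.0):
    prod = alpha * (A @ B)  # 1d B gives a matrix-vector product, 2d B a matrix product
    if C is None:
        return prod
    C[...] = prod + beta * C  # accumulate into the provided result tensor
    return C

A = np.array([[1.0, 2.0], [3.0, 4.0]])
v = mult(A, np.array([1.0, 1.0]))                    # matrix-vector case
C = np.ones((2, 2))
out = mult(A, np.eye(2), C, alpha=2.0, beta=1.0)     # matrix-matrix case with accumulation
```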

singa.tensor.pow(t, x, out=None)
  • t (Tensor) – input tensor
  • x (float or Tensor) – y[i] = t[i]^x if x is a float value; otherwise, y[i]= t[i]^x[i] if x is a tensor.
  • out (None or Tensor) – if None, a new Tensor would be constructed to store the result; otherwise, the result is put into out.

Returns:the result tensor.

singa.tensor.reshape(t, s)

Reshape the input tensor with the given shape.

  • t (Tensor) – the tensor to be changed
  • s (list<int>) – the new shape, which should have the same volume as the old shape.

Returns:the new Tensor

singa.tensor.sigmoid(t)
Parameters:t (Tensor) – input Tensor
Returns:a new Tensor whose element y = sigmoid(x); x is an element of t

singa.tensor.sign(t)
Parameters:t (Tensor) – input Tensor
Returns:a new Tensor whose element y = sign(x)

singa.tensor.sizeof(dtype)
Returns:the number of bytes of the given SINGA data type defined in core.proto

singa.tensor.softmax(t, out=None)

Apply SoftMax for each row of the Tensor.

  • t (Tensor) – the input 1d or 2d tensor
  • out (Tensor, optional) – if not None, it is used to store the result

Returns:the result Tensor

singa.tensor.sqrt(t)
Parameters:t (Tensor) – input Tensor
Returns:a new Tensor whose element y = sqrt(x), x is an element of t

singa.tensor.square(t)
Parameters:t (Tensor) – input Tensor
Returns:a new Tensor whose element y = x * x, x is an element of t

singa.tensor.sub(lhs, rhs, ret=None)

Element-wise subtraction.

  • lhs (Tensor) – left hand side operand
  • rhs (Tensor) – right hand side operand
  • ret (Tensor, optional) – if not None, the result is stored in it; otherwise, a new Tensor would be created for the result.

Returns:the result Tensor

singa.tensor.sum(t, axis=None)

Sum elements of the input tensor along the given axis.

  • t (Tensor) – input Tensor
  • axis (int, optional) – if None, the summation is done over all elements; if axis is provided, then it is calculated along the given axis, e.g. 0 – sum each column; 1 – sum each row.

Returns:a float value as the sum of all elements, or a new Tensor


singa.tensor.sum_columns(M)

Sum all columns into a single column.

Parameters:M (Tensor) – the input 2d tensor.
Returns:a new Tensor as the resulting column.

singa.tensor.sum_rows(M)

Sum all rows into a single row.

Parameters:M (Tensor) – the input 2d tensor.
Returns:a new Tensor as the resulting row.
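Summing all columns into one column and all rows into one row can be confusing next to the axis convention of sum, so a small NumPy sketch may help. The helper names here are hypothetical and operate on NumPy arrays.

```python
import numpy as np

def sum_columns(M):
    # Adding the columns together leaves one value per row.
    return M.sum(axis=1)

def sum_rows(M):
    # Adding the rows together leaves one value per column.
    return M.sum(axis=0)

M = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
col = sum_columns(M)
row = sum_rows(M)
```

In NumPy terms, summing rows into a single row corresponds to axis=0, which is the "sum each column" case of the axis argument described for sum.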

singa.tensor.tanh(t)
Parameters:t (Tensor) – input Tensor
Returns:a new Tensor whose element y = tanh(x), x is an element of t

singa.tensor.to_host(t)

Copy the data to a host tensor.

singa.tensor.to_numpy(t)

Copy the tensor into a numpy array.

Parameters:t (Tensor) – the source Tensor
Returns:a numpy array

singa.tensor.uniform(low, high, t)

Generate values following a Uniform distribution.

  • low (float) – the lower bound
  • high (float) – the upper bound
  • t (Tensor) – the results are put into t
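The module-level random functions (uniform, gaussian, bernoulli) share one pattern: they take the distribution parameters first and write the samples into the tensor t passed last. A NumPy sketch of that pattern, with hypothetical stand-in functions and a seeded generator chosen only for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: any random source would do here

def uniform(low, high, t):
    t[...] = rng.uniform(low, high, size=t.shape)

def gaussian(mean, std, t):
    t[...] = rng.normal(mean, std, size=t.shape)

def bernoulli(p, t):
    t[...] = rng.random(t.shape) < p  # 1.0 with probability p, else 0.0

u = np.empty((100, 100))
uniform(-1.0, 1.0, u)
g = np.empty((100, 100))
gaussian(0.0, 1.0, g)
b = np.empty((100, 100))
bernoulli(0.3, b)
```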
