improve both numerical accuracy and speed by using optimized loops that run in reversed row order (i.e., from higher orders to lower orders) and operate directly on the matrix data.
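
As a rough illustration of why a reversed row order allows working directly on the matrix data, here is a minimal sketch. It assumes a hypothetical recurrence in which each row (order `k`) is updated from the row just below it (order `k - 1`). The function names, the recurrence, and the scale factor `c` are assumptions made for this example and are not taken from the original code. Walking from the highest order down means every update still reads an unmodified lower-order row, so no temporary copy of the matrix is needed.

```python
def update_in_place(rows, c):
    """Apply rows[k] <- rows[k] + c * rows[k - 1] for every k >= 1, in place.

    Hypothetical recurrence used only for illustration; the row index
    plays the role of the "order".
    """
    # Highest order first: rows[k - 1] has not been modified yet when
    # rows[k] reads it, so the original lower-order values are used.
    for k in range(len(rows) - 1, 0, -1):
        hi, lo = rows[k], rows[k - 1]
        for j in range(len(hi)):
            hi[j] += c * lo[j]
    return rows


def update_with_copy(rows, c):
    """Reference version that builds a new matrix from the untouched input."""
    out = [list(rows[0])]
    for k in range(1, len(rows)):
        out.append([a + c * b for a, b in zip(rows[k], rows[k - 1])])
    return out


if __name__ == "__main__":
    m = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    expected = update_with_copy(m, 2.0)
    update_in_place(m, 2.0)
    print(m == expected)
```

Running the same loop in forward order would feed already-updated rows into later updates and give a different result, which is why the direction matters for an in-place version.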