Slight optimization: cache a pointer to the row vector of a double array wherever the same row is used repeatedly in a loop, to avoid redoing the double index computation and range check on every access.
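
A minimal sketch of the idea, assuming C++ and a `std::vector<std::vector<double>>` as the double array. The source names neither the language nor the container, so `Matrix`, `row_sum_naive` and `row_sum_cached` are hypothetical names used only for illustration, not the code touched by this change:

```cpp
#include <cstddef>
#include <vector>

// Assumed representation of a "double array": a vector of row vectors.
using Matrix = std::vector<std::vector<double>>;

// Before: each element access indexes twice (row, then column),
// and at() range-checks both indices on every iteration.
double row_sum_naive(const Matrix& m, std::size_t r) {
    double sum = 0.0;
    for (std::size_t c = 0; c < m.at(r).size(); ++c) {
        sum += m.at(r).at(c);
    }
    return sum;
}

// After: look the row up once (a single range check on r),
// then reuse the cached pointer inside the loop.
double row_sum_cached(const Matrix& m, std::size_t r) {
    const std::vector<double>* row = &m.at(r);  // cached pointer to the row vector
    double sum = 0.0;
    for (std::size_t c = 0; c < row->size(); ++c) {
        sum += (*row)[c];  // c is in range by the loop condition
    }
    return sum;
}
```

Both functions return the same result. The cached pointer is only valid while the outer vector is not resized, so this pattern fits loops that do not add or remove rows.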