# matlab - Is there overhead of accessing elements in a sparse matrix?


The `sparse` function creates a sparse matrix. Usage 1: `S = sparse(X)` converts the matrix X to sparse form, i.e. all zero elements of X are dropped and only the non-zero elements and their indices are stored.

Let's say I have a sparse matrix A and I want to do heavy calculations on it. The calculations do not modify A; they only access its elements, e.g. take a row of A and multiply it by something. I wonder whether I should convert A to a full matrix before doing any calculation, or just work on it directly?
In other words, is accessing elements in a sparse matrix slower than in a full matrix?

asked Oct 14 '14 by Tu Bui; edited Oct 14 '14 by Stewie Griffin

Comments:

• If you have enough memory to convert `A` to `full`, why do you need `sparse` at all? – Luis Mendo
• Slicing sparse matrices across columns is much faster than by rows, which is understood given how the data is stored internally. – Amro
• @Luis: The server I am working on has enough memory for it, but other users are using it as well, so I'd rather use as few resources as possible. I also saved the matrix (in sparse form) into a mat file (it does save me some space). Now I load it back and wonder whether I should convert it to a full matrix. – Tu Bui
• @Amro +1: this is interesting. I just found similar info at this link: mathengineering.com/understanding-sparse-matrix-storage – Tu Bui
• You might find this question and this answer, including the linked reference, interesting. – Stewie Griffin

Slicing sparse matrices across columns is much faster than slicing across rows in MATLAB. So you should prefer accessing `M(:,i)` over `M(i,:)`.
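As a rough illustration (a sketch in Python/SciPy rather than MATLAB; `scipy.sparse.csc_matrix` uses the same Compressed Sparse Column layout as MATLAB's sparse type, so the access pattern is comparable), a column slice reads one contiguous run of the stored arrays, while a row slice has to look inside every column:

```python
# Sketch: compare column vs. row slicing on a CSC matrix.
# (Python/SciPy stand-in for MATLAB; same CSC storage layout.)
import time
from scipy import sparse

M = sparse.random(5000, 5000, density=0.01, format="csc", random_state=0)

t0 = time.perf_counter()
cols = [M[:, i] for i in range(200)]   # contiguous reads of data/indices
t_col = time.perf_counter() - t0

t0 = time.perf_counter()
rows = [M[i, :] for i in range(200)]   # each must search all 5000 columns
t_row = time.perf_counter() - t0

print(f"200 column slices: {t_col:.4f}s, 200 row slices: {t_row:.4f}s")
```

On a typical run the row slices are noticeably slower; the exact numbers depend on the machine and on matrix density.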

Internally, MATLAB stores a sparse matrix in Compressed Sparse Column (CSC) format:

• the non-zero elements are stored in a 1-D double array of length `nzmax`, arranged in column-major order (`pr` for the real parts, plus `pi` for the imaginary parts if the matrix is complex)
• `ir`: an integer array of the same length holding the corresponding row indices
• `jc`: an integer array of length `n+1` (where `n` is the number of columns) holding column start offsets: `jc(k)` is the position in `pr`/`ir` where column `k` begins, so by definition the last value of `jc` equals `nnz` (the number of stored non-zeros)

As an example, take the following sparse matrix:

``````>> M = sparse([1 3 5 3 4 1 5], [1 1 1 2 2 4 4], [1 7 5 3 4 2 6], 5, 4);
``````

This is what it looks like stored in memory (I'm using 0-based indices for `ir` and `jc`):

``````1 . . 2
. . . .
7 3 . .
. 4 . .
5 . . 6

pr = 1 7 5 3 4 2 6
ir = 0 2 4 2 3 0 4
jc = 0 3 5 5 7

nzmax = at least 7
nnz = 7
m = 5
n = 4
``````
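The same layout can be checked in Python/SciPy (a stand-in, not MATLAB itself: `csc_matrix` exposes the three CSC arrays as `data`, `indices`, and `indptr`, corresponding to `pr`, `ir`, and `jc`):

```python
# Sketch: build the example matrix and inspect its CSC arrays.
import numpy as np
from scipy.sparse import csc_matrix

i = [1, 3, 5, 3, 4, 1, 5]   # row indices (1-based, as in the MATLAB call)
j = [1, 1, 1, 2, 2, 4, 4]   # column indices
s = [1, 7, 5, 3, 4, 2, 6]   # values
M = csc_matrix((s, (np.array(i) - 1, np.array(j) - 1)), shape=(5, 4))

print(M.data)     # [1 7 5 3 4 2 6]  -> pr
print(M.indices)  # [0 2 4 2 3 0 4]  -> ir
print(M.indptr)   # [0 3 5 5 7]      -> jc
```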

To retrieve the i-th column of the sparse matrix, `M(:,i)`, it suffices to read `pr(jc(i):jc(i+1)-1)` (to keep it simple, I'm ignoring 0- vs 1-based indexing). Accessing a matrix row, on the other hand, involves more computation and more array traversals, and no longer benefits from spatial locality.
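That column lookup can be spelled out directly on the CSC arrays (again a Python/SciPy sketch with 0-based indices; `sparse_column` is a hypothetical helper mirroring `pr(jc(i):jc(i+1)-1)`):

```python
# Sketch: extract column i straight from the CSC arrays.
import numpy as np
from scipy.sparse import csc_matrix

M = csc_matrix((np.array([1, 7, 5, 3, 4, 2, 6], dtype=float),
                np.array([0, 2, 4, 2, 3, 0, 4]),        # ir
                np.array([0, 3, 5, 5, 7])),             # jc
               shape=(5, 4))

def sparse_column(M, i):
    """Values and row indices of column i: one contiguous slice each."""
    start, stop = M.indptr[i], M.indptr[i + 1]
    return M.data[start:stop], M.indices[start:stop]

vals, rows = sparse_column(M, 0)
print(vals, rows)   # [1. 7. 5.] [0 2 4]
```

Note that an all-zero column (here column 2) simply yields empty slices, since `jc(i) == jc(i+1)`.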

It's worth checking out the original paper by John R. Gilbert, Cleve Moler, and Robert Schreiber: "Sparse Matrices in MATLAB: Design and Implementation", SIAM Journal on Matrix Analysis and Applications, 13(1), 333–356, 1992.

The computational complexity of simple array operations should be proportional to `nnz`, and perhaps also depend linearly on `m` or `n`, but be independent of the product `m*n`. The complexity of more complicated operations involves such factors as ordering and fill-in, but an objective of a good sparse matrix algorithm should be:


The time required for a sparse matrix operation should be proportional to the number of arithmetic operations on nonzero quantities.

We call this the "time is proportional to flops" rule; it is a fundamental tenet of our design.

and

This (column-oriented sparse matrix) scheme is not efficient for manipulating matrices one element at a time: access to a single element takes time at least proportional to the logarithm of the length of its column; inserting or removing a nonzero may require extensive data movement. However, element-by-element manipulation is rare in MATLAB (and is expensive even in full MATLAB). Its most common application would be to create a sparse matrix, but this is more efficiently done by building a list `[i,j,s]` of matrix elements in arbitrary order and then using `sparse(i,j,s)` to create the matrix.

The sparse data structure is allowed to have unused elements after the end of the last column of the matrix. Thus an algorithm that builds up a matrix one column at a time can be implemented efficiently by allocating enough space for all the expected nonzeros at the outset.
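In SciPy terms the same build pattern is the triplet (COO) construction followed by a single conversion (a sketch; `coo_matrix((s, (i, j)))` plays the role of MATLAB's `sparse(i,j,s)`, and duplicate triplets are summed on conversion, just as `sparse` accumulates repeated indices):

```python
# Sketch: collect (i, j, s) triplets in any order, then build once.
import numpy as np
from scipy.sparse import coo_matrix

triplets = []
for k in range(4):                 # arbitrary order, duplicates allowed
    triplets.append((k, k, 1.0))   # diagonal entries
    triplets.append((0, k, 1.0))   # first-row entries; (0,0) is a duplicate
i, j, s = zip(*triplets)
A = coo_matrix((s, (i, j)), shape=(4, 4)).tocsc()  # duplicates are summed

print(A.toarray())
```

This is far cheaper than inserting nonzeros into a CSC matrix one at a time, which would repeatedly shift the `pr`/`ir` arrays.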

Also section 3.1.4 "asymptotic complexity analysis" should be of interest (it's too long to quote here).

answered Oct 14 '14 by Amro; edited Oct 15 '14

Comments:

• Although it does not answer my question (my question is to compare a sparse matrix vs a full matrix in terms of retrieving elements), I am happy with this one. – Tu Bui
• To follow on from Amro's answer: if it really is rows you want from the sparse matrix, transpose it and extract columns from the transposed form. Or better still, use the profiler to see what's actually going on! – Edric
