# Extract data from MATLAB matrix without for-loop

up vote 3 down vote favorite 1 In MATLAB, let us say I have a 10 x 100 matrix, called M. What I would like to do, is extract particular indicies of this matrix, and do an operation on them immediately, based on the row index, in a vectorized fashion. For example, for the first row, I want to compute sum(M(1, 1:1:100)). Then for row two, I want sum(M(2, 1:2:100)). For row three, I want sum(M(3, 1:3:100)), etc etc. For row ten, I have of course sum(M(10, 1:10:100)). I have this in a for loop, but I want to see if there is a way to extract this data without a for loop. Thank you. matlab matrix vectorization
|
this question asked Jun 12 '13 at 16:03 Spacey 1,213 4 31 51 1 Why do you want to avoid a loop? Are you experiencing performance issues? –  Eitan T Jun 12 '13 at 16:08 @EitanT Yes, guess its not THAT big of a deal, but there might be potential performance issues down the line, so I am trying to see if there is a more elegant way. –  Spacey Jun 12 '13 at 16:10 1 Pro tip: don't optimize code before benchmarking. Leave it as is, unless you're certain that this is the bottleneck. –  Eitan T Jun 12 '13 at 16:12 As D. Knuth said: "...: premature optimization is the root of all evil." –  Oleg Jun 12 '13 at 16:15 @OlegKomarov Ha! Thanks for the link ... I will just say that I am curious of how this can be done either way, for my own learning. I am a learnaholic after all... ;-) –  Spacey Jun 12 '13 at 16:16  |  show 7 more comments 3 Answers

up vote 3 down vote ---Accepted---Accepted---Accepted---

You can try this for a one-liner S=arrayfun(@(n) sum(M(n,1:n:100)), 1:10)

Alternatively, you can create a sparse matrix beforehand A=sparse(100,10);

for n=1:10,

A(1:n:100, n)=1;

end

and find the sum by S=diag(M*A);

This can be optimized further for larger matrices by defining A=sparse(10,100) and S=sum(M.*A,2);

my quick benchamrking M=rand(10,100);

sz = size(M);

tic;

for k=1:10000,

for n=1:sz(1),

B(n)=sum(M(n,1:n:end));

end

end

toc

tic;

for k=1:10000,

B=arrayfun(@(n) sum(M(n,1:n:end)), 1:sz(1));

end

toc

tic;

for k=1:10000,

A=sparse(sz(2), sz(1));

for n=1:sz(1),

A(1:n:end, n)=1;

end

B=diag(M*A);

end

toc

tic;

A=sparse(sz(2),sz(1));

for n=1:sz(1),

A(1:n:end, n)=1;

end

for k=1:10000,

B=diag(M*A);

end

toc

tic;

A=sparse(sz(1),sz(2));

for n=1:sz(1),

A(n, 1:n:end)=1;

end

for k=1:10000,

B=sum(M.*A,2);

end

toc

returns Elapsed time is 0.552470 seconds.

Elapsed time is 2.409102 seconds.

Elapsed time is 0.638072 seconds.

Elapsed time is 0.052246 seconds.

Elapsed time is 0.061893 seconds.

for 30-by-1000 matrix Elapsed time is 1.785664 seconds.

Elapsed time is 3.954034 seconds.

Elapsed time is 4.760436 seconds.

Elapsed time is 0.926118 seconds.

Elapsed time is 0.865330 seconds.

and for 1000-by-100 matrix Elapsed time is 51.389322 seconds.

Elapsed time is 63.443414 seconds.

Elapsed time is 68.327187 seconds.

Elapsed time is 29.056304 seconds.

Elapsed time is 1.147215 seconds.

|
this answer edited Jun 12 '13 at 21:23 answered Jun 12 '13 at 16:21 Mohsen Nosratinia 9,242 1 19 46 Arrayfun is a concealed loop and will suffer from overhead in comparison to a plain loop. This solution is not purely vectorized. –  Oleg Jun 12 '13 at 16:31 1 The question asked is "if there is a way to extract this data without a for loop". And this qualifies as an answer to that question. –  Mohsen Nosratinia Jun 12 '13 at 16:36 I do consider arrayfun(), cellfun(), structufun() and all functions that have MATLAB code that execute a loop to be non-vectorized solutions. This is not just my personal opinion, but is widely assumed to be so especially because vectorized often stands for performance. In this perspective, only cellfun('stringInput',c) might be faster than an explicit loop. –  Oleg Jun 12 '13 at 16:43 Mohsen I like your solutions... can you please benchmark the straight up for loop against the last sparse solution you provided? I think that might be the one... thanks so much. –  Spacey Jun 12 '13 at 16:46 Fantastic! Looks like an entire order of magnitude gain with last sparse solution... brilliant! :) –  Spacey Jun 12 '13 at 16:58  |  show 5 more comments up vote 5 down vote Proposed solution I was able to finally crack it for a truly vectorized solution that uses logical indexing to select the elements from the input matrix that are to be summed up. This magic was achieved with bsxfun with its optional functional handle @mod. The code is listed below - [m,n] = size(M);

mask = bsxfun(@mod,1:n,(1:m)')==1; %//'# all of magic happens here as it creates

%// a logical mask of 1's at places in input matrix

%// whose elements are to be summed and 0's elsewhere.

mask(1,:) = 1; %// set the first row as all ones as we need to sum all of those

sumvals = sum(mask.*M,2); %// finally get the sum values

Benchmarking In this benchmarking section I am including the four approaches - The one listed earlier in this post and its GPU ported version, arrayfun and sparse based approaches listed in the other solution. Three sets of input data were used for the benchmarking - Set 1: Number of columns is kept at a multiple factor of 10 with respect to the number of rows in the input matrix as used in the question. Set 2: The number of rows are extended so that number of rows is now 10x number of columns. This would really test out the loopy codes, as the number of iterations would be more in this case. Set 3: This set is an extension of set2, to further increase the number of rows and thus would be another great test between truly vectorized approaches against others. The function codes used for benchmarking are listed next - function sumvals = sumrows_stepped_bsxfun(M)

//... same as the code posted earlier

return

function sumvals = sumrows_stepped_bsxfun_gpu(M)

gM = gpuArray(M);

[m,n] = size(gM);

sumvals(1) = sum(M(1,:));

return

function S = sumrows_stepped_arrayfun(M)

[m,n] = size(M);

S = arrayfun(@(x) sum(M(x,1:x:n)), 1:m);

return

function B = sumrows_stepped_sparse(M)

sz = size(M);

A=sparse(sz(1),sz(2));

for n=1:sz(1),

A(n, 1:n:end)=1;

end

B=full(sum(M.*A,2));

return

Please note that timeit was used for timing CPU based codes and gputimeit for GPU based codes. System Configuration used for testing - MATLAB Version: 8.3.0.532 (R2014a)

Operating System: Windows 7

RAM: 3GB

CPU Model: Intel® Pentium® Processor E5400 (2M Cache, 2.70 GHz)

GPU Model: GTX 750Ti 2GB

The benchmarking results thus obtained - Conclusions For datasizes with number of rows lesser than the number of columns, the number of iterations being a small number, loopy codes seem to have the upper-hand. As we increase the number of rows, the benefit of truly vectorized approaches become clear. You would also notice that bsxfun on CPU based approach does well for set 3, until around 12000 x 300 mark against non vectorized approaches and the reason behind that is, bsxfun creates this huge logical mask and at that point the memory bandwidth requirement becomes too high to cope up with the compute capability of bsxfun. This makes sense as vectorized operations by definition mean performing operations on many elements in one go, so memory bandwidth is essential. So, if you have a better machine with more RAM, that 12000 x 300 mark should extend further. If you could extend the number of rows further, the benefits of a vectorized solution would become more clear as long as the memory bandwidth is kept in check. Benchmarking Code Finally here's the benchmarking code if anyone would like to test it out on their systems - clear all; clc; close all

outputfile = 'results.xlsx';

delete(outputfile); %// remove file, so that new results could be written into

base_datasize_array = 40:60:400;

methods = {'BSXFUN on GPU','BSXFUN on CPU','ARRAYFUN','SPARSE'};

num_approaches = numel(methods);

num_sets = 3;

timeall_all = zeros(num_approaches,numel(base_datasize_array),num_sets);

datasize_lbs = cell(numel(base_datasize_array),num_sets);

for set_id = 1:num_sets

switch set_id

case 1

N1_arr = base_datasize_array*2;

N2_arr = N1_arr*10;

case 2

N2_arr = base_datasize_array*2;

N1_arr = N2_arr*10;

case 3

N2_arr = base_datasize_array;

N1_arr = N2_arr*40;

end

timeall = zeros(num_approaches,numel(N1_arr));

for iter = 1:numel(N1_arr)

M = rand(N1_arr(iter),N2_arr(iter));

f = @() sumrows_stepped_bsxfun_gpu(M);

timeall(1,iter) = gputimeit(f); clear f

f = @() sumrows_stepped_bsxfun(M);

timeall(2,iter) = timeit(f); clear f

f = @() sumrows_stepped_arrayfun(M);

timeall(3,iter) = timeit(f); clear f

f = @() sumrows_stepped_sparse(M);

timeall(4,iter) = timeit(f); clear f

end

timeall_all(:,:,set_id) = timeall;

wp = repmat({' '},numel(N1_arr),1);

datasize_lbs(:,set_id) = strcat(cellstr(num2str(N1_arr.')),' x ',...

wp,cellstr(num2str(N2_arr.')));

end

for set_id=1:num_sets

out_cellarr = cell(numel(methods)+1,numel(N1_arr)+1);

out_cellarr(1,1) = {'Methods'};

out_cellarr(2:end,1) = methods;

out_cellarr(1,2:end) = datasize_lbs(:,set_id);

out_cellarr(2:end,2:end) = cellfun(@(x) num2str(x),...

num2cell(timeall_all(:,:,set_id)),'Uni',0);

xlswrite(outputfile, out_cellarr,set_id);

end

|
this answer edited May 23 '17 at 12:16 Community ♦ 1 1 answered Nov 15 '14 at 18:30 Divakar 140k 14 60 142 1 Kudos! except that the figures are made in Excel, Matlab can produce equally ugly plots :) –  Arpi Nov 15 '14 at 19:40 @Arpi Yeah! Well MATLAB plots don't look that "sleek", so made that switch! ;) Think it's less ugly with excel! –  Divakar Nov 15 '14 at 20:36

|  up vote 1 down vote Since there is an interesting performance effect in the sparse/matrixmult approach I will post some adidtional results: M

= rand(1000,100);

sz = size(M);

% PLAIN LOOP

tic

out1 = zeros(sz(1),1);

for k = 1:10000

for n = 1:sz(1)

out1(n) = sum(M(n,1:n:100));

end

end

toc

% SPARSE MATRIXMULT

tic

A = sparse(sz);

for n = 1:sz(1)

A(1:n:sz(2),n) = 1;

end

for k = 1:10000

out2 = diag(M*A);

end

toc

isequal(out1,out2) % ok

Plain loop:

11.441380 seconds.

Sparse/matrixmult: 27.503829 seconds.

As the dimension of the matrix grows, the plain loop is more efficient.
|
this answer edited Jun 12 '13 at 17:16 answered Jun 12 '13 at 17:10 Oleg 8,289 1 17 42 Some additional details pertinent to my problem: 1) The number of rows of M are going to be ~30 max, and the columns will be ~1000 or more. 2) Also, you may assume that the sparse matrix can be done once, no need to always remake it. What would perform better in the bench marks with those details? Thanks. –  Spacey Jun 12 '13 at 17:15 With M 30x1000 I get plain loop 0.439726 seconds and sprase/matrixmult 0.858779 seconds where the creation of A takes only 0.004381 seconds (I had in mind a way to speed that up, but it's pointless). –  Oleg Jun 12 '13 at 17:20 Ok, interesting... I guess for my case a for loop would be faster... hmmm.. where/when would the advantage of the sparse matrix appear then? –  Spacey Jun 12 '13 at 17:31 there is a more efficient way. I added that as fifth method and put the benchmark. It performs really well for the extreme case Oleg mentioned. –  Mohsen Nosratinia Jun 12 '13 at 20:47

|

up vote 3 down vote favorite 1 In MATLAB, let us say I have a 10 x 100 matrix, called M. What I would like to do, is extract particular indicies of this matrix, and do an operation

×
• 登录
• 注册

×