Extract data from MATLAB matrix without forloop
2020腾讯云双十一活动，全年最低！！！（领取3500元代金券），
地址：https://cloud.tencent.com/act/cps/redirect?redirect=1073
【阿里云】双十一活动，全年抄底价，限时3天！(老用户也有)，
入口地址：https://www.aliyun.com/1111/home
up vote 3 down vote favorite 1 In MATLAB, let us say I have a 10 x 100 matrix, called M. What I would like to do, is extract particular indicies of this matrix, and do an operation on them immediately, based on the row index, in a vectorized fashion. For example, for the first row, I want to compute sum(M(1, 1:1:100)). Then for row two, I want sum(M(2, 1:2:100)). For row three, I want sum(M(3, 1:3:100)), etc etc. For row ten, I have of course sum(M(10, 1:10:100)). I have this in a for loop, but I want to see if there is a way to extract this data without a for loop. Thank you. matlab matrix vectorization

this question asked Jun 12 '13 at 16:03 Spacey 1,213 4 31 51 1 Why do you want to avoid a loop? Are you experiencing performance issues? – Eitan T Jun 12 '13 at 16:08 @EitanT Yes, guess its not THAT big of a deal, but there might be potential performance issues down the line, so I am trying to see if there is a more elegant way. – Spacey Jun 12 '13 at 16:10 1 Pro tip: don't optimize code before benchmarking. Leave it as is, unless you're certain that this is the bottleneck. – Eitan T Jun 12 '13 at 16:12 As D. Knuth said: "...: premature optimization is the root of all evil." – Oleg Jun 12 '13 at 16:15 @OlegKomarov Ha! Thanks for the link ... I will just say that I am curious of how this can be done either way, for my own learning. I am a learnaholic after all... ;) – Spacey Jun 12 '13 at 16:16  show 7 more comments 3 Answers
up vote 3 down vote AcceptedAcceptedAccepted
You can try this for a oneliner S=arrayfun(@(n) sum(M(n,1:n:100)), 1:10)
Alternatively, you can create a sparse matrix beforehand A=sparse(100,10);
for n=1:10,
A(1:n:100, n)=1;
end
and find the sum by S=diag(M*A);
This can be optimized further for larger matrices by defining A=sparse(10,100) and S=sum(M.*A,2);
my quick benchamrking M=rand(10,100);
sz = size(M);
tic;
for k=1:10000,
for n=1:sz(1),
B(n)=sum(M(n,1:n:end));
end
end
toc
tic;
for k=1:10000,
B=arrayfun(@(n) sum(M(n,1:n:end)), 1:sz(1));
end
toc
tic;
for k=1:10000,
A=sparse(sz(2), sz(1));
for n=1:sz(1),
A(1:n:end, n)=1;
end
B=diag(M*A);
end
toc
tic;
A=sparse(sz(2),sz(1));
for n=1:sz(1),
A(1:n:end, n)=1;
end
for k=1:10000,
B=diag(M*A);
end
toc
tic;
A=sparse(sz(1),sz(2));
for n=1:sz(1),
A(n, 1:n:end)=1;
end
for k=1:10000,
B=sum(M.*A,2);
end
toc
returns Elapsed time is 0.552470 seconds.
Elapsed time is 2.409102 seconds.
Elapsed time is 0.638072 seconds.
Elapsed time is 0.052246 seconds.
Elapsed time is 0.061893 seconds.
for 30by1000 matrix Elapsed time is 1.785664 seconds.
Elapsed time is 3.954034 seconds.
Elapsed time is 4.760436 seconds.
Elapsed time is 0.926118 seconds.
Elapsed time is 0.865330 seconds.
and for 1000by100 matrix Elapsed time is 51.389322 seconds.
Elapsed time is 63.443414 seconds.
Elapsed time is 68.327187 seconds.
Elapsed time is 29.056304 seconds.
Elapsed time is 1.147215 seconds.

this answer edited Jun 12 '13 at 21:23 answered Jun 12 '13 at 16:21 Mohsen Nosratinia 9,242 1 19 46 Arrayfun is a concealed loop and will suffer from overhead in comparison to a plain loop. This solution is not purely vectorized. – Oleg Jun 12 '13 at 16:31 1 The question asked is "if there is a way to extract this data without a for loop". And this qualifies as an answer to that question. – Mohsen Nosratinia Jun 12 '13 at 16:36 I do consider arrayfun(), cellfun(), structufun() and all functions that have MATLAB code that execute a loop to be nonvectorized solutions. This is not just my personal opinion, but is widely assumed to be so especially because vectorized often stands for performance. In this perspective, only cellfun('stringInput',c) might be faster than an explicit loop. – Oleg Jun 12 '13 at 16:43 Mohsen I like your solutions... can you please benchmark the straight up for loop against the last sparse solution you provided? I think that might be the one... thanks so much. – Spacey Jun 12 '13 at 16:46 Fantastic! Looks like an entire order of magnitude gain with last sparse solution... brilliant! :) – Spacey Jun 12 '13 at 16:58  show 5 more comments up vote 5 down vote Proposed solution I was able to finally crack it for a truly vectorized solution that uses logical indexing to select the elements from the input matrix that are to be summed up. This magic was achieved with bsxfun with its optional functional handle @mod. The code is listed below  [m,n] = size(M);
mask = bsxfun(@mod,1:n,(1:m)')==1; %//'# all of magic happens here as it creates
%// a logical mask of 1's at places in input matrix
%// whose elements are to be summed and 0's elsewhere.
mask(1,:) = 1; %// set the first row as all ones as we need to sum all of those
sumvals = sum(mask.*M,2); %// finally get the sum values
Benchmarking In this benchmarking section I am including the four approaches  The one listed earlier in this post and its GPU ported version, arrayfun and sparse based approaches listed in the other solution. Three sets of input data were used for the benchmarking  Set 1: Number of columns is kept at a multiple factor of 10 with respect to the number of rows in the input matrix as used in the question. Set 2: The number of rows are extended so that number of rows is now 10x number of columns. This would really test out the loopy codes, as the number of iterations would be more in this case. Set 3: This set is an extension of set2, to further increase the number of rows and thus would be another great test between truly vectorized approaches against others. The function codes used for benchmarking are listed next  function sumvals = sumrows_stepped_bsxfun(M)
//... same as the code posted earlier
return
function sumvals = sumrows_stepped_bsxfun_gpu(M)
gM = gpuArray(M);
[m,n] = size(gM);
mask = bsxfun(@mod,gpuArray.colon(1,n),gpuArray.colon(1,m)')==1; %//'
sumvals = gather(sum(mask.*gM,2));
sumvals(1) = sum(M(1,:));
return
function S = sumrows_stepped_arrayfun(M)
[m,n] = size(M);
S = arrayfun(@(x) sum(M(x,1:x:n)), 1:m);
return
function B = sumrows_stepped_sparse(M)
sz = size(M);
A=sparse(sz(1),sz(2));
for n=1:sz(1),
A(n, 1:n:end)=1;
end
B=full(sum(M.*A,2));
return
Please note that timeit was used for timing CPU based codes and gputimeit for GPU based codes. System Configuration used for testing  MATLAB Version: 8.3.0.532 (R2014a)
Operating System: Windows 7
RAM: 3GB
CPU Model: Intel® Pentium® Processor E5400 (2M Cache, 2.70 GHz)
GPU Model: GTX 750Ti 2GB
The benchmarking results thus obtained  Conclusions For datasizes with number of rows lesser than the number of columns, the number of iterations being a small number, loopy codes seem to have the upperhand. As we increase the number of rows, the benefit of truly vectorized approaches become clear. You would also notice that bsxfun on CPU based approach does well for set 3, until around 12000 x 300 mark against non vectorized approaches and the reason behind that is, bsxfun creates this huge logical mask and at that point the memory bandwidth requirement becomes too high to cope up with the compute capability of bsxfun. This makes sense as vectorized operations by definition mean performing operations on many elements in one go, so memory bandwidth is essential. So, if you have a better machine with more RAM, that 12000 x 300 mark should extend further. If you could extend the number of rows further, the benefits of a vectorized solution would become more clear as long as the memory bandwidth is kept in check. Benchmarking Code Finally here's the benchmarking code if anyone would like to test it out on their systems  clear all; clc; close all
outputfile = 'results.xlsx';
delete(outputfile); %// remove file, so that new results could be written into
base_datasize_array = 40:60:400;
methods = {'BSXFUN on GPU','BSXFUN on CPU','ARRAYFUN','SPARSE'};
num_approaches = numel(methods);
num_sets = 3;
timeall_all = zeros(num_approaches,numel(base_datasize_array),num_sets);
datasize_lbs = cell(numel(base_datasize_array),num_sets);
for set_id = 1:num_sets
switch set_id
case 1
N1_arr = base_datasize_array*2;
N2_arr = N1_arr*10;
case 2
N2_arr = base_datasize_array*2;
N1_arr = N2_arr*10;
case 3
N2_arr = base_datasize_array;
N1_arr = N2_arr*40;
end
timeall = zeros(num_approaches,numel(N1_arr));
for iter = 1:numel(N1_arr)
M = rand(N1_arr(iter),N2_arr(iter));
f = @() sumrows_stepped_bsxfun_gpu(M);
timeall(1,iter) = gputimeit(f); clear f
f = @() sumrows_stepped_bsxfun(M);
timeall(2,iter) = timeit(f); clear f
f = @() sumrows_stepped_arrayfun(M);
timeall(3,iter) = timeit(f); clear f
f = @() sumrows_stepped_sparse(M);
timeall(4,iter) = timeit(f); clear f
end
timeall_all(:,:,set_id) = timeall;
wp = repmat({' '},numel(N1_arr),1);
datasize_lbs(:,set_id) = strcat(cellstr(num2str(N1_arr.')),' x ',...
wp,cellstr(num2str(N2_arr.')));
end
for set_id=1:num_sets
out_cellarr = cell(numel(methods)+1,numel(N1_arr)+1);
out_cellarr(1,1) = {'Methods'};
out_cellarr(2:end,1) = methods;
out_cellarr(1,2:end) = datasize_lbs(:,set_id);
out_cellarr(2:end,2:end) = cellfun(@(x) num2str(x),...
num2cell(timeall_all(:,:,set_id)),'Uni',0);
xlswrite(outputfile, out_cellarr,set_id);
end

this answer edited May 23 '17 at 12:16 Community ♦ 1 1 answered Nov 15 '14 at 18:30 Divakar 140k 14 60 142 1 Kudos! except that the figures are made in Excel, Matlab can produce equally ugly plots :) – Arpi Nov 15 '14 at 19:40 @Arpi Yeah! Well MATLAB plots don't look that "sleek", so made that switch! ;) Think it's less ugly with excel! – Divakar Nov 15 '14 at 20:36
 up vote 1 down vote Since there is an interesting performance effect in the sparse/matrixmult approach I will post some adidtional results: M
= rand(1000,100);
sz = size(M);
% PLAIN LOOP
tic
out1 = zeros(sz(1),1);
for k = 1:10000
for n = 1:sz(1)
out1(n) = sum(M(n,1:n:100));
end
end
toc
% SPARSE MATRIXMULT
tic
A = sparse(sz);
for n = 1:sz(1)
A(1:n:sz(2),n) = 1;
end
for k = 1:10000
out2 = diag(M*A);
end
toc
isequal(out1,out2) % ok
Plain loop:
11.441380 seconds.
Sparse/matrixmult: 27.503829 seconds.
As the dimension of the matrix grows, the plain loop is more efficient.

this answer edited Jun 12 '13 at 17:16 answered Jun 12 '13 at 17:10 Oleg 8,289 1 17 42 Some additional details pertinent to my problem: 1) The number of rows of M are going to be ~30 max, and the columns will be ~1000 or more. 2) Also, you may assume that the sparse matrix can be done once, no need to always remake it. What would perform better in the bench marks with those details? Thanks. – Spacey Jun 12 '13 at 17:15 With M 30x1000 I get plain loop 0.439726 seconds and sprase/matrixmult 0.858779 seconds where the creation of A takes only 0.004381 seconds (I had in mind a way to speed that up, but it's pointless). – Oleg Jun 12 '13 at 17:20 Ok, interesting... I guess for my case a for loop would be faster... hmmm.. where/when would the advantage of the sparse matrix appear then? – Spacey Jun 12 '13 at 17:31 there is a more efficient way. I added that as fifth method and put the benchmark. It performs really well for the extreme case Oleg mentioned. – Mohsen Nosratinia Jun 12 '13 at 20:47
