# MATLAB: Aggregate and compare measurements with different sampling intervals

I have measured a variable `x` at long, equidistant intervals (every 10 min) and a variable `y` at short, non-equidistant intervals (somewhere between every 30 s and 90 s). Timestamps (`datenum`) are available for both `x` and `y`, but they are never equal, so `intersect` doesn't work. How can I aggregate `y` (e.g. `mean(y(...))` over the interval between the timestamps of `x(i)` and `x(i+1)`) so that I can compare the two (e.g. plot them against each other or plot them against the same time vector)?

/edit 1: Confused `x` and `y` in my last but one sentence.

/edit 2: I feel like I didn't give you enough information in the original question, sorry for that. Many of you suggest interpolation. `x` is an average wind speed over a period of 10 minutes, not a distinct measurement. So if I say time = 07:10 and `x = 3` m/s, that means `mean(x) = 3` m/s for the period from 07:00 to 07:10. This is why I think it's probably not the best idea to interpolate it. `y` is one of many (very noisy) other variables and I want to find out the influence of (mean) `x` on `y`. So I would either like to assign many values of `y` to one measurement of `x` (in that 10 minute period), or assign a `mean(y)` to that one measurement of `x`. I assume that the solutions are quite similar, code wise.

matlab aggregation
edited Mar 13 '13 at 17:02, asked Mar 13 '13 at 16:25 by Fred S

Comments:

How about interpolating one of the results to another's time? (if the signal is not too noisy) – Dedek Mraz, Mar 13 '13 at 16:30

I thought about that. Sadly, they are quite noisy by nature and I fear that interpolation would be somewhat critical in the context of my research (for reasons not specified here). – Fred S, Mar 13 '13 at 16:34

Don't you mean aggregating `y` (as it's the longer array)? – Eitan T, Mar 13 '13 at 16:49

I empathize with you on the real-world signals. Been there. The only option for actual comparison (not plotting) is interpolation. Linear is the simplest, but Matlab has many. I mostly used `spline`. Depending on your signal properties, you could first filter (lowpass, sliding average) the noisy data and then interpolate. You could also show us the data plot. – Dedek Mraz, Mar 13 '13 at 16:55

EitanT: Yes, I edited my submission. Sorry for the confusion. Dedek Mraz: Thank you for your suggestions. Please see edit 2. – Fred S, Mar 13 '13 at 16:57

You are trying to estimate the value of `x` at some point that you don't have the measurement for. You have measurements before and after. The only thing you can do is to interpolate. What method you choose is somewhat harder to decide.

/edit: If you just want to get an average value of `y` between two `x` measurements, I suggest the following:

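```
% tx, ty: timestamps of x and y; x(ii) covers the interval (tx(ii-1), tx(ii)]
new_y = zeros(size(x));
new_y(1) = mean(y(ty <= tx(1)));   % all y samples up to the first x timestamp
for ii = 2:length(x)
    new_y(ii) = mean(y(ty > tx(ii-1) & ty <= tx(ii)));
end
```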
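As a quick check of the interval convention, here is the same loop run on a small set of made-up numbers. The values of `tx`, `x`, `ty` and `y` are hypothetical, and times are given in minutes rather than `datenum` days for readability.

```matlab
% Hypothetical toy data (times in minutes, not datenum days).
tx = [10 20 30];              % timestamps of x (end of each averaging period)
x  = [3.0 2.5 4.1];           % 10-minute means of x
ty = [2 5 9 12 18 25 30];     % irregular timestamps of y
y  = [1 2 3  4  6  8 10];     % samples of y

new_y = zeros(size(x));
new_y(1) = mean(y(ty <= tx(1)));
for ii = 2:length(x)
    % samples with tx(ii-1) < ty <= tx(ii) belong to x(ii)
    new_y(ii) = mean(y(ty > tx(ii-1) & ty <= tx(ii)));
end
disp(new_y)   % 2 5 9
```

The sample at `ty = 30` lands in the last interval because the right edge is included; `plot(x, new_y, 'o')` would then show one aggregated `y` per `x`.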

Maybe an even better solution would be using hist:

```
n = hist(ty,tx)
```

Vector `n` contains, for each value in `tx`, the number of values of `ty` that are closest to it. Since both `tx` and `ty` are monotonic, `n` tells you how to group consecutive values of `y`. You can then use `mat2cell` to put `y` into a cell array in which each cell corresponds to one measurement of `x`; the second argument, `n`, specifies how many values go into each cell (this form assumes `y` is a column vector).

```
new_y = mat2cell(y,n)
```
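One way to work with the resulting cell array is to reduce each cell with `cellfun`. The sketch below uses made-up data (times in minutes) and hypothetical variable names `groups` and `y_mean`; it assumes `y` is a column vector sorted by time.

```matlab
% Hypothetical toy data (times in minutes, not datenum days).
tx = [10 20 30];                 % timestamps of x, used as bin centers
ty = [2 5 9 12 18 26 30];        % irregular timestamps of y (sorted)
y  = [1; 2; 3; 4; 6; 8; 10];     % samples of y as a column vector

n = hist(ty, tx);                % counts of ty values nearest to each tx
groups = mat2cell(y, n);         % one cell of y values per x measurement
y_mean = cellfun(@mean, groups); % one mean per cell
disp(y_mean.')                   % 2.5 6 9
```

Note that `hist` groups by nearest timestamp, so the sample at `ty = 12` is assigned to `tx = 10` here. That is not the same as averaging over the 10 minutes before each `x` timestamp, which may or may not matter for the comparison.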

edited Mar 13 '13 at 18:25, answered Mar 13 '13 at 17:14 by Dedek Mraz

Comments:

I'm not trying to estimate / interpolate `x`. Actually, I don't want to touch `x` at all. Sorry for not being clear enough, English isn't my first language. What I want to do is aggregate (by averaging) `n` measurements of `y` and assign them to one value / timestamp of `x`; `n` being the number of measurements for `y` I have in the interval `x` was averaged over (i.e., in a 10 minute interval before the timestamp of `x`). And no, sadly I don't have the original measurements. I think they aren't even saved at all. – Fred S, Mar 13 '13 at 17:19

I'm sorry for forcing interpolation on you but I think this is the only option if you want to do it "right". And I believe the influence of `x` on `y` can best be seen this way. I'll update the answer. – Dedek Mraz, Mar 13 '13 at 18:01

I guess we disagree on the "right way" then. ;) I'd rather omit than fabricate information, but I can of course see why one would prefer interpolation. Anyway, thank you very much for your suggestions. I will certainly try them out and compare them to EitanT's solution and come back with some results in case anyone is interested. – Fred S, Mar 13 '13 at 20:25

Follow up, as promised: Both of your solutions after the edit seem to do exactly what I want and are somewhat more flexible with respect to variable time steps than EitanT's suggestion. I have yet to find out how to work with the output of your hist()-based solution, but I'll get around that. So, thanks a lot! – Fred S, Mar 17 '13 at 14:57

To aggregate values, use `accumarray`:

```
accumarray(fix(ty(:) / T) + 1, y, [], @mean)
```

Here `y` is the sampled signal, `ty` is the timestamp array and `T` is the length of the aggregation interval in the units of `ty` (for example, `T = 10 / (24 * 60) ≈ 0.0069` days for a 10-minute interval, since `datenum` counts days).
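With real `datenum` timestamps the quotient `ty / T` is very large, so the result would start with a long run of empty bins. A minimal sketch of one way around this is to subtract a start time first. The name `t0` and the sample values are assumptions for illustration and are not part of the answer above.

```matlab
% Hypothetical data: six y samples after 07:00 on 13 Mar 2013.
t0 = datenum(2013, 3, 13, 7, 0, 0);          % assumed start of the first interval
T  = 10 / (24 * 60);                         % 10 minutes in days
ty = t0 + [0.5; 3; 7; 11; 16; 24] / (24 * 60);
y  = [1; 2; 3; 4; 6; 8];

% Bin index counted from t0: bin k covers [t0 + (k-1)*T, t0 + k*T).
idx = fix((ty(:) - t0) / T) + 1;
y_mean = accumarray(idx, y(:), [], @mean);
disp(y_mean.')                               % 2 5 8
```

Each bin here includes its left edge, so a sample taken exactly at 07:10 would fall into the second bin rather than the first.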

edited Mar 13 '13 at 17:27, answered Mar 13 '13 at 16:47 by Eitan T

Comments:

Can't really get that to work, but I think this might be what I'm looking for. `Error using accumarray: Second input VAL must be a vector with one element for each row in SUBS, or a scalar.` You can find sample code here: pastebin.com/KP9nhEib . Thanks so far. – Fred S, Mar 13 '13 at 17:11

@FredS `ty` has to be a column vector, so just use `ty(:)`. I've amended my answer. – Eitan T, Mar 13 '13 at 17:28

Ah, thanks. While this does what I want for my sample code (although it creates one entry too much, but I think I'll figure that out), I have a problem with the `/ T` part because I have a few data logger defects where sampling intervals for `x` are 20 minutes (didn't actually see that until now - very long data set, sorry). Any idea what to do in this case? I have uploaded a short part of my data here: dl.dropbox.com/u/9437411/sample.mat – Fred S, Mar 13 '13 at 20:40
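For the irregular `x` intervals raised in the last comment, one possible variation is to bin `y` by the `x` timestamps themselves instead of dividing by a fixed `T`. This is only a sketch under assumptions: it uses `discretize`, which needs MATLAB R2015a or later and was not part of this thread, and the data and the names `edges`, `bin`, `ok` and `y_mean` are made up.

```matlab
% Hypothetical data with one 20-minute gap in x (times in minutes).
tx = [10; 20; 40; 50];
ty = [2; 5; 9; 12; 18; 26; 30; 44; 47];
y  = [1; 2; 3;  4;  6;  8; 10; 20; 30];

% Edges: one finite edge below all data, then the x timestamps.
% With 'IncludedEdge','right', bin k covers (edges(k), edges(k+1)].
edges = [min([tx; ty]) - 1; tx];
bin = discretize(ty, edges, 'IncludedEdge', 'right');  % NaN if ty > tx(end)

ok = ~isnan(bin);                                      % drop samples after the last x
y_mean = accumarray(bin(ok), y(ok), [numel(tx) 1], @mean, NaN);
disp(y_mean.')                                         % 2 5 9 25
```

The output has exactly one entry per `x` value, and an `x` interval without any `y` samples gets `NaN` instead of `0`.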

You can interpolate the data from `x` onto the non-equidistant timestamps of `y` (or vice versa) with the `interp1` function and compare the results.

Plot:

```
plot(Time_x, x, Time_y, y)
```

answered Mar 13 '13 at 16:38 by Fedyanint Tim

Here's a simple example of 1-D interpolation.

```
% make two example functions on different x bases.
x1 = 0:.023:10;
x2 = 0:1:10;
y1 = x1.^2/10;
y2 = 10 - x2.^1.3;

% convert both to a common x base (x1 in this case).
y2i = interp1(x2,y2,x1);
plot(x1,y1,x1,y2i)
```

answered Mar 13 '13 at 16:43 by Stuart

Use linear interpolation!

It's easy and fun to do yourself. The idea is: since you know the timestamps and values for both `x` and `y` (but the timestamps for `y` don't match those of `x`), you can use linear interpolation (or a higher-order method if you need one) to estimate the values of `y` as if they had occurred at the timestamps of `x`. After that you can plot both `x` and the interpolated `y` values against the same `x` timestamp vector.
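A minimal sketch of that idea with `interp1`, using made-up numbers; the variable name `y_at_tx` is hypothetical.

```matlab
% Hypothetical toy data (times in minutes).
ty = [0; 2; 4; 6];              % timestamps of y
y  = [0; 4; 8; 12];             % samples of y
tx = [1; 3; 5];                 % timestamps of x

% Linear interpolation is the default method of interp1.
y_at_tx = interp1(ty, y, tx);
disp(y_at_tx.')                 % 2 6 10
```

Timestamps in `tx` that lie outside the range of `ty` yield `NaN` with the default linear method. With a matching vector `x`, `plot(tx, x, tx, y_at_tx)` then draws both series on the same time axis.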

answered Mar 13 '13 at 16:45 by FredrikRedin
