MATLAB: Taking a sample with the same number of values from each class

I have a full dataset of, let's say, 50000 observations, each assigned to one of 16 classes. I now want to draw a sample of, say, 70% of the full data, but I want MATLAB to take the same number of samples from each class (where possible, of course, because some classes have fewer observations than needed).

Is there a MATLAB function that can do this, or do I have to program a new one for myself? I'm just trying to save time here.

I found cvpartition, but as far as I know it can only be used to take a sample with the same class distribution as the original dataset, not a sample that is uniformly distributed over the classes.

Thank you for your help!

Tags: matlab, statistics, distribution, sampling

Asked Mar 27 '13 at 10:16 by mischa.mole (edited Mar 27 '13 at 10:22)

Comments:
For the small groups you may want to sample each value more than once. At least this will get you an equal amount of observations per group. – Dennis Jaheruddin, Mar 27 '13 at 10:58
Possible duplicate of Load all the images from a directory – Iswanto San, Mar 28 '13 at 0:07

1 Answer

It shouldn't be too hard. Let's say that the class of each observation is stored in a vector observations. Then you can do:

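fraction = 0.7;                                 % fraction of the full data to sample

classes = unique(observations);                 % distinct class values
nObs = length(observations);
nClasses = length(classes);
nSamples = round(nObs * fraction / nClasses);   % number of samples to draw per class

samples = zeros(1, nClasses * nSamples);        % preallocate the output vector
for ii = 1:nClasses
    idx = observations == classes(ii);          % logical mask selecting class ii
    % draw nSamples values from class ii, without replacement
    samples((ii-1)*nSamples+1:ii*nSamples) = randsample(observations(idx), nSamples);
end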

Now samples is a vector of length nClasses * nSamples that contains your sampled observations, with an equal number from each class.
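
Note that the loop above draws values from the class vector itself. If the goal is to pull the matching rows out of a data matrix, it may be more convenient to sample row indices instead. The following is only a minimal sketch under stated assumptions: the names labels, data, sampleIdx, sampledData and sampledLabels are hypothetical, the toy data is made up for illustration, and base-MATLAB randperm is used in place of randsample.

```matlab
% Hypothetical toy data (not from the question): 3 classes, 40 observations each.
labels = repmat([1; 2; 3], 40, 1);        % 120x1 vector of class labels
data   = rand(numel(labels), 2);          % 120x2 made-up feature matrix

fraction = 0.7;
classes  = unique(labels);
nClasses = numel(classes);
nSamples = round(numel(labels) * fraction / nClasses);   % samples per class

sampleIdx = zeros(nClasses * nSamples, 1);               % preallocate
for ii = 1:nClasses
    members = find(labels == classes(ii));               % row indices of class ii
    pick    = members(randperm(numel(members), nSamples));  % draw without replacement
    sampleIdx((ii-1)*nSamples+1 : ii*nSamples) = pick;
end

sampledData   = data(sampleIdx, :);       % rows of the balanced sample
sampledLabels = labels(sampleIdx);        % their class labels
```

Because sampleIdx holds row indices, the same vector can be reused to index any other array that is aligned with the observations.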

At the moment it will fail if one of the classes contains fewer than nSamples observations. The simplest fix is to pass true as the third argument to randsample, i.e. randsample(observations(idx), nSamples, true), which tells it to sample with replacement, so that an observation can be picked more than once.
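
There are two common ways to handle classes that are too small, shown side by side in the sketch below: draw with replacement so that every class still contributes nSamples entries, or cap the draw at the class size and accept a slightly unbalanced sample. All names and the toy label vector are hypothetical and chosen only for illustration. The sketch works on row indices with base-MATLAB randi and randperm, which also avoids having to think about how randsample treats a population consisting of a single value.

```matlab
% Hypothetical toy labels (not from the question): class 3 has only 5 members.
labels   = [ones(50, 1); 2 * ones(45, 1); 3 * ones(5, 1)];
nSamples = 20;                            % desired number of samples per class
classes  = unique(labels);

idxWithRepl = [];                         % option A: exactly nSamples per class
idxCapped   = [];                         % option B: at most the class size
for ii = 1:numel(classes)
    members = find(labels == classes(ii));   % row indices of class ii
    n = numel(members);

    % Option A: sample with replacement, so small classes repeat observations.
    idxWithRepl = [idxWithRepl; members(randi(n, nSamples, 1))];

    % Option B: sample without replacement, taking fewer from small classes.
    k = min(nSamples, n);
    idxCapped = [idxCapped; members(randperm(n, k)')];
end
```

Option A keeps the classes perfectly balanced at the cost of duplicated observations. Option B keeps every observation unique but leaves the small classes under-represented.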


Answered Mar 27 '13 at 10:52 by Chris Taylor

Comments:
Thank you, saves me some thinking time :-) I just thought maybe there is a builtin Matlab function that can do that....BR, Mischa – mischa.mole, Mar 27 '13 at 10:58

