# Solved: statistics - MATLAB: Taking a sample with the same number of values from each class

I have a full dataset of, let's say, 50000 observations, each assigned to one of 16 classes. I now want to draw a sample of, say, 70% of the full data, but I want MATLAB to take the same number of samples from each class (where possible, of course, because some classes have fewer observations than needed).

Is there a MATLAB function that can do this, or do I have to program a new one for myself? I'm just trying to save time here.

I found `cvpartition`, but as far as I know this can be used only to take a sample with the same distribution over the classes as the original dataset and not a uniformly distributed sample.

asked Mar 27 '13 at 10:16, edited Mar 27 '13 at 10:22 by mischa.mole

For the small groups you may want to sample each value more than once. At least this will get you an equal amount of observations per group. – Dennis Jaheruddin Mar 27 '13 at 10:58

It shouldn't be too hard. Let's say that the observations are in a vector `observations`. Then you can do

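```matlab
fraction = 0.7;

classes = unique(observations);
nObs = numel(observations);
nClasses = numel(classes);
% Same number of draws per class, so the total is about fraction * nObs
nSamples = round(nObs * fraction / nClasses);

samples = zeros(1, nClasses * nSamples);  % preallocate the result
for ii = 1:nClasses
    idx = observations == classes(ii);    % logical mask for this class
    % Draw nSamples values from this class, without replacement
    samples((ii-1)*nSamples+1:ii*nSamples) = randsample(observations(idx), nSamples);
end
```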

Now `samples` is a vector of length `nClasses * nSamples` that contains your sampled observations, with an equal number from each class.
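In practice the class labels often sit next to a data matrix, and what you want back is the row positions of the sampled observations rather than the labels themselves. The following is a minimal sketch of that variant, not part of the original answer. The names `labels`, `X`, `sampleIdx`, `Xs` and `ys` and the toy data are assumptions made for illustration, and it uses the core function `randperm` in place of `randsample`, so it does not need the Statistics Toolbox.

```matlab
% Toy data (hypothetical): 3 classes with 10 observations each
labels = [ones(1,10), 2*ones(1,10), 3*ones(1,10)];
X = rand(numel(labels), 2);          % one row of features per observation

fraction = 0.7;
classes  = unique(labels);
nClasses = numel(classes);
nSamples = round(numel(labels) * fraction / nClasses);   % per-class count

sampleIdx = zeros(1, nClasses * nSamples);   % sampled row positions
for ii = 1:nClasses
    members = find(labels == classes(ii));   % positions of this class
    % Pick nSamples distinct members (errors if the class is too small)
    pick = members(randperm(numel(members), nSamples));
    sampleIdx((ii-1)*nSamples+1 : ii*nSamples) = pick;
end

Xs = X(sampleIdx, :);       % sampled rows of the data matrix
ys = labels(sampleIdx);     % matching class labels
```

Keeping indices instead of values means the same `sampleIdx` can be reused for any other array that is aligned with `labels`.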

As written, it will fail if any class contains fewer than `nSamples` observations. The simplest fix is to pass `true` as the third argument to `randsample`, i.e. `randsample(observations(idx), nSamples, true)`, which tells it to sample with replacement, so an observation can be picked more than once.
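If you would rather avoid duplicates wherever a class is large enough, one option is to sample with replacement only for the classes that are too small. This is a hedged sketch of that idea rather than a built-in MATLAB feature: the toy `labels` vector, the per-class target of 10 and the variable names are assumptions, and it relies on the core functions `randperm` and `randi` instead of `randsample`.

```matlab
% Toy labels (hypothetical): class 3 has only 3 members
labels   = [ones(1,20), 2*ones(1,20), 3*ones(1,3)];
nSamples = 10;                        % per-class target, chosen for illustration
classes  = unique(labels);

sampleIdx = zeros(1, numel(classes) * nSamples);
for ii = 1:numel(classes)
    members = find(labels == classes(ii));   % positions of this class
    n = numel(members);
    if n >= nSamples
        pos = randperm(n, nSamples);         % enough members: no repeats
    else
        pos = randi(n, 1, nSamples);         % too few: draw with replacement
    end
    sampleIdx((ii-1)*nSamples+1 : ii*nSamples) = members(pos);
end

sampledLabels = labels(sampleIdx);    % 10 labels per class
```

Every class still contributes `nSamples` entries, but only the undersized classes contain repeated observations.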

answered Mar 27 '13 at 10:52 by Chris Taylor

Thank you, saves me some thinking time :-) I just thought maybe there is a builtin Matlab function that can do that....BR, Mischa – mischa.mole Mar 27 '13 at 10:58

