CLUSTER SAMPLING
Sneha Maria George
Some Basic Definitions:
Population:
The population is the complete set of units (people, households, schools, and so on) about which we want to draw conclusions.
Sample:
The sample is a subset drawn from the population under consideration, for which we actually have data.
Introduction:
When the population is large, studying every unit and interpreting the results is time-consuming and often impractical. This is when the sample comes into the picture.
It is much simpler and quicker to measure a subset of the population than the whole population.
However, if we want our sample to be a true reflection of the population and to give us accurate results, we cannot simply pick whichever units we like. The method by which we select our sample is important.
There are mainly two types of sampling:
· Random Sampling
· Non-Random Sampling
Both have pros and cons of their own.
Random sampling mainly involves four methods, namely:
· Simple Random Sampling (SRS)
· Systematic Sampling
· Stratified Sampling
· Cluster Sampling
Here, our topic of interest is cluster sampling.
Cluster Sampling:
In this sampling method, we divide the population into clusters (groups) and then randomly choose some of those clusters as the sample. After the selection procedure, we carry out complete enumeration, that is, every unit of each selected cluster enters the calculation.
Cluster sampling is a probability sampling method and is often used to study large populations.
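The procedure described above (form clusters, pick some at random, then enumerate each chosen cluster completely) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `one_stage_cluster_sample` and the classroom data are hypothetical and not part of the original analysis.

```python
import random

def one_stage_cluster_sample(clusters, n, seed=None):
    """Pick n whole clusters by SRS without replacement and keep every unit in them.

    `clusters` maps a cluster label to the list of units in that cluster.
    """
    rng = random.Random(seed)
    # Random selection of n cluster labels (without replacement).
    chosen = rng.sample(sorted(clusters), n)
    # Complete enumeration: every unit of each chosen cluster is kept.
    return {label: list(clusters[label]) for label in chosen}

# Hypothetical population: six classrooms (clusters) of student IDs.
population = {
    "A": [1, 2, 3],
    "B": [4, 5, 6, 7],
    "C": [8, 9],
    "D": [10, 11, 12],
    "E": [13, 14, 15, 16],
    "F": [17, 18],
}

sample = one_stage_cluster_sample(population, n=2, seed=0)
units = [u for members in sample.values() for u in members]
print(sample)
```

Note that randomness enters only when choosing the clusters; inside a chosen cluster nothing is left out.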
Cluster sampling includes two cases:
· Clusters selected are of equal size
· Clusters selected are of unequal size
There are primarily two types of cluster sampling:
Single-Stage Sampling:
In this case, we collect data from each and every unit within the selected clusters.
Two-Stage Sampling:
Here, after selecting the clusters, we apply a random sampling technique again within each selected cluster and collect data only from the units chosen in this second step.
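To make the contrast with single-stage sampling concrete, here is a minimal Python sketch of a two-stage draw: clusters are selected at random first, and then a simple random sample of units is taken inside each selected cluster. The function name `two_stage_cluster_sample`, the parameter names, and the village data are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, m_units, seed=None):
    """Stage 1: SRS of clusters. Stage 2: SRS of units inside each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1
    return {
        # Stage 2: take at most m_units from the cluster, without replacement.
        label: rng.sample(clusters[label], min(m_units, len(clusters[label])))
        for label in chosen
    }

# Hypothetical population: four villages (clusters) of household IDs.
villages = {
    "V1": [101, 102, 103, 104],
    "V2": [201, 202, 203],
    "V3": [301, 302, 303, 304, 305],
    "V4": [401, 402],
}

subsample = two_stage_cluster_sample(villages, n_clusters=2, m_units=2, seed=1)
print(subsample)
```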
Like every method, cluster sampling has positives and negatives. It is more efficient and less time-consuming than other probability sampling methods, especially when a large data set is under consideration. However, compared with methods such as simple random sampling, it tends to give less precise estimates, because it is difficult to ensure that the clusters we have selected properly represent the population as a whole. In other words, there is a high chance of sampling error.
Formulas:
Case of Equal Clusters:
· Suppose the population is divided into N clusters and each cluster is of size M.
· Select a sample of n clusters from the N clusters by SRS, generally without replacement (WOR).
So the total population size = NM and the total sample size = nM.
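Population Mean:
With $y_{ij}$ denoting the value of the j-th unit in the i-th cluster and $\bar{y}_i$ the mean of the i-th cluster, the population mean is
\[ \bar{Y} = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij} = \frac{1}{N}\sum_{i=1}^{N}\bar{y}_i , \]
and it is estimated without bias by the mean of the n sampled cluster means,
\[ \bar{y}_{cl} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i . \]
Variance:
\[ \mathrm{Var}(\bar{y}_{cl}) = \frac{N-n}{Nn}\, S_b^2 , \]
where,
\[ S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{y}_i - \bar{Y}\right)^2 \]
is the mean square between the cluster means in the population.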
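As a worked illustration of the equal-cluster case, the sketch below estimates the population mean by the average of the sampled cluster means and estimates its variance by (N − n)/(Nn) times the sample variance of those cluster means (the usual textbook estimate under SRS without replacement). The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def equal_cluster_estimate(sampled_clusters, N):
    """Mean estimate and estimated variance for n equal-sized clusters drawn from N."""
    n = len(sampled_clusters)
    cluster_means = [mean(c) for c in sampled_clusters]
    y_cl = mean(cluster_means)        # average of the sampled cluster means
    s_b2 = variance(cluster_means)    # between-cluster mean square, divisor n - 1
    var_hat = (N - n) / (N * n) * s_b2
    return y_cl, var_hat

# Hypothetical data: N = 10 clusters of size M = 4, of which n = 3 were sampled.
sampled = [
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 3.0, 5.0, 7.0],
    [4.0, 6.0, 8.0, 10.0],
]
y_cl, var_hat = equal_cluster_estimate(sampled, N=10)
print(y_cl, sqrt(var_hat))
```

Only the cluster means enter the calculation, which is why the precision depends on how much the clusters differ from one another.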
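Case of Unequal Clusters:
Suppose there are N clusters. Let the i-th cluster consist of $M_i$ elements (i = 1, 2, ..., N), and let
\[ M_0 = \sum_{i=1}^{N} M_i , \qquad \bar{M} = \frac{M_0}{N} . \]
The population mean per element $\bar{Y}$ is defined by
\[ \bar{Y} = \frac{1}{M_0}\sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij} = \frac{1}{M_0}\sum_{i=1}^{N} M_i\,\bar{y}_i , \]
where $\bar{y}_i$ is the mean of the i-th cluster. One unbiased estimator of $\bar{Y}$ from a sample of n clusters drawn by SRSWOR is
\[ \bar{y} = \frac{1}{n\bar{M}}\sum_{i=1}^{n} M_i\,\bar{y}_i , \qquad i = 1, 2, \ldots, n . \]
Similarly, an unbiased estimator of $\mathrm{Var}(\bar{y})$ can be given by
\[ \widehat{\mathrm{Var}}(\bar{y}) = \frac{N-n}{Nn}\, s_b^2 , \qquad s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{M_i\,\bar{y}_i}{\bar{M}} - \bar{y}\right)^2 . \]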
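For clusters of unequal size, several estimators of the mean exist. The sketch below shows one common unbiased choice, assuming the total number of elements in the population (called `M0` here) is known: each sampled cluster total is divided by the average cluster size M0/N, and these scaled totals are averaged. The function name and the data are hypothetical.

```python
from statistics import mean, variance

def unequal_cluster_estimate(sampled_clusters, N, M0):
    """Unbiased mean estimate for n unequal clusters drawn by SRSWOR from N.

    Assumes M0, the total number of elements in the population, is known.
    """
    n = len(sampled_clusters)
    M_bar = M0 / N                                   # average cluster size
    # z_i = M_i * (cluster mean) / M_bar = (cluster total) / M_bar
    z = [sum(c) / M_bar for c in sampled_clusters]
    y_hat = mean(z)
    var_hat = (N - n) / (N * n) * variance(z)        # variance() uses divisor n - 1
    return y_hat, var_hat

# Hypothetical data: N = 5 clusters holding M0 = 15 elements, n = 3 sampled.
sampled_unequal = [
    [3.0, 5.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 2.0, 3.0],
]
y_hat, var_hat_unequal = unequal_cluster_estimate(sampled_unequal, N=5, M0=15)
print(y_hat, var_hat_unequal)
```

Because this estimator depends on cluster totals, it can vary a lot when cluster sizes differ widely; a ratio-type estimator is a frequently used alternative.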

Real-Life Application:
The idea of sampling is something we use in daily life. For example, when cooking, to check whether all the vegetables in a curry are cooked, we pick a few at random and check them rather than examining every vegetable in the pot.
Cluster sampling is typically used in market research. It is useful when a researcher cannot get information about the population as a whole but can obtain information about the clusters. For instance, suppose a researcher is interested in data about city taxes in a particular state (say, Karnataka). The researcher would obtain data from selected cities and compile them to get a picture of the state as a whole. Here, the individual cities are the clusters.
Now, let's illustrate one-stage (single-stage) cluster sampling in R:
About the Dataset:
The Academic Performance Index is computed
for all California schools based on standardized testing of students. The data
sets contain information for all schools with at least 100 students and for
various probability samples of the data.
ANALYSIS:
Interpretation from R:
This is an in-built dataset that contains variables by which we can divide the whole population into clusters. Here, dnum is the variable used to divide the population into clusters. We performed one-stage cluster sampling and obtained a mean of 549.72 with a standard error of 45.191 for our variable of interest, enroll. Missing values can also be identified easily with this method. Here, 15 clusters have been drawn.
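To show what kind of calculation sits behind a mean and standard error for a one-stage cluster sample, here is a Python sketch of a ratio-type mean (sum of cluster totals divided by the number of sampled units) with a linearization standard error and a finite population correction. It assumes equal sampling weights and uses made-up enrollment figures, so it will not reproduce the values reported above; the function name is hypothetical.

```python
from math import sqrt

def cluster_ratio_mean(sampled_clusters, N):
    """Ratio-type mean and linearization SE for n clusters drawn by SRSWOR from N."""
    n = len(sampled_clusters)
    totals = [sum(c) for c in sampled_clusters]
    sizes = [len(c) for c in sampled_clusters]
    m = sum(sizes)
    y_hat = sum(totals) / m
    # Linearized residual of each cluster; these sum to zero by construction.
    resid = [(t - y_hat * s) / m for t, s in zip(totals, sizes)]
    var_hat = (1 - n / N) * n / (n - 1) * sum(e * e for e in resid)
    return y_hat, sqrt(var_hat)

# Made-up enrollment counts: 3 sampled districts out of a hypothetical N = 20.
districts = [
    [400, 520, 610],
    [300, 450],
    [700, 650, 580, 620],
]
mean_enroll, se_enroll = cluster_ratio_mean(districts, N=20)
print(mean_enroll, se_enroll)
```

The standard error is driven by how far each district's total departs from what the overall mean would predict for a district of its size.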
Compared with simple random sampling and stratified sampling, cluster sampling has a few advantages:
1. Cluster sampling requires fewer resources:
Since only a few groups from the entire population are taken into consideration, it requires fewer resources for sampling, which makes it cheaper than other sampling methods.
2. More feasible:
The feasibility of sampling is increased because the entire population is divided into homogeneous groups.
Disadvantages of Cluster Sampling:
1. Biased sampling:
If the clusters are chosen on the basis of a biased opinion, the inferences about the whole population will also be biased.
2. Sampling error is high:
Samples drawn using the cluster method tend to have a higher sampling error than samples of the same size drawn using other sampling methods.
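The higher sampling error can be checked on a small example. The sketch below compares the exact variance of the sample mean under simple random sampling of n·M units with that under sampling n whole clusters, for a hypothetical population of equal-sized clusters whose units resemble each other within a cluster. The function name and the data are illustrative assumptions.

```python
from statistics import mean, variance

def srs_vs_cluster_variance(clusters, n):
    """Exact variance of the sample mean: SRSWOR of n*M units vs. n whole clusters."""
    N = len(clusters)
    M = len(clusters[0])                            # all clusters assumed equal in size
    units = [y for c in clusters for y in c]
    S2 = variance(units)                            # population mean square, divisor NM - 1
    Sb2 = variance([mean(c) for c in clusters])     # between-cluster mean square, divisor N - 1
    var_srs = (1 - n / N) * S2 / (n * M)
    var_cluster = (N - n) / (N * n) * Sb2
    return var_srs, var_cluster

# Hypothetical population: 4 clusters of 3 units, very similar within each cluster.
pop = [
    [1.0, 2.0, 3.0],
    [11.0, 12.0, 13.0],
    [21.0, 22.0, 23.0],
    [31.0, 32.0, 33.0],
]
var_srs, var_cluster = srs_vs_cluster_variance(pop, n=2)
print(var_srs, var_cluster, var_cluster / var_srs)
```

With this population the cluster design has the larger variance for the same number of observed units. When units within a cluster are as varied as the population itself, the gap shrinks.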
Conclusion: