UNEQUAL CLUSTER SAMPLING
Bhagya Jayesh
2048119
INTRODUCTION
In sampling, it is assumed that the population is divided into a finite number
of distinct units, called sampling units. The smallest unit into which the
population can be divided is called an element of the population. A group of
such elements is known as a cluster.
Clusters are generally made up of neighbouring elements, and thus the elements
within a cluster tend to have similar characteristics. The general rule is that
the number of elements in a cluster should be small and the number of clusters
should be large.
There are two types of cluster sampling: equal and unequal cluster sampling.
Under unequal cluster sampling, the number of elements differs from cluster to
cluster, whereas under equal cluster sampling every cluster has the same size.
ADVANTAGES AND DISADVANTAGES
The advantages of cluster sampling are:
- Collecting data from neighbouring elements is easier
- It is less costly
- It can be used when a sampling frame of the elements is not readily available
The disadvantages of cluster sampling are:
- Biased samples: the method is prone to bias. If the clusters that represent
the entire population were formed under a biased opinion, the inferences about
the entire population will be biased as well.
- High sampling error: samples drawn using the cluster method generally have a
higher sampling error than samples drawn using other sampling methods.
FORMULAS
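Suppose there are N clusters. Let the ith cluster consist of
$M_i$ elements ($i = 1, 2, \dots, N$), and let
$$M_0 = \sum_{i=1}^{N} M_i, \qquad \bar{M} = \frac{M_0}{N}$$
be the total number of elements and the average cluster size.
The population mean per element $\bar{Y}$ is defined by
$$\bar{Y} = \frac{1}{M_0} \sum_{i=1}^{N} M_i \bar{y}_i,$$
where $\bar{y}_i$ is the mean per element of the ith cluster.
Let a random sample of n clusters be drawn without replacement (WOR) and all
elements of the selected clusters be surveyed. The estimator of $\bar{Y}$ can
be given by
$$\bar{y} = \frac{1}{n\bar{M}} \sum_{i=1}^{n} M_i \bar{y}_i.$$
When $\bar{M}$ is unknown, the analysis below replaces it with the average size
of the sampled clusters.
Similarly, an estimator of $\mathrm{Var}(\bar{y})$, in the form used in the
analysis below, can be given by
$$\widehat{\mathrm{Var}}(\bar{y}) = \frac{1-f}{n}\, s_b^2, \qquad s_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_i - \bar{y})^2, \qquad f = \frac{n}{N}.$$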
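The estimator and its variance estimator can be sketched as a small R function. This is a minimal sketch, not the author's code. The function name `cluster_mean_estimate` and the toy data are hypothetical. It assumes the estimator is the size-weighted form sum(M_i * ybar_i) / (n * Mbar), that the variance estimator is (1 - n/N) * s_b^2 / n, and that Mbar is the average size of the sampled clusters, since the sizes of the unsampled clusters are usually unknown.

```r
# Hypothetical helper: estimate the mean per element from a list of
# sampled clusters, where N is the number of clusters in the population.
cluster_mean_estimate <- function(clusters, N) {
  n <- length(clusters)              # number of sampled clusters
  M_i <- lengths(clusters)           # sizes of the sampled clusters
  ybar_i <- sapply(clusters, mean)   # mean per element of each cluster
  # Assumption: average cluster size is taken from the sample
  Mbar <- mean(M_i)
  ybar <- sum(M_i * ybar_i) / (n * Mbar)
  # Between-cluster mean square around the estimated mean
  sb2 <- sum((ybar_i - ybar)^2) / (n - 1)
  var_ybar <- (1 - n / N) * sb2 / n
  list(ybar = ybar, sb2 = sb2, var_ybar = var_ybar, se = sqrt(var_ybar))
}

# Toy data, made up for illustration: 3 clusters sampled out of N = 20
toy <- list(c(2, 4, 6), c(1, 3), c(5, 5, 5, 5))
est <- cluster_mean_estimate(toy, N = 20)
est$ybar
est$se
```

Passing the clusters as a list keeps the function independent of the number of sampled clusters, so the same call works for any n.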
Usually, it is difficult to find equal-sized clusters in a population that are
homogeneous between clusters. Hence unequal cluster sampling is used.
Cluster sampling is typically used in market research. It is used when a
researcher can't get information about the population as a whole but can get
information about the clusters. Cluster sampling is often more economical or
more practical than stratified sampling or simple random sampling.
Question
A survey on pepper was conducted to estimate the number of pepper standards and
the production of pepper in Kerala. For this, 3 clusters out of 95 were selected
by simple random sampling without replacement (SRSWOR). The number of pepper
standards recorded is given below:
Cluster number | Cluster size | No. of pepper standards
1 | 11 | 41,16,19,15,144,454,212,57,28,76,199
2 | 12 | 39,70,38,37,161,38,27,219,46,128,30,20
3 | 7 | 115,59,120,36,411,197,17
- Find the estimator of the population mean per element
- Find Sb squared (Sb_square in the code), the estimator of the variance
between cluster means, and hence the standard error
- Examine the relative efficiency of unequal cluster sampling w.r.t. simple
random sampling
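The selection step in the question (3 clusters out of 95 by SRSWOR) can be sketched with base R's `sample()`. This is only an illustration: the seed is arbitrary, and the cluster labels 1 to 95 are an assumption, since the survey's actual labelling and draw are not given.

```r
# Sketch: draw n = 3 clusters out of N = 95 without replacement.
N <- 95   # number of clusters in the population
n <- 3    # number of clusters to select
set.seed(1)  # arbitrary seed, only so that the draw is reproducible
# Assumption: clusters are labelled 1..N
selected <- sample(seq_len(N), size = n, replace = FALSE)
selected
```

Every element of each selected cluster is then surveyed, which is what gives the three data vectors used in the analysis.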
ANALYSIS
CODES
cluster_1=c(41,16,19,15,144,454,212,57,28,76,199)
cluster_2=c(39,70,38,37,161,38,27,219,46,128,30,20)
cluster_3=c(115,59,120,36,411,197,17)
#There are 3 different clusters of sizes 11, 12 and 7.
M1=length(cluster_1)
M2=length(cluster_2)
M3=length(cluster_3)
M1
## [1] 11
M2
## [1] 12
M3
## [1] 7
y1m=sum(cluster_1)/M1
y2m=sum(cluster_2)/M2
y3m=sum(cluster_3)/M3
Mbar=(M1+M2+M3)/3 #average cluster size of the sampled clusters
Mbar
## [1] 10
Mo=M1+M2+M3 #total number of elements in the sampled clusters
n=3 #random sample of n clusters drawn
M=Mo #total elements (30)
N=95 #total number of clusters in the population
ybar=((M1*y1m)+(M2*y2m)+(M3*y3m))/(n*Mbar)
ybar
## [1] 102.3
#estimate of the average number of pepper standards per element
x=(y1m-ybar)^2 + (y2m-ybar)^2 + (y3m-ybar)^2 #sum of squared deviations of the cluster means
Sb_square=x/(n-1)
Sb_square
## [1] 1145.713
f=n/N #sampling fraction
var_ybar=((1-f)*Sb_square)/n
var_ybar #estimator of variance for the mean
## [1] 369.8441
SE=sqrt(var_ybar)
SE
## [1] 19.23133
#standard error for the region
#simple random sampling is considered to compare it with unequal cluster sampling.
data=c(cluster_1,cluster_2,cluster_3)
data
##  [1]  41  16  19  15 144 454 212  57  28  76 199  39  70  38  37 161  38  27 219
## [20]  46 128  30  20 115  59 120  36 411 197  17
data_m=mean(data)
z=sum((data-data_m)^2)
s_square=z/(length(data)-1) #sample variance of the 30 elements
s_square
## [1] 12251.04
RE=s_square/(Mbar*Sb_square)
RE
## [1] 1.069295
#relative efficiency of unequal cluster sampling
#as the variation between clusters decreases, the efficiency increases. In general,
#cluster sampling will be efficient only when the variation between clusters is
#as small as possible.
The clusters are of three different sizes: 11, 12 and 7.
The mean of each cluster is calculated and the average cluster size is found to
be 10.
The average number of pepper standards per element is calculated and is found to
be 102.3.
The standard error for the region is found to be 19.23.
The relative efficiency of unequal cluster sampling w.r.t. simple random sampling
is calculated and is found to be about 1.07. A value above 1 implies that, for
these data, cluster sampling is about 7% more efficient than simple random
sampling, because the variation between the cluster means is small.
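The relative efficiency calculation can also be wrapped in a function and cross-checked against R's built-in `var()`, which divides by the number of observations minus one. This is a hedged sketch: the function name `relative_efficiency` is hypothetical, and it assumes the same formula as the analysis, RE = s^2 / (Mbar * Sb^2), with Mbar taken as the average size of the sampled clusters.

```r
# Hypothetical helper: relative efficiency of cluster sampling w.r.t. SRS.
relative_efficiency <- function(clusters) {
  y <- unlist(clusters)              # all surveyed elements pooled together
  n <- length(clusters)              # number of sampled clusters
  M_i <- lengths(clusters)           # cluster sizes
  Mbar <- mean(M_i)                  # assumption: sample average cluster size
  ybar_i <- sapply(clusters, mean)   # cluster means
  ybar <- sum(M_i * ybar_i) / (n * Mbar)   # same value as mean(y)
  s2 <- var(y)                       # element-level variance, divisor length(y) - 1
  sb2 <- sum((ybar_i - ybar)^2) / (n - 1)  # between-cluster mean square
  s2 / (Mbar * sb2)
}

# Pepper data from the question
pepper <- list(
  c(41, 16, 19, 15, 144, 454, 212, 57, 28, 76, 199),
  c(39, 70, 38, 37, 161, 38, 27, 219, 46, 128, 30, 20),
  c(115, 59, 120, 36, 411, 197, 17)
)
re <- relative_efficiency(pepper)
re
```

A value above 1 favours cluster sampling, while a value below 1 favours simple random sampling of the same number of elements.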
In most cases it is impossible to find equal-sized clusters in a population that
are homogeneous between clusters and heterogeneous within each cluster, so that
each cluster is a true representative of the population. Hence unequal cluster
sampling is used.
The technique is widely used in statistics when the researcher can't collect
data from the entire population. It is often the most economical and practical
option available. Take the example of a researcher who wants to understand
smartphone usage in Germany. In this case, the cities of Germany will form the
clusters. This sampling method is also used in situations like wars and natural
calamities to draw inferences about a population when collecting data from every
individual in the population is impossible.