UNEQUAL CLUSTER SAMPLING

 

Bhagya Jayesh

2048119



INTRODUCTION

In sampling, it is assumed that the population is divided into a finite number of distinct units defined as sampling units. The smallest unit into which the population can be divided is called an element of the population. A group of such elements is known as a cluster.

Clusters are generally made up of neighbouring elements and thus the elements within a cluster tend to have similar characteristics. The general rule is that the number of elements in a cluster should be small and the number of clusters should be large.

There are two types of cluster sampling: equal and unequal cluster sampling.

Under unequal cluster sampling, the number of elements differs from cluster to cluster, whereas under equal cluster sampling every cluster has the same size.

 


ADVANTAGES AND DISADVANTAGES

The advantages of cluster sampling are:

  •      Collecting data from neighbouring elements is easier
  •      It is less costly
  •      It can be used when a sampling frame of the elements is not readily available

The disadvantages of cluster sampling are:

  •      Biased samples - The method is prone to bias. If the clusters chosen to represent the population were formed under a biased opinion, the inferences about the entire population would be biased as well.
  •      High sampling error - Generally, samples drawn using the cluster method are prone to higher sampling error than samples of the same size drawn using other sampling methods.
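The higher sampling error can be illustrated with a small simulation. The sketch below is not part of the original analysis: it builds a made-up population (the sizes, means and standard deviations are assumptions chosen only for illustration) in which elements of the same cluster are similar, and compares how much the sample mean varies under cluster sampling and under simple random sampling of about the same number of elements.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: 50 clusters of unequal size whose elements
# share a cluster-level effect, so neighbours resemble each other
n_clusters     <- 50
sizes          <- sample(5:15, n_clusters, replace = TRUE)
cluster_id     <- rep(seq_len(n_clusters), times = sizes)
cluster_effect <- rnorm(n_clusters, mean = 100, sd = 10)
y              <- cluster_effect[cluster_id] + rnorm(length(cluster_id), sd = 1)

reps      <- 2000  # number of repeated samples
n_sampled <- 5     # clusters drawn in each cluster sample

# Cluster sampling: draw whole clusters and average every element in them
cluster_means <- replicate(reps, {
  chosen <- sample(n_clusters, n_sampled)
  mean(y[cluster_id %in% chosen])
})

# Simple random sampling of roughly the same number of elements
m         <- round(n_sampled * mean(sizes))
srs_means <- replicate(reps, mean(sample(y, m)))

var_cluster <- var(cluster_means)
var_srs     <- var(srs_means)
c(cluster = var_cluster, srs = var_srs)
```

With a population built this way, the cluster-based means would be expected to vary far more from sample to sample than the simple random sample means.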

 

 

FORMULAS

 

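Suppose there are N clusters, and let the ith cluster consist of $M_i$ elements (i = 1, 2, ..., N). The total number of elements and the average cluster size are

$$M_0 = \sum_{i=1}^{N} M_i, \qquad \bar{M} = \frac{M_0}{N}.$$

The population mean per element, $\bar{Y}$, is defined by

$$\bar{Y} = \frac{\sum_{i=1}^{N} M_i \bar{y}_i}{\sum_{i=1}^{N} M_i} = \frac{1}{N\bar{M}} \sum_{i=1}^{N} M_i \bar{y}_i, \qquad i = 1, 2, \ldots, N,$$

where $\bar{y}_i$ is the mean per element of the ith cluster.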
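As a small illustration of this definition, the sketch below uses a made-up population of four clusters (the values are hypothetical, not survey data). It shows that the size-weighted mean of the cluster means equals the mean of all elements, while the unweighted mean of the cluster means generally does not.

```r
# Hypothetical population of N = 4 clusters with unequal sizes
population <- list(
  c(2, 4),
  c(1, 3, 5, 7),
  c(10),
  c(6, 6, 6)
)

N      <- length(population)          # number of clusters
M_i    <- lengths(population)         # cluster sizes M_1, ..., M_N
M0     <- sum(M_i)                    # total number of elements
Mbar   <- M0 / N                      # average cluster size
ybar_i <- sapply(population, mean)    # mean per element of each cluster

# Population mean per element: size-weighted mean of the cluster means
Ybar <- sum(M_i * ybar_i) / (N * Mbar)
Ybar                                  # 5

# Same value obtained directly from all elements
Ybar_direct <- mean(unlist(population))

# The unweighted mean of cluster means ignores the cluster sizes
mean(ybar_i)                          # 5.75
```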

 

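Let a random sample of n clusters be drawn without replacement (WOR) and all elements of the selected clusters be surveyed. The estimator of $\bar{Y}$ can be given by

$$\bar{y} = \frac{1}{n\bar{M}} \sum_{i=1}^{n} M_i \bar{y}_i, \qquad i = 1, 2, \ldots, n,$$

where $M_i$ and $\bar{y}_i$ are the size and the mean per element of the ith sampled cluster and $\bar{M}$ is the average cluster size.

Similarly, the estimator of $\mathrm{Var}(\bar{y})$ used in the analysis in this post can be given by

$$\widehat{\mathrm{Var}}(\bar{y}) = \frac{1-f}{n}\, S_b^2, \qquad S_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_i - \bar{y})^2, \qquad f = \frac{n}{N}.$$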
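The estimator and its variance estimate can be wrapped in a small helper. This is a minimal sketch rather than a definitive implementation: the function name `estimate_cluster_mean` and its arguments are hypothetical, the average cluster size is taken over the sampled clusters (as in the analysis later in this post), and the between-cluster mean square uses the deviations of the cluster means from the estimate. The input values in the usage lines are made up.

```r
# Hypothetical helper. `clusters` is a list of numeric vectors, one per
# sampled cluster; `N` is the number of clusters in the population.
estimate_cluster_mean <- function(clusters, N) {
  n      <- length(clusters)             # number of sampled clusters
  M_i    <- lengths(clusters)            # sizes of the sampled clusters
  ybar_i <- sapply(clusters, mean)       # mean per element of each cluster
  Mbar   <- mean(M_i)                    # assumption: sample average size
  ybar   <- sum(M_i * ybar_i) / (n * Mbar)
  sb2    <- sum((ybar_i - ybar)^2) / (n - 1)
  f      <- n / N                        # sampling fraction
  v      <- (1 - f) * sb2 / n            # estimated variance of ybar
  list(ybar = ybar, sb2 = sb2, var = v, se = sqrt(v))
}

# Usage with made-up values: 3 clusters sampled out of N = 10
res <- estimate_cluster_mean(list(c(2, 4), c(1, 3, 5, 7), c(6, 6, 6)), N = 10)
res$ybar   # 40 / 9, the mean of all nine sampled elements
res$se
```

Returning a list keeps the intermediate quantities available, so the mean square and the standard error can be reported separately.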



 

APPLICATIONS

It is usually difficult to form clusters of equal size that are similar to one another. Hence unequal cluster sampling is used.

Cluster sampling is typically used in market research. It is used when a researcher cannot get information about the population as a whole but can get information about the clusters. Cluster sampling is often more economical or more practical than stratified sampling or simple random sampling.

Question

A survey on pepper was conducted to estimate the number of pepper standards and the production of pepper in Kerala. For this, 3 clusters out of 95 were selected by SRSWOR. The number of pepper standards recorded in each selected cluster is given below:

Cluster number   Cluster size   No. of pepper standards
1                11             41, 16, 19, 15, 144, 454, 212, 57, 28, 76, 199
2                12             39, 70, 38, 37, 161, 38, 27, 219, 46, 128, 30, 20
3                7              115, 59, 120, 36, 411, 197, 17

  1. Find the estimate of the population mean per element.
  2. Find Sb², which is used to estimate the variance, and hence the standard error.
  3. Examine the relative efficiency of unequal cluster sampling w.r.t. simple random sampling.
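The selection step in the question, 3 clusters out of 95 by SRSWOR, could be reproduced in R with `sample()`. This is only a sketch: the seed is arbitrary and the labels 1 to 95 are assumed cluster identifiers, since the question does not say which clusters were actually surveyed.

```r
set.seed(2048)   # arbitrary seed, used only to make the draw reproducible

N <- 95          # clusters in the population
n <- 3           # clusters to select

# replace = FALSE gives simple random sampling without replacement
selected <- sample(seq_len(N), size = n, replace = FALSE)
selected
```

Every element of each selected cluster would then be surveyed, which is what the table above records.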

ANALYSIS

CODES

cluster_1=c(41,16,19,15,144,454,212,57,28,76,199)
cluster_2=c(39,70,38,37,161,38,27,219,46,128,30,20)
cluster_3=c(115,59,120,36,411,197,17)

#There are 3 different clusters of sizes 11, 12 and 7.

M1=length(cluster_1)
M2=length(cluster_2)
M3=length(cluster_3)
M1

## [1] 11

M2

## [1] 12

M3

## [1] 7

#mean per element of each cluster
y1m=sum(cluster_1)/M1
y2m=sum(cluster_2)/M2
y3m=sum(cluster_3)/M3

#average cluster size
Mbar=(M1+M2+M3)/3
Mbar

## [1] 10

Mo=M1+M2+M3 #total number of elements in the sampled clusters

n=3 #random sample of n clusters drawn
M=30 #total elements
N=95 #total number of clusters in the population

#estimate of the average number of pepper standards per element
ybar=((M1*y1m)+(M2*y2m)+(M3*y3m))/(n*Mbar)
ybar

## [1] 102.3

#sum of squared deviations of the cluster means from the estimate
x=(y1m-ybar)^2 + (y2m-ybar)^2 + (y3m-ybar)^2
Sb_square=x/(n-1)
Sb_square

## [1] 1145.713

#estimator of the variance of the mean
f=n/N
var_ybar=((1-f)*Sb_square)/n
var_ybar

## [1] 369.8441

#standard error of the estimated mean
SE=sqrt(var_ybar)
SE

## [1] 19.23133

#the 30 observations are pooled and treated as a simple random sample for comparison with unequal cluster sampling.

data=c(41,16,19,15,144,454,212,57,28,76,199,39,70,38,37,161,38,27,219,46,128,30,20,115,59,120,36,411,197,17)
data

##  [1]  41  16  19  15 144 454 212  57  28  76 199  39  70  38  37 161  38  27 219
## [20]  46 128  30  20 115  59 120  36 411 197  17

data_m=mean(data)
z=sum((data-data_m)^2)
s_square=z/(length(data)-1) #divide by the number of elements minus 1
s_square

## [1] 12251.04

#relative efficiency of unequal cluster sampling
RE=s_square/(Mbar*Sb_square)
RE

## [1] 1.069295

#as the variation between clusters decreases the efficiency increases. In general cluster sampling will be efficient only when the variation between clusters is as small as possible.

INFERENCE

The clusters are of three different sizes: 11, 12 and 7.

The mean of each cluster is calculated and the average cluster size is found to be 10.

The average number of pepper standards per element is estimated to be 102.3.

The standard error of this estimate is found to be 19.23.

The relative efficiency of unequal cluster sampling w.r.t. simple random sampling is found to be about 1.07. Since this is greater than 1, cluster sampling is about 7% more efficient than simple random sampling for these data, because the variation between the cluster means is small relative to the variation among the elements.
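The estimated mean and the element-level variance can be cross-checked against R's built-in `weighted.mean()` and `var()`. The sketch below reuses the pepper data from the question; the variable names are illustrative and are not those used in the analysis.

```r
clusters <- list(
  c(41, 16, 19, 15, 144, 454, 212, 57, 28, 76, 199),
  c(39, 70, 38, 37, 161, 38, 27, 219, 46, 128, 30, 20),
  c(115, 59, 120, 36, 411, 197, 17)
)

M_i    <- lengths(clusters)        # cluster sizes: 11, 12, 7
ybar_i <- sapply(clusters, mean)   # mean per element of each cluster

# Size-weighted mean of the cluster means
ybar_check <- weighted.mean(ybar_i, w = M_i)
ybar_check                         # 102.3

# var() divides by (number of elements - 1), here 29
all_values <- unlist(clusters)
s2_check   <- var(all_values)
s2_manual  <- sum((all_values - mean(all_values))^2) / (length(all_values) - 1)
all.equal(s2_check, s2_manual)
```

Agreement between the built-in and the hand-computed variance confirms that the divisor is the number of elements minus one, not the number of clusters minus one.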

 

In most cases it is difficult to form equal-sized clusters that are homogeneous between clusters and heterogeneous within each cluster, so that each cluster is representative of the population. Hence unequal cluster sampling is used.

The technique is widely used in statistics where the researcher cannot collect data from the entire population. It is often the most economical and practical solution for statisticians doing research. Take the example of a researcher who wants to understand smartphone usage in Germany. In this case, the cities of Germany will form the clusters. This sampling method is also used in situations such as wars and natural calamities to draw inferences about a population, where collecting data from every individual in the population is impossible.

 

 

 
