Equal cluster sampling

 


 

Ayana Shaji

 

2048118

 

Cluster sampling:

In random sampling, it is presumed that the population has been divided into a finite number of distinct and identifiable units called sampling units. The smallest unit into which the population can be divided is called an element of the population. A group of such elements is known as a cluster. When the sampling unit is a cluster, the procedure is called cluster sampling.

 

Equal cluster sampling:

 

Equal cluster sampling is cluster sampling in which all the clusters are of equal size.

Number of clusters=N

Size of cluster=M

 

 



 

Procedure:

i) Suppose the population is divided into N clusters, each of size M.

ii) Select a sample of n clusters from the N clusters by SRS, generally without replacement (WOR).

Total population size = NM; total sample size = nM.

Once the clusters have been selected, every element in each selected cluster is enumerated.
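The procedure above can be sketched in base R. This is only a minimal illustration on a made-up population: the sizes N = 6, M = 4, n = 2 and the names pop, chosen and samp are assumptions chosen for the example.

```r
set.seed(1)                       # only to make the illustration reproducible
N <- 6; M <- 4; n <- 2            # hypothetical numbers of clusters and cluster size

# Toy population: one row per element, NM rows in total
pop <- data.frame(cluster = rep(1:N, each = M),
                  y = round(rnorm(N * M, mean = 50, sd = 10), 1))

# Step ii: select n of the N clusters by SRSWOR
chosen <- sample(1:N, size = n, replace = FALSE)

# Complete enumeration: keep every element of the selected clusters
samp <- pop[pop$cluster %in% chosen, ]

nrow(pop)    # total population size NM
nrow(samp)   # total sample size nM
```

Note that the random selection acts on cluster labels only; the elements enter the sample as whole clusters.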

 

 

 

PROPERTIES:

 

Estimation of population mean:

 

First select n clusters from N clusters by SRSWOR.

For each of the n selected clusters, compute the cluster mean from all M units in that cluster.

This gives the cluster means y1bar, y2bar, ..., ynbar.

Consider the mean of all such cluster means as an estimator of the population mean:

$\bar{y}_{cl} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i$   (written y_bar_cl in the text)


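then,

$E(\bar{y}_{cl}) = \frac{1}{n}\sum_{i=1}^{n}E(\bar{y}_i) = \bar{Y}$

so y_bar_cl is an unbiased estimator of the population mean $\bar{Y}$.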
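As a small check of the estimator, the following base R sketch computes y_bar_cl for one sample and then averages it over every possible SRSWOR sample of clusters. The population values and the names y, cluster_means and estimates are made up for illustration.

```r
N <- 5; M <- 3; n <- 2            # hypothetical sizes

# Toy population: row i holds the M values of cluster i
y <- matrix(c( 4,  6,  8,
              10, 12, 14,
               1,  2,  3,
               7,  7,  7,
              20, 25, 30), nrow = N, byrow = TRUE)

cluster_means <- rowMeans(y)      # mean of each cluster
pop_mean <- mean(y)               # population mean over all NM elements

# Estimator for one particular sample, here clusters 2 and 5
chosen <- c(2, 5)
y_bar_cl <- mean(cluster_means[chosen])

# Estimator for every possible SRSWOR sample of n clusters
all_samples <- combn(N, n)
estimates <- apply(all_samples, 2, function(s) mean(cluster_means[s]))

mean(estimates)                   # average over all samples
pop_mean                          # population mean, for comparison
```

Every sample of clusters is equally likely under SRSWOR, so the plain average of estimates is the expected value of y_bar_cl.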

Variance:

The variance of y_bar_cl can be derived along the same lines as the variance of the sample mean in SRSWOR. The only difference is that in SRSWOR the sampling units are y1, y2, ..., yn, whereas for y_bar_cl the sampling units are the cluster means y1bar, y2bar, ..., ynbar.
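Carrying that analogy through, the standard SRSWOR variance formula applied to the N cluster means gives the expression below. The symbol $S_b^2$ (mean square between cluster means, with divisor N − 1) and $\bar{Y}$ (population mean) are notation assumed here for the sketch.

```latex
\operatorname{Var}(\bar{y}_{cl}) = \frac{N-n}{Nn}\, S_b^2,
\qquad
S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{y}_i - \bar{Y}\right)^2
```

This is the usual expression for the variance of a sample mean under SRSWOR, with the element values replaced by cluster means and the sample and population sizes replaced by n and N.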




Efficiency:

Efficiency of cluster sampling increases as the mean square between cluster means decreases.

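Efficiency of cluster sampling relative to SRSWOR of nM elements:

$\mathrm{RE} = \dfrac{S^2}{M S_b^2}$

where S^2 is the mean square among all NM elements of the population and S_b^2 is the mean square between cluster means.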
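A minimal base R sketch of this ratio on a made-up population is given below. It assumes S^2 is computed over all NM elements with divisor NM − 1 and S_b^2 over the N cluster means with divisor N − 1; the data and variable names are illustrative only.

```r
N <- 5; M <- 3; n <- 2            # hypothetical sizes

# Toy population: row i holds the M values of cluster i
y <- matrix(c( 4,  6,  8,
              10, 12, 14,
               1,  2,  3,
               7,  7,  7,
              20, 25, 30), nrow = N, byrow = TRUE)

S2  <- var(as.vector(y))          # S^2: mean square among all NM elements
Sb2 <- var(rowMeans(y))           # S_b^2: mean square between cluster means

RE <- S2 / (M * Sb2)              # relative efficiency of cluster sampling

# The same ratio from the two variances directly
var_cl  <- (N - n) / (N * n) * Sb2                    # cluster sample of n clusters
var_srs <- (N * M - n * M) / (N * M * n * M) * S2     # SRSWOR of nM elements
var_srs / var_cl
```

A value of RE below 1 means that, for this population, an SRSWOR of nM elements is more precise than the cluster sample of the same size.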

 

 

Optimum choice of cluster size:

For equal cluster sampling, the efficiency increases as the number of clusters increases. Also, the cost increases as the cluster size increases. The clusters should therefore be chosen so that the cost is as low as possible while the efficiency remains high. Cluster sampling is efficient when the clusters are formed so that the variation between cluster means is as small as possible and the variation within clusters is as large as possible.
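The effect of how clusters are formed can be seen in a small base R sketch. The same twelve values are grouped in two hypothetical ways, once with similar values together and once with dissimilar values together; the helper rel_eff and both groupings are assumptions made for the example.

```r
# Relative efficiency S^2 / (M * S_b^2) for a population stored as a matrix
# with one row per cluster (hypothetical helper, not from any package)
rel_eff <- function(y) {
  M <- ncol(y)
  var(as.vector(y)) / (M * var(rowMeans(y)))
}

values <- 1:12                                  # the same 12 elements in both cases

# Clusters of similar values: (1,2,3), (4,5,6), (7,8,9), (10,11,12)
alike <- matrix(values, nrow = 4, byrow = TRUE)

# Clusters of dissimilar values: (1,5,9), (2,6,10), (3,7,11), (4,8,12)
mixed <- matrix(values, nrow = 4)

re_alike <- rel_eff(alike)   # large variation between cluster means
re_mixed <- rel_eff(mixed)   # small variation between cluster means
c(alike = re_alike, mixed = re_mixed)
```

With the mixed grouping the cluster means are close together and the relative efficiency exceeds 1; with the alike grouping it falls below 1.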

 

 

Application:

 

Example 1:

An example of cluster sampling is area sampling (geographical cluster sampling), where the area is divided into clusters, which makes the survey easier to carry out.

Example 2:

Cluster sampling is used to estimate mortality in crises such as wars, famines and natural disasters.

 

R code:

Arguments of the cluster function (R package sampling):

data: data frame or data matrix; its number of rows is the population size (the total number of elements, not the number of clusters).

clustername: the name of the clustering variable.

size: sample size, that is, the number of clusters to select.

method: method to select clusters; the following methods are implemented: simple random sampling without replacement (srswor), simple random sampling with replacement (srswr), Poisson sampling (poisson), systematic sampling (systematic); if the method is not specified, the default is "srswor".

pik: vector of inclusion probabilities or auxiliary information used to compute them; this argument is only used for unequal probability sampling (Poisson, systematic). If auxiliary information is provided, the function uses the inclusionprobabilities function to compute these probabilities.

description: a message is printed if its value is TRUE; the message gives the number of selected clusters, the number of units in the population and the number of selected units. By default, the value is FALSE.

 

 

############
## Example 1
############
# cluster(), getdata() and the swissmunicipalities data come from the 'sampling' package
library(sampling)
# Uses the swissmunicipalities data to draw a sample of clusters
data(swissmunicipalities)
# the variable 'REG' has 7 categories in the population
# it is used as clustering variable
# 3 clusters are selected; the method is simple random sampling without replacement
cl <- cluster(swissmunicipalities, clustername = c("REG"), size = 3, method = "srswor")
# extracts the observed data
# the order of the columns is different from the order in the initial database
getdata(swissmunicipalities, cl)

############
## Example 2
############
# the same data as in Example 1
# 3 clusters are selected; the method is systematic sampling
# the pik vector (one value per cluster) is randomly generated using the U(0,1) distribution
cl_sys <- cluster(swissmunicipalities, clustername = c("REG"), size = 3, method = "systematic",
                  pik = runif(7))
# extracts the observed data
getdata(swissmunicipalities, cl_sys)

 

 

Advantages:

 

1) Collecting data from neighbouring elements is easier, cheaper, faster and operationally more convenient than observing units that are spread over a region.

2) It is less costly than simple random sampling because time is saved on journeys, identification, contacts, etc.

3) A sampling frame listing every element of the population is not required.

 

Disadvantage:

1) This method is prone to larger sampling error than SRS of the same size, since elements within a cluster tend to be alike.

 

 
