Equal cluster sampling
Ayana Shaji
2048118
Cluster sampling:
In random sampling, it is presumed that the population has been
divided into a finite number of distinct and identifiable units defined as
sampling units. The smallest unit into which the population can be divided is
called an element of the population. A group of such elements is known as a
cluster. When the sampling unit is a cluster, the procedure is called cluster
sampling.
Equal cluster sampling:
Equal cluster sampling is cluster sampling in which all the
clusters are of equal size.
Number of clusters in the population = N
Size of each cluster = M
Procedure:
i) Suppose the population is divided into N clusters and each
cluster is of size M.
ii) Select a sample of n clusters from the N clusters by simple random
sampling (SRS), generally without replacement (WOR).
iii) After the clusters are chosen, enumerate every unit in each selected
cluster completely.
Total population size = NM
Total sample size = nM
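The procedure above can be sketched in base R. This is only an illustration: the population values are randomly generated, and the names population, chosen and cluster_sample are made up for this sketch.

```r
# Toy population: N = 6 clusters, each of size M = 4 (values are made up).
set.seed(1)
N <- 6; M <- 4; n <- 2
population <- data.frame(
  cluster = rep(seq_len(N), each = M),
  y       = round(rnorm(N * M, mean = 50, sd = 10), 1)
)

# Draw n of the N cluster labels by SRSWOR.
chosen <- sample(seq_len(N), size = n, replace = FALSE)

# Complete enumeration: keep every unit of each selected cluster.
cluster_sample <- population[population$cluster %in% chosen, ]

nrow(population)      # N * M units in the population
nrow(cluster_sample)  # n * M units in the sample
```

Only the cluster labels are sampled; the units inside a chosen cluster are never sub-sampled.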
PROPERTIES:
Estimation of population mean:
First select n clusters from the N clusters by SRSWOR.
For each of the n selected clusters, find the cluster mean
based on all M units in that cluster.
This gives the cluster means y1bar, y2bar, ..., ynbar.
Take the mean of these cluster means as the estimator of the
population mean:
y_bar_cl = (y1bar + y2bar + ... + ynbar) / n
Then y_bar_cl is an unbiased estimator of the population mean.
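As a small check of the estimator described above (the mean of the sampled cluster means), the sketch below evaluates it on every possible SRSWOR sample of clusters from a tiny population. The data values and the function name y_bar_cl are assumptions made for this illustration.

```r
# Hypothetical population: N = 5 clusters of M = 3 units each.
N <- 5; M <- 3; n <- 2
y <- matrix(c(12, 15, 11,
              20, 22, 19,
               8,  9, 10,
              30, 28, 29,
              17, 16, 18),
            nrow = N, byrow = TRUE)   # row i holds cluster i

cluster_means   <- rowMeans(y)        # mean of each cluster
population_mean <- mean(y)            # mean of all N * M units

# Estimator: mean of the sampled cluster means.
y_bar_cl <- function(ids) mean(cluster_means[ids])

# Evaluate the estimator on every possible SRSWOR sample of n clusters.
all_samples   <- combn(N, n)          # one sample per column
all_estimates <- apply(all_samples, 2, y_bar_cl)

# Every sample is equally likely, so the plain average is E(y_bar_cl).
expected_value <- mean(all_estimates)
all.equal(expected_value, population_mean)   # TRUE
```

Averaged over all possible samples, the estimator reproduces the population mean, which is what unbiasedness means here.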
Variance:
The variance of y_bar_cl can be derived along the same lines as
the variance of the sample mean in SRSWOR. The only difference is that in
SRSWOR the sampling units are y1, y2, ..., yn, whereas for y_bar_cl the
sampling units are the cluster means y1bar, y2bar, ..., ynbar.
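Following that argument, the standard SRSWOR variance formula applied to the N cluster means gives the result below. The symbols \bar{Y} (population mean) and S_b^2 (mean square between cluster means) are notation assumed for this sketch.

```latex
\operatorname{Var}(\bar{y}_{cl}) = \frac{N - n}{N n}\, S_b^2,
\qquad
S_b^2 = \frac{1}{N - 1} \sum_{i=1}^{N} \left( \bar{y}_i - \bar{Y} \right)^2
```

Here \bar{y}_i is the mean of the i-th cluster in the population, so S_b^2 plays the role that the population mean square plays for single units in SRSWOR.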
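Efficiency:
Efficiency of cluster sampling increases as the mean square
between cluster means decreases.
Efficiency of cluster sampling relative to SRS with the same total sample
size nM:
RE = S^2 / (M * Sb^2)
where S^2 is the mean square among all NM units of the population and
Sb^2 is the mean square between the cluster means.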
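The relative efficiency can be checked numerically as the ratio of two variances: that of the SRSWOR mean of nM units, (NM - nM) / (NM * nM) * S^2, and that of the cluster estimator, (N - n) / (N * n) * Sb^2. The sketch below uses made-up data; the variable names are assumptions for illustration.

```r
# Hypothetical population: N = 5 clusters (rows) of M = 3 units each.
N <- 5; M <- 3; n <- 2
y <- matrix(c(12, 15, 11,
              20, 22, 19,
               8,  9, 10,
              30, 28, 29,
              17, 16, 18),
            nrow = N, byrow = TRUE)

S2  <- var(as.vector(y))   # S^2: mean square among all N * M units
Sb2 <- var(rowMeans(y))    # Sb^2: mean square between cluster means

# Variances of the two estimators for the same total sample size n * M.
var_srs <- (N * M - n * M) / (N * M * n * M) * S2   # SRSWOR of n * M units
var_cl  <- (N - n) / (N * n) * Sb2                  # SRSWOR of n clusters

RE <- var_srs / var_cl
all.equal(RE, S2 / (M * Sb2))   # TRUE: agrees with RE = S^2 / (M * Sb^2)
```

In this toy population the units inside each cluster are very similar, so RE comes out below 1: cluster sampling is less efficient than SRS for these particular values.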
Optimum choice of cluster size:
For equal cluster sampling, the efficiency increases as the
number of clusters increases. Also, the cost increases with the increase in the
cluster size. The cluster size should be chosen in such a way that the cost is
minimum and the efficiency is high. Cluster sampling will
be efficient if the clusters are formed so that the variation between cluster
means is as small as possible while the variation within the clusters is as large
as possible.
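To illustrate the last point, the sketch below groups the same twelve made-up values into clusters in two different ways and compares RE = S^2 / (M * Sb^2) for each grouping. The helper relative_efficiency is hypothetical and written only for this example.

```r
# RE = S^2 / (M * Sb^2) for a population stored as a matrix
# with one cluster per row.
relative_efficiency <- function(y) {
  M <- ncol(y)
  var(as.vector(y)) / (M * var(rowMeans(y)))
}

values <- 1:12   # made-up population of 12 units

# Grouping A: similar units share a cluster (small within-cluster variation).
# Rows: 1 2 3 / 4 5 6 / 7 8 9 / 10 11 12
homogeneous <- matrix(values, nrow = 4, byrow = TRUE)

# Grouping B: each cluster spans the range of values (large within-cluster variation).
# Rows: 1 5 9 / 2 6 10 / 3 7 11 / 4 8 12
heterogeneous <- matrix(values, nrow = 4, byrow = FALSE)

relative_efficiency(homogeneous)     # below 1: less efficient than SRS
relative_efficiency(heterogeneous)   # above 1: more efficient than SRS
```

The population is identical in both cases; only the way units are assigned to clusters changes the efficiency.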
Application:
Example 1:
An example of cluster sampling is area sampling, or geographical
cluster sampling, where the area is divided into different clusters,
making the survey easier.
Example 2:
Cluster sampling is used to estimate high mortalities in cases such as wars,
famines and natural disasters.
R code:
The examples below use the cluster function from the R package sampling.
Arguments
data: data frame or data matrix; its number of rows is the population size
(the total number of units, called N in the package documentation).
clustername: the name of the clustering variable.
size: sample size (the number of clusters to select).
method: method to select clusters; the following methods are implemented:
simple random sampling without replacement (srswor), simple random sampling
with replacement (srswr), Poisson sampling (poisson), systematic sampling
(systematic); if the method is not specified, the default is "srswor".
pik: vector of inclusion probabilities or auxiliary information used to
compute them; this argument is only used for unequal probability sampling
(Poisson, systematic). If auxiliary information is provided, the function
uses the inclusionprobabilities function to compute these probabilities.
description: a message is printed if its value is TRUE; the message gives
the number of selected clusters, the number of units in the population and
the number of selected units. By default, the value is FALSE.
############
## Example 1
############
# Load the sampling package, which provides cluster(), getdata()
# and the swissmunicipalities data
library(sampling)
# Uses the swissmunicipalities data to draw a sample of clusters
data(swissmunicipalities)
# the variable 'REG' has 7 categories in the population
# it is used as clustering variable
# the sample size is 3; the method is simple random sampling without replacement
cl <- cluster(swissmunicipalities, clustername = c("REG"), size = 3, method = "srswor")
# extracts the observed data
# the order of the columns is different from the order in the initial database
getdata(swissmunicipalities, cl)
############
## Example 2
############
# the same data as in Example 1
# the sample size is 3; the method is systematic sampling
# the pik vector is randomly generated using the U(0,1) distribution
cl_sys <- cluster(swissmunicipalities, clustername = c("REG"), size = 3,
                  method = "systematic", pik = runif(7))
# extracts the observed data
getdata(swissmunicipalities, cl_sys)
Advantages:
1) Collecting data for neighbouring elements is easier, cheaper, faster and
operationally more convenient than observing units that are spread over a region.
2) It is less costly than simple random sampling because time is saved on
journeys, identification, contacts, etc.
3) A sampling frame listing every individual element is not required; a list
of the clusters is enough.
Disadvantage:
1) This method is prone to bias.