CLUSTER SAMPLING

 Sneha Maria George


Some Basic Definitions:

Population:

The population is the complete set of units (individuals, items, or observations) that are of interest in a study.

Sample:

The sample is a subset drawn from the population under consideration, for which we have the data.

Introduction:

When the population is large, studying the whole population and interpreting the results is often impractical and time-consuming. This is when the sample comes into the picture.

It would be much simpler and quicker to measure a subset of the population rather than the whole population.

However, if we want our sample to be a true reflection of the population and to give us accurate results, we cannot just pick whichever units we wish from the population. The method by which we select our sample is important.

There are mainly two types of Sampling:

·       Random Sampling

·       Non-Random Sampling

Each of these has its own pros and cons.

Looking at Random Sampling, it mainly involves four methods, namely:

·       Simple Random Sampling (SRS)

·       Systematic Sampling

·       Stratified Sampling

·       Cluster Sampling

 

Here, our topic of interest is cluster sampling.

Cluster Sampling:

In this sampling method, we divide the population into clusters (groups) and then randomly choose some of these clusters as the sample. After the selection procedure, we carry out complete enumeration, that is, every unit of each selected cluster enters the calculation.

Cluster sampling is a probability sampling method and is often used to study large populations.
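To make the procedure concrete, here is a minimal Python sketch of the steps described above (Python is used only for illustration). The classroom scores and the function name `one_stage_cluster_sample` are made up for this example and are not part of any library.

```python
import random

def one_stage_cluster_sample(clusters, n, seed=None):
    """Choose n whole clusters by SRS without replacement, then keep every unit in them."""
    rng = random.Random(seed)
    # Randomly choose n cluster labels (sorted first so the draw is reproducible).
    chosen = rng.sample(sorted(clusters), n)
    # Complete enumeration: every unit of each chosen cluster enters the sample.
    return {label: list(clusters[label]) for label in chosen}

# Hypothetical population: 6 classrooms (clusters) with 4 test scores each.
population = {
    "A": [52, 61, 58, 49],
    "B": [70, 66, 73, 68],
    "C": [45, 50, 47, 55],
    "D": [80, 77, 85, 79],
    "E": [60, 63, 59, 64],
    "F": [55, 57, 53, 58],
}

sample = one_stage_cluster_sample(population, n=2, seed=1)
for label, units in sample.items():
    print(label, units)
```

Whichever two classrooms are drawn, all four scores of each of them are included, which is what distinguishes this from sampling individual students.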

Cluster sampling includes two cases:

·       Clusters Selected Are of Equal Size

·       Clusters Selected Are of Unequal Size.




There are primarily two types of cluster sampling:

Single-Stage Sampling:

In this case, we collect data from each and every unit within the selected clusters.

Two-Stage Sampling:

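Here, we first select clusters by applying a random sampling technique, and then draw a random sample of units from within each selected cluster instead of enumerating every unit.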
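A two-stage draw can be sketched in Python as follows. This is only an illustration under simple assumptions: SRS without replacement is used at both stages, the same number of units `m` is taken from every selected cluster, and the data and the function name `two_stage_cluster_sample` are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n, m, seed=None):
    """Stage 1: choose n clusters by SRSWOR. Stage 2: choose m units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n)  # stage 1: random clusters
    # Stage 2: a random subsample of m units from each chosen cluster.
    return {label: rng.sample(clusters[label], m) for label in chosen}

# Hypothetical population: 4 villages (clusters) with 5 household incomes each.
population = {
    "V1": [12, 15, 11, 14, 13],
    "V2": [22, 25, 21, 24, 23],
    "V3": [32, 35, 31, 34, 33],
    "V4": [42, 45, 41, 44, 43],
}

sample = two_stage_cluster_sample(population, n=2, m=2, seed=7)
for label, units in sample.items():
    print(label, units)
```

Compared with the single-stage case, only some of the units in each selected cluster are observed, which reduces the fieldwork further.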

As we know, everything has positives and negatives. Cluster sampling is cheaper and less time-consuming than other probability sampling methods, especially when a large amount of data is under consideration. However, compared with methods such as simple random sampling, it gives less precise estimates, because it is difficult to ensure that the selected clusters properly represent the population as a whole. That is, there is a high chance of sampling error.

Formulas:

Case of Equal Clusters:

· Suppose the population is divided into N clusters and each cluster is of size M.

· Select a sample of n clusters from the N clusters by SRS, generally without replacement (WOR).

So the total population size = NM

and the total sample size = nM.

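Population Mean:

Let $\bar{y}_i$ denote the mean of the M units in the ith cluster. The population mean $\bar{Y}$ is estimated by the mean of the n sampled cluster means:

$$\bar{y}_{cl} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i$$

Variance:

The variance of $\bar{y}_{cl}$ is estimated by

$$\widehat{\mathrm{Var}}(\bar{y}_{cl}) = \frac{N-n}{Nn}\, s_b^2$$

where

$$s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\bar{y}_i - \bar{y}_{cl}\right)^2$$

is the mean sum of squares between cluster means in the sample.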
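The equal-cluster calculation can be tried out with a small Python sketch. It assumes the usual textbook estimator (the mean of the sampled cluster means) and the variance estimate (N - n)/(Nn) times the mean sum of squares between cluster means in the sample. The numbers and the function name `equal_cluster_estimate` are hypothetical.

```python
from statistics import mean

def equal_cluster_estimate(sampled_clusters, N):
    """Estimate the population mean and its variance from n equal-sized clusters.

    sampled_clusters: list of n clusters (each a list of M values), drawn by SRSWOR.
    N: total number of clusters in the population.
    """
    n = len(sampled_clusters)
    cluster_means = [mean(c) for c in sampled_clusters]
    y_cl = mean(cluster_means)  # mean of the cluster means
    # Mean sum of squares between cluster means in the sample.
    s_b2 = sum((m - y_cl) ** 2 for m in cluster_means) / (n - 1)
    var_hat = (N - n) / (N * n) * s_b2
    return y_cl, var_hat

# Hypothetical sample: n = 3 clusters of size M = 3, out of N = 10 clusters.
sampled = [[4, 6, 8], [10, 12, 14], [8, 9, 10]]
estimate, variance = equal_cluster_estimate(sampled, N=10)
print(estimate, variance, variance ** 0.5)
```

For this toy sample the cluster means are 6, 12 and 9, so the estimate is 9 and the estimated variance is about 2.1.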

Case of Unequal Clusters:

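Suppose there are N clusters. Let the ith cluster consist of $M_i$ elements (i = 1, 2, 3, …, N), and let

$$M_0 = \sum_{i=1}^{N} M_i$$

be the total number of elements in the population.

The population mean per element $\bar{Y}$ is defined by:

$$\bar{Y} = \frac{1}{M_0}\sum_{i=1}^{N} M_i\,\bar{y}_i$$

where $\bar{y}_i$ is the mean per element of the ith cluster.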


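One standard unbiased estimator of the population mean per element, based on the n sampled clusters (i = 1, 2, …, n), is

$$\bar{y} = \frac{1}{n\bar{M}}\sum_{i=1}^{n} M_i\,\bar{y}_i, \qquad \bar{M} = \frac{1}{N}\sum_{i=1}^{N} M_i$$

Similarly, an unbiased estimator of Var($\bar{y}$) can be given by

$$\widehat{\mathrm{Var}}(\bar{y}) = \frac{N-n}{Nn}\cdot\frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{M_i}{\bar{M}}\,\bar{y}_i - \bar{y}\right)^2$$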
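For unequal clusters, the following Python sketch shows one way the calculation could go. It assumes the standard unbiased estimator that scales each sampled cluster total by the average cluster size (total number of elements divided by N), and it assumes that total is known. The data and the function name `unequal_cluster_estimate` are hypothetical.

```python
from statistics import mean

def unequal_cluster_estimate(sampled_clusters, N, M0):
    """Unbiased estimate of the mean per element and of its variance.

    sampled_clusters: list of n clusters (lists of values, sizes may differ).
    N: number of clusters in the population.
    M0: total number of elements in the population (assumed known).
    """
    n = len(sampled_clusters)
    M_bar = M0 / N  # average cluster size
    # u_i = M_i * ybar_i / M_bar, i.e. the cluster total divided by M_bar.
    u = [sum(c) / M_bar for c in sampled_clusters]
    y_bar = mean(u)
    s_u2 = sum((ui - y_bar) ** 2 for ui in u) / (n - 1)
    var_hat = (N - n) / (N * n) * s_u2
    return y_bar, var_hat

# Hypothetical sample: n = 3 clusters of sizes 2, 4 and 6; N = 5 clusters, 20 elements in all.
sampled = [[2, 4], [5, 5, 5, 5], [1, 2, 3, 4, 5, 7]]
estimate, variance = unequal_cluster_estimate(sampled, N=5, M0=20)
print(estimate, variance)
```

Here the scaled cluster values are 1.5, 5.0 and 5.5, giving an estimate of 4.0 and an estimated variance of about 0.633.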

 

Real-Life Application:

Sampling is a concept we use in daily life. For example, when cooking, to check whether all the vegetables in a curry are cooked, we pick a few pieces at random and check them rather than examining every piece in the pot.

Cluster sampling is typically used in market research, for instance when a researcher cannot get information about the population as a whole but can obtain information about the clusters. Suppose a researcher is interested in data about city taxes in a particular state (say, Karnataka). The researcher would obtain data from selected cities and compile them to get a picture of the state as a whole. Here, the individual cities would be considered as clusters.

Now, let’s illustrate one-stage (single-stage) cluster sampling in R:

About the Dataset:

The Academic Performance Index is computed for all California schools based on standardized testing of students. The data sets contain information for all schools with at least 100 students and for various probability samples of the data.

ANALYSIS:











Interpretation from R:

This is a built-in dataset that contains variables by which we can divide the whole population into clusters. Here, dnum is the variable by which we have divided the population into clusters. We have performed one-stage cluster sampling and obtained the mean and standard error of our variable of interest (enroll): 549.72 and 45.191, respectively. Missing values can also be handled easily with this method. Here, a sample of 15 clusters has been drawn.

 Advantages of Cluster Sampling:

Compared with simple random sampling and stratified sampling, cluster sampling has a few advantages:

1. Cluster sampling requires fewer resources:

Since only a few groups from the entire population are taken into consideration, it requires fewer resources for sampling, which makes it cheaper than other sampling methods.

2. More feasible:

The feasibility of sampling is increased because the entire population is divided into groups that are broadly similar to one another (homogeneous), so only a few of them need to be covered.

Disadvantages of Cluster Sampling:

1. Biased sampling:

If the clusters are chosen on the basis of a biased opinion, the inferences about the whole population will also be biased.

2. Sampling error is high:

Samples drawn using the cluster method have a higher sampling error than samples of the same size drawn using other sampling methods.
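The higher sampling error can be seen numerically in a small Python sketch. It compares the variance of the estimated mean under one-stage cluster sampling with that under simple random sampling of the same number of units, for a made-up population in which units in the same cluster resemble each other. The SRS variance formula (1 - f) S² / n is a standard result that this post does not state, and the function names are hypothetical.

```python
from statistics import mean, variance

def srs_variance(values, sample_size):
    """Variance of the sample mean under SRSWOR: (1 - f) * S^2 / sample_size."""
    f = sample_size / len(values)  # sampling fraction
    return (1 - f) * variance(values) / sample_size

def cluster_variance(clusters, n):
    """Variance of the mean of n cluster means (equal sizes): (N - n)/(N n) * S_b^2."""
    N = len(clusters)
    S_b2 = variance([mean(c) for c in clusters])  # between-cluster mean square
    return (N - n) / (N * n) * S_b2

# Hypothetical population: 10 clusters of 5 units; units within a cluster are similar.
M = 5
clusters = [[10 * i + j for j in range(M)] for i in range(10)]
all_units = [y for c in clusters for y in c]

n = 3  # clusters sampled, i.e. n * M = 15 units observed
var_cluster = cluster_variance(clusters, n)
var_srs = srs_variance(all_units, n * M)
print(var_cluster, var_srs)
```

For this population the cluster-sampling variance (roughly 214) is several times the SRS variance (roughly 39) for the same 15 observed units. The gap shrinks when units within a cluster are less alike.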

 Conclusion:

To use cluster sampling on our data, we have to make sure that the nature of the dataset is compatible with it; if the research requires completely independent respondents, this technique might not be the best one. Cluster sampling is advantageous when we don't have enough resources to develop a good sampling frame. Considering all the properties of cluster sampling, we can say it is one of the more efficient and time-saving methods compared with other probability sampling methods.

 

 

 

 
