SIMPLE RANDOM SAMPLING WITHOUT REPLACEMENT
Done by: Sahasra Subhash 2048131
GENERAL INTRODUCTION:
This article discusses one of the best-known sampling techniques: Simple Random Sampling Without Replacement, commonly abbreviated as SRSWOR.
So, what is a sampling technique? Before we can answer that, we need to understand what is meant by population and sample in statistics.
Statistically speaking, the population is the entire group under study. The size of the population is generally denoted by “N” in statistical studies. A sample, on the other hand, is a subset of units collected from the population. A sample is smaller than the population and is intended to reflect the properties of the population. The size of the sample is generally denoted by “n” in statistical studies.
With the idea of a sample in place, a sampling technique is easy to define: it is a method used to select samples from the population. Simple Random Sampling selects the units of the sample from the population completely at random. There are two main types of Simple Random Sampling:
· Simple Random Sampling With Replacement (SRSWR)
· Simple Random Sampling Without Replacement (SRSWOR)
We concentrate mainly on the Simple Random Sampling Without Replacement (SRSWOR) method. In this method, a simple random sample is chosen from a population without replacement, so a unit that has already been selected cannot be selected again. This can be understood more clearly from the diagram:
Since the selection is done without replacement, each time a unit is drawn, the number of units remaining in the population decreases by one, and hence the probability of selecting any particular remaining unit changes from draw to draw.
We are selecting a sample of size n from a population of size N. At every draw, all units still in the population have an equal chance of being selected. In the first draw, the probability of selecting any given unit is 1/N. In the second draw, the number of units left in the population is N-1, so the probability of selecting any given remaining unit is 1/(N-1). The same continues in each draw: at the k-th draw the probability is 1/(N-k+1).
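The draw-by-draw procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the article's own code: the function name `draw_srswor`, the unit labels, and the seed are assumptions made for the example.

```python
import random
from fractions import Fraction

def draw_srswor(units, n, seed=None):
    """Draw n units one at a time without replacement.

    Returns the selected units and, for each draw, the probability
    that any one of the units still remaining is picked.
    """
    if not 0 <= n <= len(units):
        raise ValueError("sample size must be between 0 and the population size")
    rng = random.Random(seed)
    remaining = list(units)
    sample, draw_probs = [], []
    for _ in range(n):
        # Every remaining unit is equally likely at this draw.
        draw_probs.append(Fraction(1, len(remaining)))
        pick = rng.randrange(len(remaining))
        # Removing the unit is what makes the sampling "without replacement".
        sample.append(remaining.pop(pick))
    return sample, draw_probs

# Hypothetical population of five labeled units, sample of size three.
sample, probs = draw_srswor(["A", "B", "C", "D", "E"], n=3, seed=1)
print(sample)
print([str(p) for p in probs])  # ['1/5', '1/4', '1/3']
```

The selected units depend on the seed, but the per-draw probabilities 1/N, 1/(N-1), 1/(N-2) are the same for every run.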
Some of the commonly used notations in SRSWOR are:
1. N: Number of sampling units in the population, the population size.
2. n: Number of sampling units in the sample, the sample size.
3. Y: The characteristic under consideration.
4. Yi: Value of the characteristic for the ith unit of the population.
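· Mean in statistics is just another name for the average, which is computed as the sum of all observations divided by the number of observations.
5. The sample mean is given by the formula: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, where $y_i$ is the value of the characteristic for the ith unit in the sample.
6. The population mean is given by the formula: $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$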
7.
8.
9.
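· The variance measures the spread of the data values. It shows how far, on average, the values in the set are from the mean.
10. The variance of the sample mean under SRSWOR is given by the formula: $\mathrm{Var}(\bar{y}) = \frac{N-n}{Nn} S^2$, where $S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2$ is the population mean square. When $S^2$ is unknown, it is estimated by the sample mean square $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$.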
· Confidence interval in statistics refers to a range of values, computed from the sample, within which the value of the parameter is expected to lie with a stated level of confidence. The half-width of this range is the margin of error.
The confidence interval is given by the formula:
C.I. = $\bar{y} \pm z_{\alpha/2} \times \text{standard error}$, where α = 0.05 as we are considering a 95% confidence interval here, and the standard error is the square root of the variance of the sample mean.
Note: when the sample size n is less than 30 (which is always the case when N is less than 30), we use $t_{\alpha/2}$ with n-1 degrees of freedom in place of $z_{\alpha/2}$ for the purpose of computations.
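The sample mean, the variance of the sample mean, and the confidence interval can be put together in one small Python sketch. It is offered as an illustration under stated assumptions rather than as the article's own listing: the function name `srswor_summary`, the `crit` parameter, and the data values are all hypothetical, and the population mean square is estimated from the sample with divisor n-1.

```python
import math
import statistics

def srswor_summary(sample, N, crit=None):
    """Return (sample mean, estimated variance of the mean, 95% CI) under SRSWOR.

    crit is the critical value: pass the tabulated t value with n - 1
    degrees of freedom for a small sample; if omitted, the normal value
    z_{0.025} (about 1.96) is used.
    """
    n = len(sample)
    if not 2 <= n <= N:
        raise ValueError("need 2 <= n <= N")
    if crit is None:
        crit = statistics.NormalDist().inv_cdf(0.975)
    y_bar = statistics.mean(sample)
    s2 = statistics.variance(sample)        # sample mean square, divisor n - 1
    var_mean = (N - n) / (N * n) * s2       # includes the finite population correction
    se = math.sqrt(var_mean)                # standard error of the sample mean
    return y_bar, var_mean, (y_bar - crit * se, y_bar + crit * se)

# Hypothetical data: 4 units sampled from a population of 10.
# 3.182 is the tabulated two-sided 95% t value for 3 degrees of freedom.
y_bar, var_mean, ci = srswor_summary([12, 15, 11, 14], N=10, crit=3.182)
print(y_bar, round(var_mean, 3), [round(x, 2) for x in ci])  # 13 0.5 [10.75, 15.25]
```

Passing the critical value explicitly keeps the sketch free of third-party dependencies; with a statistics package available, the t quantile could be looked up instead of typed in.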
ADVANTAGES OF SRSWOR:
· Since the units that make up the sample are chosen at random from the population, each unit has an equal chance of being selected, which tends to produce a sample that represents the population without selection bias.
· Simplicity: Since the units are selected purely at random, the process is very simple.
DISADVANTAGES OF SRSWOR:
· This method can be time-consuming when a full list of a large population is not available.
· The cost of collecting information about the entire population, which is needed to build the list of units, is high.
· Errors or bias may still arise in the process of selecting the sample.
APPLICATIONS OF SRSWOR:
The Simple Random Sampling without replacement technique finds applications in almost all domains in real life. A person cooking rice randomly takes a few grains of rice and checks whether they are cooked. This is an application of the method. Here, the few grains of rice that were chosen are assumed to reflect the characteristics of the entire population. Similarly, SRSWOR finds application in almost all cases where a small part of the population is selected without replacement and is assumed to reflect the characteristics of the entire population. Some applications from specific fields of study are explained in detail below:
NOTE: For ease of understanding, the population sizes used in the demonstrations are kept very small, less than 30. Hence, for computing the confidence intervals, the t statistic is used in place of the z statistic. In real life, the population under consideration may be huge.
(1) In the biostatistics field:
Consider the situation where a pioneering vaccine manufacturer claims that a vaccine is 96% effective on people of all age groups. To check this claim, the company is advised to select a sample of 3 people from a population of 5 people from different age groups. Here, we can use the simple random sampling without replacement technique to select the people for the clinical trial in a random fashion.
Let the five people be named A, B, C, D, and E, aged 16, 27, 39, 59, and 48 respectively. Let yi (i = 1, 2, 3, 4, 5) denote the effectiveness of the vaccine on the five people during the clinical trial.
Here, we will have a total of 5C3 ways to select the sample, that is 10 ways. The probability of selecting any given person at the first draw is equal to 1/5. After the selection of one person, there are 4 people left. Hence, the probability in the second draw is equal to 1/4, and in the third draw it is equal to 1/3.
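The count of possible samples and the selection probabilities for a sample of 3 people out of A, B, C, D, and E can be checked with a short Python sketch. It is an illustrative check only and uses nothing beyond the names and sizes stated above.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial

people = ["A", "B", "C", "D", "E"]
n = 3

# Every unordered sample of 3 people out of 5.
samples = list(combinations(people, n))

# Probability of one particular ordered sequence of draws: 1/5 * 1/4 * 1/3.
p_ordered = Fraction(1, 5) * Fraction(1, 4) * Fraction(1, 3)
# The same three people can be drawn in 3! different orders.
p_sample = factorial(n) * p_ordered

# Share of the samples that contain a given person, here A.
p_inclusion = Fraction(sum("A" in s for s in samples), len(samples))

print(len(samples), comb(len(people), n))  # 10 10
print(p_sample)                            # 1/10
print(p_inclusion)                         # 3/5
```

Each of the 10 possible samples is therefore equally likely, and each person has probability n/N = 3/5 of ending up in the sample.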
Further quantities such as the mean, the variance, and the confidence interval for the effectiveness of the vaccine can be computed from the respective formulas. The R code for the computations is given below:
Here, the vector effectiveness specifies the effectiveness of the vaccine in terms of percentages.
From the computations, the estimated mean for the selected sample is 95.33, which is slightly below the company's claim that the vaccine is 96% effective. This estimated value may differ depending on which sample is drawn, so a confidence interval is used to describe the range of plausible values. Here, the 95% confidence interval is computed to be [82.27, 108.4]. Since the claimed value of 96% lies inside this interval, the sample does not provide enough evidence to reject the company's claim. Since our focus is on the effectiveness of the vaccine and it cannot exceed 100%, we can conclude with 95% confidence that the effectiveness of the vaccine lies in the interval [82.27, 100].
(2) In the agricultural field:
One of the major issues faced in the agricultural field is the selection of plots for growing new crops. The farmer or the owner of the land has to make sure that they obtain the maximum yield, and the selection of appropriate plots of land is a major contributing factor.
Consider the case where a farmer is interested in buying two plots of land and there are 4 plots of land available. The only condition in the farmer's mind is that the two plots should have an average area of at least 25 square feet. The four plots of land are named plot1, plot2, plot3, and plot4, with areas 24, 26, 24, and 27 square feet respectively. The farmer selects two plots at random from the four. The computations in R for the same are given below:
The condition put forth by the farmer was that the selected plots should have an average area of at least 25 square feet. Here, the mean area of the two plots chosen as the sample is found to be 26.5 square feet, which meets the requirement put forward by the farmer. Also, the 95% confidence interval is found to be [24.45, 28.56]. Hence, we can say with 95% confidence that the mean plot area lies in this interval.
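Because the population here has only four plots, every possible sample of two can be listed. The Python sketch below is a supplementary illustration, not the article's R computation. It enumerates the six samples, checks that the average of the sample means equals the population mean, and compares the spread of the sample means with the formula (N-n)/(N·n)·S². Variable names are illustrative.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

areas = {"plot1": 24, "plot2": 26, "plot3": 24, "plot4": 27}  # square feet
N, n = len(areas), 2
values = list(areas.values())

# Mean area of every possible sample of two plots.
sample_means = {pair: mean(areas[p] for p in pair)
                for pair in combinations(areas, n)}

pop_mean = mean(values)                                  # 25.25
avg_of_means = mean(sample_means.values())               # 25.25, so the sample mean is unbiased
var_of_means = pvariance(list(sample_means.values()))    # spread over all 6 samples
formula_var = (N - n) / (N * n) * variance(values)       # (N-n)/(N*n) * S^2

# Samples that satisfy the farmer's condition of an average of at least 25.
qualifying = [pair for pair, m in sample_means.items() if m >= 25]

print(pop_mean, avg_of_means)      # 25.25 25.25
print(var_of_means, formula_var)   # 0.5625 0.5625
print(len(qualifying), "of", len(sample_means), "samples qualify")
```

Five of the six possible samples meet the condition; only the pair (plot1, plot3), with mean 24, does not. The sample with mean 26.5 is the pair (plot2, plot4).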
(3) In quality control:
Consider the case where a factory produces packets of chips. The company claims that all the packets weigh 40 grams. The weight of a packet is considered one of the main aspects that determine the quality of the chips. In order to test whether the company's claim is correct, the investigation team randomly selects 4 packets from a lot of 10 packets. The packets are said to maintain the pre-specified quality standards if the mean weight of the selected packets lies between 38 and 42 grams, that is, within 2 grams of the claimed weight.
Let the 10 packets in the lot, taken as the population, be denoted by Pi where i = 1 to 10, and let their respective weights be 40, 43, 39, 37, 46, 33, 38, 39, 34, and 39 grams. The R code for the computations in this case is given below:
The company claims that all the packets are produced to pre-specified standards and that all packets weigh 40 grams. Here, the selected sample of size 4 from the population has a mean equal to 40 grams, which is exactly what the company standards specify and lies within the accepted range of 38 to 42 grams. Therefore, we conclude that, on the basis of this sample, the lot meets the standards that were set. Also, the 95% confidence interval was computed to be [36.7, 43.4]. Hence, we can say with 95% confidence that the mean weight of the packets lies in this interval.
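The acceptance rule can also be tried out in Python. The sketch below is illustrative and is not the article's R code: the seed is arbitrary, so the packets it draws need not be the ones used in the article, and the helper name `passes` is hypothetical. It draws one sample of 4 packets without replacement and then finds what share of all possible samples of 4 would pass the 38 to 42 gram rule.

```python
import random
from itertools import combinations
from statistics import mean

weights = [40, 43, 39, 37, 46, 33, 38, 39, 34, 39]  # grams, packets P1..P10
n = 4

def passes(sample, low=38, high=42):
    """True if the mean weight of the sample lies within the acceptance band."""
    return low <= mean(sample) <= high

rng = random.Random(2024)        # arbitrary seed, used only for repeatability
drawn = rng.sample(weights, n)   # random.sample selects without replacement
print(drawn, mean(drawn), passes(drawn))

# All possible samples of 4 packets from the lot of 10.
all_samples = list(combinations(weights, n))
share_passing = sum(passes(s) for s in all_samples) / len(all_samples)
print(len(all_samples), round(share_passing, 3))
```

There are 10C4 = 210 possible samples. The mean weight of the whole lot is 38.8 grams, so whether a particular sample passes depends on which packets happen to be drawn.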
The applications of SRSWOR in the fields of biostatistics, agriculture, and quality control have been illustrated both theoretically and through R code. Three different applications of Simple Random Sampling Without Replacement have been discussed here, and an interpretation has been given for each case separately. The applications are not limited to the ones discussed here; there are many more applications of Simple Random Sampling Without Replacement.