SIMPLE RANDOM SAMPLING WITHOUT REPLACEMENT

Done by: Sahasra Subhash 2048131

GENERAL INTRODUCTION:

The topic of this article is one of the most commonly known sampling techniques: Simple Random Sampling Without Replacement, usually abbreviated as SRSWOR.

So, what is a sampling technique? Before we look at sampling techniques, we need to understand what is meant by population and sample in statistics.

Statistically speaking, the population is the entire group under study. The population size is generally denoted by “N” in statistical studies. A sample, on the other hand, is a subset of units collected from the population. A sample is never larger than the population and, when drawn properly, is expected to reflect the properties of the population. The sample size is generally denoted by “n” in statistical studies.

Knowing what a sample is makes it easy to understand what a sampling technique is: it is a method used to collect samples from the population. Simple Random Sampling is the method of selecting samples from the population completely at random. There are two main types of Simple Random Sampling:

· Simple Random Sampling With Replacement (SRSWR)

· Simple Random Sampling Without Replacement (SRSWOR)

We concentrate mainly on the Simple Random Sampling Without Replacement (SRSWOR) method. In this method, a simple random sample is chosen from a population without replacement, that is, a unit that has been selected is not returned to the population before the next draw. This can be understood more clearly from the diagram:

Since the selection is done without replacement, each time a unit is drawn, the number of units remaining in the population decreases by one, and hence the probability of selecting any particular remaining unit changes from draw to draw.

We are selecting a sample of size n from a population of size N. In the first draw, every unit has probability 1/N of being selected. In the second draw, the number of units remaining is N-1, so the probability of selecting any particular remaining unit is 1/(N-1). The same continues in each draw, up to 1/(N-n+1) in the n^th draw. Taken over the whole procedure, all units still have an equal chance of ending up in the sample: each unit is included with probability n/N, and each of the NCn possible samples is equally likely.
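The draw-by-draw probabilities described above can be checked with a few lines of R. This is only a minimal sketch: the sizes N = 5 and n = 3 and all variable names are illustrative assumptions, not values taken from the article.

```r
# Illustrative sizes (assumed for this sketch)
N <- 5  # population size
n <- 3  # sample size

# Probability of picking one particular remaining unit at draws 1, 2, ..., n:
# 1/N, 1/(N-1), ..., 1/(N-n+1)
draw_prob <- 1 / (N - 0:(n - 1))
print(draw_prob)

# Probability of obtaining one particular ordered sequence of n units
ordered_prob <- prod(draw_prob)

# The same n units can be drawn in n! different orders,
# so the probability of one particular (unordered) sample is
sample_prob <- factorial(n) * ordered_prob
print(sample_prob)

# This agrees with 1 / (number of possible samples)
print(1 / choose(N, n))
```

The last two printed values coincide, which is the sense in which every possible sample is equally likely under SRSWOR.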

Some of the commonly used notations in SRSWOR are:

1. N: Number of sampling units in the population, the population size.

2. n: Number of sampling units in the sample, the sample size.

3. Y: The characteristic under consideration.

4. Y_i: Value of the characteristic for the i^th unit of the population.

· The mean in statistics is just another name for the average, computed as the sum of all observations divided by the number of observations.

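5. The sample mean is given by the formula: ȳ = (1/n) * Σ y_i, where the sum runs over the n units in the sample.

6. The population mean is given by the formula: Ȳ = (1/N) * Σ Y_i, where the sum runs over all N units in the population.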

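· The variance measures the spread of the data values, that is, how far the values lie from the mean on average.

7. The variance of the sample mean under SRSWOR is given by the formula: ((N-n)/(N*n))*S², where S² = (1/(N-1)) * Σ (Y_i - Ȳ)² is the population mean square. In practice S² is unknown and is estimated by the sample mean square s² = (1/(n-1)) * Σ (y_i - ȳ)².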

· A confidence interval is a range of values, computed from the sample, that is expected to contain the true value of a parameter with a stated level of confidence. Its half-width is the margin of error. The confidence interval is given by the formula:

C.I. = mean ± z_α/2 * standard error, where the standard error is the square root of the variance of the sample mean and α = 0.05, as we are considering a 95% confidence interval here.

Note: when the sample size n is less than 30, we use t_α/2 (with n-1 degrees of freedom) in place of z_α/2 for the purpose of computations.
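The formulas in this list can be put together in a small R helper. This is a sketch under stated assumptions rather than the code originally used in the article: the function name srswor_estimate, its arguments, and the sample values in the usage line are all hypothetical, and the unknown S² is replaced by the sample mean square s² as computed by var().

```r
# Estimate the population mean from an SRSWOR sample (illustrative helper).
# y    : observed values of the sampled units
# N    : population size
# conf : confidence level (0.95 by default)
srswor_estimate <- function(y, N, conf = 0.95) {
  n <- length(y)
  y_bar <- mean(y)                    # sample mean
  s2 <- var(y)                        # sample mean square (divisor n - 1)
  var_hat <- (N - n) / (N * n) * s2   # estimated variance of the sample mean
  se <- sqrt(var_hat)                 # standard error
  alpha <- 1 - conf
  # t quantile for small samples (n < 30), z quantile otherwise
  crit <- if (n < 30) qt(1 - alpha / 2, df = n - 1) else qnorm(1 - alpha / 2)
  list(mean = y_bar,
       variance = var_hat,
       se = se,
       ci = c(lower = y_bar - crit * se, upper = y_bar + crit * se))
}

# Hypothetical sample of n = 4 values from a population of N = 20 units
est <- srswor_estimate(c(12, 15, 11, 14), N = 20)
print(est)
```

Returning a list keeps the mean, the variance, the standard error, and the interval together, so each can be inspected separately.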

ADVANTAGES OF SRSWOR:

· Since the units that make up the sample are chosen at random from the population, each unit has an equal chance of being selected, so the sample tends to be representative of the population.

· Simplicity: Since the units are selected in a random order, the process is very simple.

DISADVANTAGES OF SRSWOR:

· This method can be time-consuming when a full list of a larger population is not available.

· The cost of collection of information regarding the entire population is high.

· Errors or bias may still arise in the process of selecting the sample.

APPLICATIONS OF SRSWOR:

The Simple Random Sampling without replacement technique finds applications in almost all domains of real life. A person cooking rice takes a few grains at random and checks whether they are cooked. This is an application of the method: the few grains that were chosen are assumed to reflect the characteristic of the entire population. Similarly, SRSWOR applies in almost all cases where a small part of the population is selected without replacement and is assumed to reflect the characteristics of the entire population. Some applications from specific fields of study are explained in detail below:

NOTE: To keep the demonstrations easy to follow, the population sizes used here are very small (less than 30), so the samples are small as well. Hence, for computing the confidence intervals, the t statistic is used in place of the z statistic. In real life, the population under consideration may be huge.

(1) In the biostatistics field:

Consider the situation where a pioneering vaccine producer claims that its vaccine is 96% effective on people of all age groups. To check this, the company is advised to select a sample of 3 people from a population of 5 people from different age groups. Here, we can use the simple random sampling without replacement technique to select the people for the clinical trial in a random fashion.

Let the five people be named A, B, C, D, and E, aged 16, 27, 39, 59, and 48 respectively. Let y_i (i = 1, 2, 3, 4, 5) denote the effectiveness of the vaccine on the five people during the clinical trial.

Here, we have a total of 5C3 ways to select the sample, that is, 10 ways. The probability of selection of each person at the first draw is equal to 1/5. After the selection of one person, there are 4 people left. Hence, the probability of selecting any particular remaining person in the second draw is equal to 1/4, and in the third draw it is equal to 1/3.
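The count of possible samples can be verified by listing them in R. The sketch below uses only the five labels given above; the variable names and the seed are arbitrary choices made for illustration.

```r
people <- c("A", "B", "C", "D", "E")
n <- 3

# Each column of the matrix returned by combn() is one possible sample
all_samples <- combn(people, n)
n_samples <- ncol(all_samples)
print(n_samples)
print(t(all_samples))   # one possible sample per row

# Number of possible samples that contain each person
times_included <- sapply(people, function(p) sum(all_samples == p))

# Every sample is equally likely under SRSWOR, so the inclusion probability
# of a person is (samples containing them) / (all possible samples) = n/N
inclusion_prob <- times_included / n_samples
print(inclusion_prob)

# Draw one sample of 3 people without replacement
set.seed(1)  # arbitrary seed, used only to make the draw reproducible
selected <- sample(people, size = n, replace = FALSE)
print(selected)
```

Each person appears in the same number of possible samples, which is why the inclusion probabilities are all equal.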

Further quantities such as the mean, the variance, and the confidence interval for the effectiveness of the vaccine can be computed from the respective formulas. The R code for the computations is given below:

Here, the vector effectiveness specifies the effectiveness of the vaccine in terms of percentages.

From the computations, the estimated mean for the selected sample is 95.33, which is slightly below the company's claim that the vaccine is 96% effective. This estimate will differ from sample to sample, so the confidence interval gives a more useful picture. Here, the 95% confidence interval is computed to be [82.27, 108.4]; that is, intervals constructed in this way are expected to cover the true mean effectiveness for about 95% of the possible samples. Since effectiveness cannot exceed 100%, the interval can be truncated to [82.27, 100]. The claimed value of 96% lies inside this interval, so this small sample does not by itself contradict the claim.

(2) In the agricultural field:

One of the major issues faced in the agricultural field is regarding the selection of the plot for the purpose of the growth of new crops. The farmer or the owner of the land has to make sure that they obtain the maximum yield of the crops and the selection of appropriate plots of land is a major contributing factor.

Consider the case where a farmer is interested in buying two plots of land and there are 4 plots of land available. The only condition in the farmer's mind is that the two plots should have an average area of at least 25 square feet. The four plots of land are named plot1, plot2, plot3, and plot4, having areas 24, 26, 24, and 27 square feet respectively. The farmer selects two plots at random from the four. The computations in R for the same have been given below:
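A minimal R sketch of these computations, assuming the variance formula from the notation section with the sample mean square in place of S² and a t quantile with n-1 degrees of freedom. The seed and the variable names are arbitrary, so the plots drawn and the resulting numbers depend on these choices and need not match the figures quoted in this article.

```r
# Areas (in square feet) of the four available plots, as stated above
areas <- c(plot1 = 24, plot2 = 26, plot3 = 24, plot4 = 27)
N <- length(areas)  # population size
n <- 2              # sample size

# Select two plots at random, without replacement
set.seed(123)  # arbitrary seed, used only to make the draw reproducible
chosen <- sample(areas, size = n, replace = FALSE)
print(chosen)

# Sample mean, standard error, and 95% confidence interval
y_bar <- mean(chosen)
se <- sqrt((N - n) / (N * n) * var(chosen))
t_crit <- qt(0.975, df = n - 1)
ci <- c(lower = y_bar - t_crit * se, upper = y_bar + t_crit * se)
print(y_bar)
print(ci)

# Does the chosen pair meet the farmer's condition?
meets_requirement <- y_bar >= 25
print(meets_requirement)

# Mean area of every possible pair of plots (4C2 = 6 pairs)
all_means <- combn(areas, n, FUN = mean)
print(all_means)
```

Listing the six possible pair means shows how the farmer's condition would fare for every sample that could have been drawn, not only for the one actually selected.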

The farmer's condition was that the two plots should have an average area of at least 25 square feet. Here, the mean area of the two plots chosen as the sample is 26.5 square feet, which meets the requirement. Also, the 95% confidence interval is found to be [24.45, 28.56], which is the range of plausible values for the mean plot area based on this sample.

(3) In quality control:

Consider the case where a factory produces chip packets. The company claims that all the chip packets have a weight equal to 40 grams. The weight of the packets is considered one of the main aspects that determine the quality of the chips. In order to test whether the claim made by the company is correct, the investigation team randomly selects 4 chip packets from a lot of 10 packets. The chip packets are said to maintain the pre-specified quality standards if the mean weight of the selected packets lies between 38 and 42 grams, that is, within a tolerance of 2 grams of the claimed 40 grams.

Let the 10 chip packets in the lot, taken as the population, be denoted by P_i, where i = 1 to 10, and let their respective weights (in grams) be 40, 43, 39, 37, 46, 33, 38, 39, 34, and 39. The R code for the computations, in this case, is as given below:
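The check described above might be sketched in R as follows. The seed and the variable names are arbitrary assumptions, so the packets drawn here need not be the ones behind the figures reported in this article. The second half goes one step beyond a single draw and looks at all 10C4 = 210 possible samples.

```r
# Weights (in grams) of the 10 packets in the lot, as stated above
weights <- c(40, 43, 39, 37, 46, 33, 38, 39, 34, 39)
N <- length(weights)  # population size
n <- 4                # sample size

# Select 4 packets at random, without replacement
set.seed(2021)  # arbitrary seed, used only to make the draw reproducible
sampled <- sample(weights, size = n, replace = FALSE)
sample_mean <- mean(sampled)
print(sampled)
print(sample_mean)

# Quality check: is the sample mean between 38 and 42 grams?
within_tolerance <- sample_mean >= 38 && sample_mean <= 42
print(within_tolerance)

# Means of all possible samples of size 4
all_means <- combn(weights, n, FUN = mean)
print(length(all_means))

# The average of all possible sample means equals the population mean,
# which illustrates that the sample mean is unbiased under SRSWOR
print(mean(all_means))
print(mean(weights))

# Share of the possible samples that would pass the 38 to 42 gram check
print(mean(all_means >= 38 & all_means <= 42))
```

Looking at every possible sample makes it clear how much the verdict of the check can depend on which 4 packets happen to be drawn.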

The company claims that all the chip packets are produced according to pre-specified standards and that all packets weigh 40 grams. Here, the selected sample of size 4 has a mean weight equal to 40 grams, which is exactly what the company standards specify. Therefore, we conclude that this sample is consistent with the standards that were set (the mean of all 10 packets in the lot is 38.8 grams, which also lies within the 38 to 42 gram band). Also, the 95% confidence interval was computed to be [36.7, 43.4], which is the range of plausible values for the mean packet weight based on this sample.

CONCLUSION:

The applications of SRSWOR in the fields of biostatistics, agriculture, and quality control have been examined both theoretically and through R code, and an interpretation has been given for each of the three cases separately. The applications are not limited to the ones discussed here: there are many more applications of Simple Random Sampling Without Replacement.
