Probability Proportional to Size with Replacement

Vishaly B

2048143

What is PPS sampling?

Probability proportional to size (PPS) sampling is a method of sampling from a finite population in which a size measure is available for each population unit before sampling, and the probability of selecting a unit is proportional to its size.

Example and Explanation:

If Y is the variable under study and X is an auxiliary variable related to Y, then under this scheme the units are selected with probability proportional to the value of X, called the size. This is termed probability proportional to a given measure of size (PPS) sampling.

For example: In an agricultural survey, the yield depends on the area under cultivation. Here the size is the value of the auxiliary variable X (the area), and the study variable Y is the yield. Bigger areas are likely to have a larger yield and so contribute more towards the population total, which is why the area can be taken as the measure of size. The cultivated area for a previous period can also be taken as the size when estimating the yield of the crop.

Similarly, in an industrial survey, the number of workers in a factory can be considered as the measure of size when studying the industrial output of that factory. Since a large unit, that is, a unit with a large value of the study variable Y, contributes more to the population total than smaller units, it is natural to expect that a selection scheme which gives larger units more chance of inclusion in the sample than smaller units would provide estimators more efficient than equal probability sampling. Such a scheme is provided by PPS sampling, the size being the value of an auxiliary variable X directly related to Y.
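The efficiency argument can be checked numerically on a small made-up population. The sketch below assumes the standard with-replacement variance formulas for the estimated total: N^2 * sigma^2 / n for simple random sampling and (1/n) * sum(p_i * (y_i / p_i - Y)^2) for PPS. The vectors x and y are hypothetical values, chosen so that y is roughly proportional to x. They are not taken from any survey.

```r
# Hypothetical population: x = size measure, y = study variable (made-up values)
x <- c(12, 5, 30, 8, 20, 15)
y <- c(610, 240, 1480, 420, 990, 760)
N <- length(y)
n <- 3                      # illustrative sample size
Y <- sum(y)                 # true population total

# Sampling variance of the estimated total under SRS with replacement
sigma2  <- mean((y - mean(y))^2)
var_srs <- N^2 * sigma2 / n

# Sampling variance of the estimated total under PPS with replacement
p       <- x / sum(x)       # selection probabilities p_i = X_i / X
var_pps <- sum(p * (y / p - Y)^2) / n

c(var_srs = var_srs, var_pps = var_pps)
```

On this toy population the PPS variance is much smaller because the ratios y/x are nearly constant. If y were unrelated to x, PPS could do worse than simple random sampling.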

When can we use PPS sampling?

In simple random sampling the selection probabilities are equal for all units of the population. When the units vary considerably in size, simple random sampling is not an appropriate procedure, as it gives no importance to the size of the unit. Auxiliary information about the size of the units can be used in selecting the sample so as to get more efficient estimators of the population parameters. One such method is to assign unequal probabilities of selection to the units in the population depending on their sizes. The PPS scheme makes use of the information in the auxiliary variate, and it is applicable only if data on the auxiliary variate are available for every individual sampling unit.

For example: With PPS sampling, size is the crucial element. We use this technique when the sizes of the units vary. Suppose a company is surveyed department by department. If the subject of the survey does not depend on how large each department is, the size of each department does not matter and a PPS design brings no benefit. If instead the survey concerns a subject that depends on department size, such as the number of break rooms to invest in for each department, the population of the department becomes a key factor. When the sizes of different sections of the population must be taken into account, PPS is probably the right way to draw the sample.

 

 PPS Sampling with Replacement:

In PPS sampling with replacement (PPSWR), the probability of selecting a specified unit is the same at every draw, because a selected unit is returned to the population. There is no redistribution of the probabilities after a draw. PPS without replacement (PPSWOR) is more complex than PPSWR. Researchers want efficient estimators, and PPSWOR generally provides a more efficient estimator than PPSWR. A lot of work has been done on sampling with varying probabilities without replacement, but most of the procedures are complex and not easily applicable in large-scale surveys. If the sampling fraction n/N (n being the sample size and N the population size) is small, as in large-scale surveys, the efficiencies of sampling with and without replacement differ only slightly. If the sampling fraction is moderately large or large, the gain in efficiency from PPSWOR will be substantial.

The procedure of selecting a sample consists in associating with each unit a number or set of numbers equal to its size.

There are two methods to draw a sample with PPSWR:

1.     Cumulative total method

2.     Lahiri’s method

Cumulative Total Method:

Selecting a simple random sample of size n consists of associating the natural numbers 1 to N with the N units in the population and then selecting the n units whose serial numbers correspond to a set of n numbers, each <= N, drawn from a random number table. In the selection of a sample with varying probabilities, the procedure is to associate with each unit a set of consecutive natural numbers, the size of the set being proportional to the desired probability. If X1, X2, ..., XN are positive integers proportional to the probabilities assigned to the N units in the population, one way to do this is through the cumulative totals T1 = X1, T2 = X1 + X2, ..., TN = X1 + X2 + ... + XN, so that unit i is associated with the numbers T(i-1) + 1 to Ti. A random number R between 1 and TN is drawn, and the unit i for which T(i-1) < R <= Ti is selected. This is repeated n times to get a sample of size n. The drawback is that this procedure involves writing down the successive cumulative totals, which is time consuming and tedious if the number of units in the population is large.
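The cumulative total method can be sketched in base R as follows. This is a minimal illustration, assuming the sizes are positive integers. The function names select_unit and pps_cumulative and the example sizes are hypothetical, and sample.int() stands in for the random number table.

```r
# Map one random number r (1 <= r <= T_N) to the unit whose cumulative
# range contains it: the first unit i with T_i >= r.
select_unit <- function(r, cum_totals) {
  which(cum_totals >= r)[1]
}

# Draw a PPSWR sample of size n by the cumulative total method.
# `sizes` are hypothetical positive integer size measures X_1, ..., X_N.
pps_cumulative <- function(sizes, n) {
  stopifnot(all(sizes > 0), n >= 1)
  cum_totals <- cumsum(sizes)                  # T_1, T_2, ..., T_N
  total <- cum_totals[length(cum_totals)]      # T_N = X_1 + ... + X_N
  # n independent random integers between 1 and T_N (with replacement)
  r <- sample.int(total, size = n, replace = TRUE)
  vapply(r, select_unit, integer(1), cum_totals = cum_totals)
}

sizes <- c(5, 10, 3, 12)      # made-up sizes; cumulative totals 5, 15, 18, 30
set.seed(1)                   # arbitrary seed, only for reproducibility
pps_cumulative(sizes, n = 4)  # indices of the selected units
```

With these sizes, a random number of 16 falls in the range 16 to 18 and therefore selects unit 3, while any number from 19 to 30 selects unit 4.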

Lahiri’s Method:

Let M = max Xi (i = 1 to N), or some convenient number greater than this maximum. Select a pair of random numbers (i, j) such that 1 <= i <= N and 1 <= j <= M. If j <= Xi, the ith unit is selected. Otherwise the pair is rejected and another pair of random numbers is chosen. To get a sample of size n, this procedure is repeated until n units are selected.
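Lahiri's accept/reject rule can be sketched in base R as below. This is only an illustration under the assumption that the sizes are positive integers. The function name lahiri_ppswr and the example sizes are hypothetical.

```r
# Lahiri's method for a PPSWR sample of size n.
# `sizes` are hypothetical positive integer sizes X_1, ..., X_N;
# M must be at least max(sizes).
lahiri_ppswr <- function(sizes, n, M = max(sizes)) {
  stopifnot(all(sizes > 0), M >= max(sizes), n >= 1)
  N <- length(sizes)
  selected <- integer(0)
  while (length(selected) < n) {
    i <- sample.int(N, 1)    # random unit label, 1 <= i <= N
    j <- sample.int(M, 1)    # random number, 1 <= j <= M
    if (j <= sizes[i]) {
      selected <- c(selected, i)   # effective draw: unit i is accepted
    }
    # otherwise the pair is rejected and a new pair is drawn
  }
  selected
}

sizes <- c(5, 10, 3, 12)     # made-up sizes
set.seed(1)                  # arbitrary seed, only for reproducibility
lahiri_ppswr(sizes, n = 4)   # indices of the selected units
```

Each pair is accepted with probability equal to the average size divided by M, so choosing M much larger than the maximum size increases the number of rejected pairs.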

Advantages:

1.     It does not require writing down the cumulative totals for all the units.

2.     The sizes of all the units need not be known beforehand. We need only some number greater than the maximum size, and the sizes of those units that are picked by the first random number (the one between 1 and N) of each pair.

Disadvantage:

It wastes time and effort when units are rejected. A draw is ineffective whenever the pair of random numbers is rejected, that is, whenever j > Xi.

 

Estimation in PPS sampling with Replacement:

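Formulas

yi - value of the characteristic under study for the unit ui of the population (i = 1, 2, ..., N)

pi = Xi / X, where X = X1 + X2 + ... + XN, is the probability that the unit ui is selected at any given draw

n is the sample size

In the formulas below the sums run over the n draws of the sample, with $y_i$ and $p_i$ the values for the unit selected at the ith draw.

In PPSWR an unbiased estimator of the population total Y is given by

$$\hat{Y}_{pps} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}$$

An unbiased estimator of the population mean $\bar{Y} = Y/N$ is given by

$$\hat{\bar{Y}}_{pps} = \frac{1}{nN}\sum_{i=1}^{n}\frac{y_i}{p_i} = \frac{\hat{Y}_{pps}}{N}$$

In PPSWR an unbiased estimator of $V(\hat{Y}_{pps})$ is

$$v(\hat{Y}_{pps}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i} - \hat{Y}_{pps}\right)^2$$

An unbiased estimator of $V(\hat{\bar{Y}}_{pps})$ is given by

$$v(\hat{\bar{Y}}_{pps}) = \frac{1}{N^2}\, v(\hat{Y}_{pps})$$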
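The PPSWR estimator of the total (often called the Hansen-Hurwitz estimator) is the average of the values y_i / p_i over the n draws, and its variance estimator is the sample variance of those values divided by n. A minimal base R sketch follows. The function name hh_estimate and the numbers in the example are hypothetical.

```r
# PPSWR estimates from a sample.
# y: study variable for the n draws, p: their selection probabilities,
# N: population size.
hh_estimate <- function(y, p, N) {
  n <- length(y)
  stopifnot(n >= 2, length(p) == n, all(p > 0))
  z <- y / p                        # expanded values y_i / p_i
  total_hat <- mean(z)              # (1/n) * sum(y_i / p_i)
  var_total <- sum((z - total_hat)^2) / (n * (n - 1))
  list(total     = total_hat,
       mean      = total_hat / N,
       var_total = var_total,
       var_mean  = var_total / N^2,
       se_total  = sqrt(var_total))
}

# Made-up example: population sizes 2, 4, 6, 8 (X = 20), draws hit units 2, 4, 4
x     <- c(2, 4, 6, 8)
draws <- c(2, 4, 4)
y_s   <- c(10, 22, 22)              # hypothetical y values of the drawn units
p_s   <- x[draws] / sum(x)          # 0.2, 0.4, 0.4
hh_estimate(y_s, p_s, N = length(x))
```

A unit drawn twice enters the sums twice, which is what sampling with replacement requires. If y were exactly proportional to x, every y_i / p_i would equal the population total and the estimated variance would be zero.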



 

Applications

1.     Auditing: PPS sampling is applied in auditing to test account balances. PPS tests the reasonableness of a recorded account balance or class of transactions, and is used to check the accuracy of financial accounts, in particular to test for overstatements. The process of using PPS to test a company's account balances involves the following steps.

Determine the objective of the test: PPS tests the reasonableness of a recorded account balance. Here PPS is used to test account balances and transactions for overstatements. The population is the account balance of the company being tested. The auditor must make sure that the physical representation of the population being tested includes the entire population. PPS automatically includes in the sample any unit that is individually significant, which is why PPS is chosen as the audit sampling technique.

2.     Industries:

To estimate the production of different industrial companies, we can use the PPS sampling technique because the size of each company differs. With a size measure such as the number of workers as the auxiliary variable, we can estimate the variable of interest, the total production.

3.      Agriculture:

In an agricultural survey, the yield depends on the area under cultivation. Here the size is the value of the auxiliary variable X and the study variable Y is the yield. For example, if we have 100 farms with known area under the crop and want to estimate the average yield per farm and the population total, we can use this technique to get more precise results.

 

R code with Example

About the dataset:

A pilot scheme for the study of cultivation practices and yield of guava was carried out in Allahabad district of Uttar Pradesh during 1980-81. The number of guava trees and area reported under guava trees, in each of 30 villages growing guavas in one of the tehsils of Allahabad district is given.

Import dataset

Plot of the study variable (number of guava trees) against the auxiliary variable (area in acres)


From the plot we infer that the area under guavas and the number of trees are positively correlated.

To draw a sample by the PPSWR technique
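As a rough sketch of this workflow, the base R code below draws a PPSWR sample and estimates the total. It uses sample.int() with the prob argument as a stand-in for the sampling step. The data frame villages, its values and the sample size are hypothetical toy data, not the Allahabad guava data, so its output does not reproduce the figures reported in this post.

```r
# Hypothetical toy population (NOT the guava data from this post)
villages <- data.frame(
  area  = c(12, 5, 30, 8, 20, 15),           # auxiliary size measure X (made up)
  trees = c(610, 240, 1480, 420, 990, 760)   # study variable Y (made up)
)
N <- nrow(villages)
n <- 3                                       # illustrative sample size
p <- villages$area / sum(villages$area)      # selection probabilities p_i = X_i / X

set.seed(2048)                               # arbitrary seed for reproducibility
idx <- sample.int(N, size = n, replace = TRUE, prob = p)   # PPSWR draw

z         <- villages$trees[idx] / p[idx]    # expanded values y_i / p_i
total_hat <- mean(z)                         # estimated population total
var_total <- sum((z - total_hat)^2) / (n * (n - 1))   # estimated variance

c(total = total_hat, mean = total_hat / N, se_total = sqrt(var_total))
```

The same steps apply to a real data set once the size and study columns are read in. Only the column names and the sample size change.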






Here the estimated variance of the estimated population total of guava trees under PPSWR is 270935049, with a standard error of 16460.1. The standard error measures how much the estimated total number of guava trees would vary from sample to sample.

Conclusion

By using the ppswr() function under the sampling book package, we selected a sample of size 10 from the population and estimated the average number of guava trees as 1815.751, with variance 26917.7. The estimated population total, that is, the estimated total number of guava trees, is 54472.52, with an estimated variance of 270935049 and a standard error of 16460.1.

 

 

 

 

 

 

 

 

 

 

 

 

 

 
