Probability Proportional to Size with Replacement
Vishaly B
2048143
What is PPS sampling?
Probability proportional to size (PPS) sampling is a method
of sampling from a finite population in which a size measure is available for each
population unit before sampling, and in which the probability of selecting a unit is
proportional to its size.
Example and Explanation:
If Y is the variable under study and X is an auxiliary
variable related to Y, then under this scheme the units are selected with
probability proportional to the value of X, called the size. This is termed
probability proportional to a given measure of size (PPS) sampling.
For example, in an agricultural survey the yield depends
on the area under cultivation. Here the size is the value of the auxiliary
variable X, and the study variable Y is the yield. Larger areas are likely to
produce a larger yield and so contribute more to the population total,
so the area can be taken as the measure of size.
The cultivated area for a previous period can also be taken as the size
when estimating the yield of the crop.
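As a small illustration of "probability proportional to size", the sketch below turns a list of areas into selection probabilities p_i = X_i / X, where X is the total area. The five area values and the function name `selection_probabilities` are made up for illustration; they are not taken from any survey.

```python
# Hypothetical cultivated areas (size measure X) for five farms, in acres.
areas = [12.0, 30.0, 8.0, 50.0, 20.0]

def selection_probabilities(sizes):
    """Return p_i = X_i / X, where X is the total of all sizes."""
    total = sum(sizes)
    if total <= 0:
        raise ValueError("sizes must have a positive total")
    return [x / total for x in sizes]

probs = selection_probabilities(areas)
for area, p in zip(areas, probs):
    # Larger areas receive proportionally larger selection probabilities.
    print(f"area = {area:5.1f} acres  ->  p = {p:.3f}")
```

The probabilities always sum to 1, and the farm with the largest area is the one most likely to be drawn.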
Similarly, in an industrial survey, the number of
workers in a factory can be taken as the measure of size when studying the
industrial output of that factory. Since a large unit, that is, a
unit with a large value of the study variable Y, contributes more to the
population total than smaller units, it is natural to expect that a scheme of
selection which gives larger units more chance of inclusion in the sample than
smaller units would provide estimators more efficient than equal probability
sampling. Such a scheme is provided by PPS sampling, size being the value of an
auxiliary variable X directly related to Y.
When can we use PPS sampling?
In simple random sampling, the selection probabilities
are equal for all units of the population. When the units vary in size,
simple random sampling is not an appropriate procedure, because it gives no
importance to the size of the unit. Auxiliary information about the size of the
units can be used in selecting the sample so as to get more efficient
estimators of the population parameters. One such method is to assign unequal
probabilities of selection to different units in the population depending on
their sizes. The PPS scheme makes use of the information in the auxiliary variate,
so it is applicable only if data on the auxiliary variate are available for the
individual sampling units.
For example, with PPS sampling, size is the crucial
element. We use this technique when the units vary in size.
If we survey the entire population of one company, the population of each
department may not be relevant; such a survey would not fit the
structure of a PPS survey because the size of each section does not matter. If
instead we survey a company about a
subject that affects each department, such as the number of break rooms to
invest in for each section, the population of the department becomes a key factor.
When we must take the size of different sections of the population into account, PPS
is probably the appropriate way to sample.
PPS Sampling with Replacement:
In PPS sampling with replacement, the probability of selecting a specified unit
does not change from draw to draw: it is the same at every stage, and
there is no redistribution of the probabilities after a draw.
PPS without replacement (PPSWOR) is more complex than PPS with replacement (PPSWR). Researchers generally want an efficient estimator, and PPSWOR provides a more efficient estimator than PPSWR. A lot of work has been done on sampling with varying probabilities without replacement, but most of the procedures are complex and not easily applicable in large-scale surveys. If the sampling fraction n/N (n is the sample size and N the population size) is small, as in large-scale surveys, the efficiencies of sampling with and without replacement differ insignificantly. If the sampling fraction is moderately large or larger, the gain in efficiency from PPSWOR will be substantial.
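The "same probability at every draw" property can be sketched with Python's standard library: `random.choices` draws with replacement using fixed weights, so a unit can appear more than once. The size values, the seed, and the helper name `draw_ppswr` are illustrative assumptions only.

```python
import random

def draw_ppswr(sizes, n, rng):
    """Draw n unit indices (0-based) with replacement, probability proportional to size."""
    # random.choices samples with replacement; the weights are not changed
    # between draws and do not need to sum to 1.
    return rng.choices(range(len(sizes)), weights=sizes, k=n)

sizes = [12, 30, 8, 50, 20]   # hypothetical size measures
rng = random.Random(2024)     # fixed seed, chosen arbitrarily for repeatability
sample = draw_ppswr(sizes, n=4, rng=rng)
print(sample)                 # the same index may appear more than once
```

Over many draws, the relative frequency of each unit approaches its size divided by the total size.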
The procedure of selecting a sample consists in
associating with each unit a number or set of numbers equal to its size.
There are two methods to draw a sample with PPSWR:
1. Cumulative total method
2. Lahiri’s method
Cumulative Total Method:
Selecting a simple random sample of size
n consists of associating the natural numbers from 1 to N with the units in the population
and then selecting those n units whose serial numbers correspond to a set of n
numbers, each <= N, drawn from a random number table.
In selecting a sample with varying probabilities, the procedure is to
associate with each unit a set of consecutive natural numbers, the size of the
set being proportional to the desired probability. If X_1, X_2, ..., X_N are
positive integers proportional to the probabilities assigned to the N units in
the population, then a possible way is to associate with the units the cumulative
totals T_i = X_1 + X_2 + ... + X_i. The units are then selected based on the values
of the cumulative totals: a random number R between 1 and T_N is drawn, and the unit i
for which T_(i-1) < R <= T_i (with T_0 = 0) is selected.
The drawback is that this procedure involves writing down the successive cumulative totals,
which is time consuming and tedious if the number of units in the population is
large.
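The cumulative total method can be sketched as follows, assuming the sizes are positive integers. A random number between 1 and the grand total is drawn, and the unit whose cumulative-total interval contains it is selected. The function names and the example sizes are hypothetical.

```python
import bisect
import itertools
import random

def unit_for(cum_totals, r):
    """Return the 0-based index i with T_(i-1) < r <= T_i."""
    # bisect_left finds the first cumulative total that is >= r.
    return bisect.bisect_left(cum_totals, r)

def cumulative_total_sample(sizes, n, rng):
    """Select n units with replacement by the cumulative total method."""
    cum_totals = list(itertools.accumulate(sizes))  # T_1, T_2, ..., T_N
    grand_total = cum_totals[-1]
    chosen = []
    for _ in range(n):
        r = rng.randint(1, grand_total)             # random number in 1..T_N
        chosen.append(unit_for(cum_totals, r))
    return chosen

sizes = [12, 30, 8, 50, 20]   # hypothetical integer sizes
print(list(itertools.accumulate(sizes)))
print(cumulative_total_sample(sizes, n=4, rng=random.Random(11)))
```

Using a binary search over the cumulative totals keeps each draw cheap, but the full list of totals still has to be built once, which is the drawback noted above.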
Lahiri’s Method:
Let M = max X_i (i = 1, ..., N), or some convenient
number greater than this maximum. The steps are: select a pair of random numbers (i, j) such
that 1 <= i <= N and 1 <= j <= M. If j <= X_i, the ith unit is
selected; otherwise the pair is rejected and another pair of random numbers is chosen. To get
a sample of size n, this procedure is repeated until n units are selected.
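Lahiri's method can be sketched as a simple accept/reject loop, assuming positive integer sizes. The function name `lahiri_sample`, the optional `bound` argument, and the example sizes are illustrative assumptions; the count of trials is returned only to show how many pairs were needed.

```python
import random

def lahiri_sample(sizes, n, rng, bound=None):
    """Select n units (1-based serial numbers) with replacement by Lahiri's method."""
    m = max(sizes) if bound is None else bound   # M: the maximum size or any larger number
    if m < max(sizes):
        raise ValueError("bound must be at least the maximum size")
    chosen, trials = [], 0
    while len(chosen) < n:
        trials += 1
        i = rng.randint(1, len(sizes))           # 1 <= i <= N
        j = rng.randint(1, m)                    # 1 <= j <= M
        if j <= sizes[i - 1]:                    # accept unit i, otherwise reject the pair
            chosen.append(i)
    return chosen, trials

sizes = [12, 30, 8, 50, 20]                      # hypothetical integer sizes
units, trials = lahiri_sample(sizes, n=3, rng=random.Random(5))
print(units, "selected after", trials, "trials")
```

No cumulative totals are computed, and only the sizes of the units actually picked as `i` are ever looked up.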
Advantages:
1. It does not require writing down the cumulative totals for each unit.
2. The sizes of all the units need not be known beforehand. We need only some number greater
than the maximum size, and the sizes of those units that are picked by the
first random number of each pair (the number between 1 and N).
Disadvantage:
It wastes time and effort when units are rejected. A draw is ineffective
whenever the pair (i, j) is rejected, that is, whenever j > X_i.
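A short derivation shows why the rejection step still gives probabilities proportional to size, and how much effort is wasted. It assumes that i is uniform on 1, ..., N, that j is uniform on 1, ..., M, and that the X_i are positive integers not exceeding M. The symbol \bar{X} (the mean size, so that X = N\bar{X}) is introduced here only for illustration.

```latex
\begin{align*}
P(\text{unit } i \text{ is selected in one trial})
  &= \frac{1}{N}\cdot\frac{X_i}{M}, \\
P(\text{trial is effective})
  &= \sum_{i=1}^{N}\frac{X_i}{NM} = \frac{\bar{X}}{M}, \\
P(\text{unit } i \mid \text{trial is effective})
  &= \frac{X_i/(NM)}{\bar{X}/M} = \frac{X_i}{N\bar{X}} = \frac{X_i}{X}.
\end{align*}
```

Under these assumptions, the expected number of trials per selected unit is M/\bar{X}, so the method wastes the most effort when a few units are much larger than the average.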
Estimation in PPS sampling with Replacement:
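Formulas
y_i: value of the characteristic under study for the unit u_i of the population (i = 1, 2, ..., N)
p_i = X_i / X: probability that the unit u_i is selected at any draw, where X = X_1 + X_2 + ... + X_N
n: sample size
In PPSWR, an unbiased estimator of the population total Y is given by
$$\hat{Y} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}$$
In PPSWR, an unbiased estimator of the population mean $\bar{Y}$ is
$$\hat{\bar{Y}} = \frac{\hat{Y}}{N} = \frac{1}{nN}\sum_{i=1}^{n}\frac{y_i}{p_i}$$
An unbiased estimator of the variance $V(\hat{Y})$ is given by
$$v(\hat{Y}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i} - \hat{Y}\right)^2$$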
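The standard PPSWR estimators can be computed in a few lines. This is a minimal sketch: the estimated total is the mean of y_i / p_i over the n draws, the estimated mean is that total divided by N, and the variance estimate is the sum of squared deviations of y_i / p_i from the estimated total, divided by n(n - 1). The function name `ppswr_estimates` and the sample values are hypothetical, not the guava data.

```python
import math

def ppswr_estimates(y, p, N):
    """Return (estimated total, estimated mean, estimated variance of the total).

    y: study-variable values for the n sampled units (repeats allowed)
    p: their selection probabilities p_i = X_i / X
    N: number of units in the population
    """
    n = len(y)
    if n < 2:
        raise ValueError("at least two draws are needed to estimate the variance")
    z = [yi / pi for yi, pi in zip(y, p)]           # y_i / p_i for each draw
    total_hat = sum(z) / n                          # estimated population total
    mean_hat = total_hat / N                        # estimated population mean
    var_hat = sum((zi - total_hat) ** 2 for zi in z) / (n * (n - 1))
    return total_hat, mean_hat, var_hat

# Hypothetical sample of three draws from a population of N = 5 units.
y = [100.0, 240.0, 60.0]
p = [0.10, 0.25, 0.05]
total_hat, mean_hat, var_hat = ppswr_estimates(y, p, N=5)
print(total_hat, mean_hat, var_hat, math.sqrt(var_hat))  # last value is the standard error
```

When y is exactly proportional to the size measure, every y_i / p_i equals the population total and the estimated variance is zero, which is why a size measure closely related to Y makes PPS efficient.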
Applications
1. Auditing:
PPS sampling is applied in auditing to test account balances.
PPS tests the reasonableness of a
recorded account balance or class of transactions, and is used to check the
accuracy of financial accounts, that is, to test for overstatements. The
process of using PPS in testing a company’s account balances involves the
following steps.
Determine the objective of the test: PPS tests the
reasonableness of a recorded account balance. In this situation we
use PPS to test account balances and transactions for overstatement. The
population is the account balance of the company being tested. The
auditor must be assured that the physical representation of the
population being tested includes the entire population. PPS automatically
includes in the sample any unit that is individually significant, which is why
PPS is selected as the audit sampling technique here.
2. Industries:
To estimate the production of different industrial companies,
we can use the PPS sampling
technique because the size of each company differs. With size as the auxiliary variable,
we can estimate the variable of interest, that is, total production.
3. Agriculture:
R code with Example
About the dataset:
A pilot scheme for the study of
cultivation practices and yield of guava was carried out in Allahabad district
of Uttar Pradesh during 1980-81. The number of guava trees and the area reported
under guava trees in each of 30 villages growing guavas in one of the tehsils
of Allahabad district are given.
Import dataset
Plot of the study variable (number of guava trees) against the auxiliary variable (area in acres)
From the plot we infer that the area under guavas and the number of trees are positively correlated.
To draw a sample by the PPSWR technique
Here the estimated variance of the estimated population
total of guava trees under PPSWR is 270935049, with a standard error of
16460.1, which measures the sampling variability of the estimated population total
number of guava trees.
Conclusion
Using the ppswr() function from the sampling book package,
we selected a sample of size 10
from the population and estimated the average number of guava trees as 1815.751, with variance
26917.7. The estimated population total, that is, the estimated total number of guava
trees, is 54472.52. The estimated variance of this total
under PPSWR is 270935049, with a standard error of 16460.1, which measures the
sampling variability of the estimated total number of guava trees.