PROBABILITY PROPORTIONAL TO SIZE (PPS)
Done by: Sindhu .D
2048135
The simple random sampling (SRS) scheme provides a random sample in which every unit in the population has an equal probability of selection. Under certain circumstances, more efficient estimators are obtained by assigning unequal probabilities of selection to the units in the population. This type of sampling is known as a varying probability sampling scheme.
If Y is the variable under study and X is an auxiliary variable related to Y, then in the most commonly
used varying probability scheme, the units are selected with probability proportional to the value of X,
called the size. This is termed probability proportional to a given measure of size (pps) sampling.
If the sampling units vary considerably in size, then SRS does not take into account the possible
importance of the larger units in the population. A large unit, i.e., a unit with a large value of Y,
contributes more to the population total than units with smaller values, so it is natural to expect that a
selection scheme which assigns a higher probability of inclusion in the sample to the larger units than to the smaller
units would provide more efficient estimators than a scheme which gives equal probability to all the units.
This is accomplished through pps sampling.
Note that the “size” considered is the value of the auxiliary variable X and not the value of the study variable Y.
For example, in an agricultural survey, the yield depends on the area under cultivation.
Bigger areas are likely to produce larger yields and so contribute more towards
the population total; the area can therefore be taken as the measure of size.
The cultivated area for a previous period can also be taken as the size while estimating
the yield of the crop. Similarly, in an industrial survey, the number of workers in a factory can be
considered as the measure of size when studying the industrial output of that factory.
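To make the efficiency argument concrete, here is a minimal sketch that compares the exact variance of the estimated population total under equal-probability and pps selection, both with replacement. The population values, the number of draws and the function name `var_wr_total` are all hypothetical and chosen only for illustration. The estimator assumed is the usual one for sampling with replacement, the average of y_i / p_i over the n draws.

```python
from fractions import Fraction

def var_wr_total(y, p, n):
    """Exact variance of (1/n) * sum(y_i / p_i) over n with-replacement draws,
    where unit i is drawn with probability p[i] at each draw."""
    y_total = sum(y)
    return sum(pi * (Fraction(yi) / pi - y_total) ** 2 for yi, pi in zip(y, p)) / n

# Hypothetical population: x is the size measure, y the study variable.
x = [2, 5, 9, 14, 20]
y = [11, 24, 47, 68, 102]
n = 3  # number of draws (assumed)

N = len(y)
equal_p = [Fraction(1, N)] * N                # SRS with replacement
pps_p = [Fraction(xi, sum(x)) for xi in x]    # probability proportional to size

var_srs = var_wr_total(y, equal_p, n)
var_pps = var_wr_total(y, pps_p, n)
print("variance under equal probabilities:", float(var_srs))
print("variance under pps:                ", float(var_pps))
```

In this made-up population Y is roughly proportional to X, which is the situation where pps selection is expected to help most. If Y were unrelated to X, the comparison could go the other way.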
➡️ There are two ways of selecting the sample, i.e., PPS with replacement (ppswr) and PPS without replacement (ppswor).
PPS sampling with replacement (WR):
First, we discuss the two methods to draw a sample with PPS and WR.
1. Cumulative total method:
The procedure of selecting a simple random sample of size n consists of
- associating the natural numbers from 1 to N with the N units in the population and
- then selecting those n units whose serial numbers correspond to a set of n numbers, each
less than or equal to N, drawn from a random number table.
In the selection of a sample with varying probabilities, the procedure is to associate with
each unit a set of consecutive natural numbers, the size of the set being proportional to the
desired probability.
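If $X_1, X_2, \ldots, X_N$ are the positive integers proportional to the probabilities assigned to the N
units in the population, then one possible way is to associate the cumulative totals with the units: the ith unit is associated with $T_i = X_1 + X_2 + \cdots + X_i$, with $T_0 = 0$. The units are then selected based on the values of the cumulative totals: a random number R between 1 and $T_N$ is drawn, and the ith unit is selected if $T_{i-1} < R \le T_i$.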
In this case, the probability of selection of the ith unit is
$$P_i = \frac{T_i - T_{i-1}}{T_N} = \frac{X_i}{T_N} \;\Rightarrow\; P_i \propto X_i.$$
Note that $T_N$ is the population total, which remains constant.
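The cumulative total method can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `cumulative_totals`, `select_unit` and `pps_wr_cumulative` are made up here, the sizes are assumed to be positive integers, and units are returned as 0-based indices.

```python
import bisect
import itertools
import random

def cumulative_totals(sizes):
    """Return the cumulative totals T_1, ..., T_N of the integer sizes."""
    return list(itertools.accumulate(sizes))

def select_unit(totals, r):
    """Return the 0-based index i of the unit whose range T_{i-1} < r <= T_i contains r."""
    return bisect.bisect_left(totals, r)

def pps_wr_cumulative(sizes, n, rng=random):
    """Draw n units with replacement, each with probability sizes[i] / sum(sizes)."""
    totals = cumulative_totals(sizes)
    sample = []
    for _ in range(n):
        r = rng.randint(1, totals[-1])  # random number R with 1 <= R <= T_N
        sample.append(select_unit(totals, r))
    return sample

# Hypothetical sizes for three units.
print(pps_wr_cumulative([3, 1, 6], n=5))
```

Each unit owns exactly X_i of the T_N equally likely random numbers, which is why its selection probability is X_i / T_N.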
Drawback: This procedure involves writing down the successive cumulative totals.
This is time consuming and tedious if the number of units in the population is large.
This problem is overcome in Lahiri’s method.
2. Lahiri’s method:
Let $M = \max_{i = 1, 2, \ldots, N} X_i$, i.e., the maximum of the sizes of the N units in the population, or some convenient number
greater than this maximum.
The sampling procedure has the following steps:
1. Select a pair of random numbers (i, j) such that $1 \le i \le N$ and $1 \le j \le M$.
2. If $j \le X_i$, then the ith unit is selected; otherwise it is rejected and another pair
of random numbers is chosen.
3. To get a sample of size n, this procedure is repeated till n units are selected.
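The three steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `pps_wr_lahiri` and its parameters are hypothetical, the sizes are assumed to be positive integers, and units are labelled 1 to N as in the text.

```python
import random

def pps_wr_lahiri(sizes, n, M=None, rng=random):
    """Draw n unit labels (1..N) with replacement by Lahiri's method."""
    N = len(sizes)
    if M is None:
        M = max(sizes)  # any convenient number >= the maximum size also works
    sample = []
    while len(sample) < n:
        i = rng.randint(1, N)       # step 1: random unit label, 1 <= i <= N
        j = rng.randint(1, M)       # step 1: random number, 1 <= j <= M
        if j <= sizes[i - 1]:       # step 2: accept unit i only if j <= X_i
            sample.append(i)
        # otherwise the pair is rejected and a new pair is drawn (step 3)
    return sample

# Hypothetical sizes for three units.
print(pps_wr_lahiri([3, 1, 6], n=5))
```

Note that only `max(sizes)` (or a bound M supplied by the caller) and the sizes of the units actually drawn are looked up. No cumulative totals are needed.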
Now we see how this method ensures that the probabilities of selection of units are
varying and are proportional to size.
Sampling Theory | Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur
The probability of selection of the ith unit at a trial depends on two possible outcomes:
- either it is selected at the first draw,
- or it is selected at a subsequent draw preceded by ineffective draws.
The probability that the ith unit is selected at a given draw is
$$P(\text{unit } i \text{ is drawn}) \, P(j \le X_i \mid i) = \frac{1}{N} \cdot \frac{X_i}{M} = P_i^*, \text{ say.}$$
The probability that no unit is selected at a trial is
$$\frac{1}{N} \sum_{i=1}^{N} \left(1 - \frac{X_i}{M}\right) = \frac{1}{N} \left(N - \frac{N \bar{X}}{M}\right) = 1 - \frac{\bar{X}}{M} = Q, \text{ say.}$$
The probability that unit i is selected is
$$P_i^* + Q P_i^* + Q^2 P_i^* + \cdots = \frac{P_i^*}{1 - Q} = \frac{X_i / (NM)}{\bar{X} / M} = \frac{X_i}{N \bar{X}} = \frac{X_i}{X_{\text{total}}} \propto X_i.$$
Thus the probability of selection of unit i is proportional to the size $X_i$. So this method generates a pps sample.
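The derivation can be checked numerically with exact fractions. The sketch below is illustrative only: the sizes are hypothetical and `lahiri_selection_probs` is a made-up helper name. It computes the per-draw probability X_i / (NM), the rejection probability Q, and the resulting selection probability, which should equal X_i divided by the total size whatever bound M is used.

```python
from fractions import Fraction

def lahiri_selection_probs(sizes, M=None):
    """Return (selection probabilities, Q) for Lahiri's method with bound M."""
    N = len(sizes)
    M = max(sizes) if M is None else M
    p_star = [Fraction(x, N * M) for x in sizes]  # P_i^* = X_i / (N M)
    Q = 1 - sum(p_star)                           # probability of an ineffective draw
    probs = [p / (1 - Q) for p in p_star]         # geometric series: P_i^* / (1 - Q)
    return probs, Q

sizes = [3, 1, 6]  # hypothetical sizes
probs, Q = lahiri_selection_probs(sizes)
print(probs, Q)
```

Passing a larger bound, for example `M=10`, changes Q but leaves the selection probabilities unchanged.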
Advantages:
1. It does not require writing down all cumulative totals for each unit.
2. Sizes of all the units need not be known beforehand. We need only some number greater than or equal to the maximum size, and the sizes of those units whose serial numbers (between 1 and N) are picked by the first random number of each pair.
Disadvantage: It results in wastage of time and effort if units get rejected. A draw is ineffective when the pair (i, j) has j greater than the size of the ith unit, so that no unit is selected at that draw.
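How much effort is wasted can be quantified with a short worked equation. Each draw is effective with probability $1 - Q = \bar{X}/M$, independently of the other draws, so the number of draws needed to select one unit follows a geometric distribution:

```latex
E(\text{number of draws needed to select one unit})
  = \sum_{k=1}^{\infty} k \, Q^{k-1} (1 - Q)
  = \frac{1}{1 - Q}
  = \frac{M}{\bar{X}}
```

So the method is wasteful when M is much larger than the average size, for example when one unit is far bigger than the rest.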
❋R CODE
A sample survey was conducted to study the yield of wheat in Haryana. A sample of 20 farms from a total of 100 was taken with probability proportional to the area under the wheat crop, using the with-replacement method. The total area under the wheat crop was 484.5 hectares. The area under the crop (X) and the yield (Y) were noted in hectares and quintals per hectare respectively.
i) Estimate the average yield per farm along with its standard error for the given sample.
Hence the average yield per farm is 29.21369.
CONCLUSION: Hence, for the given data, the average yield per farm is 29.21369 with a standard error of 1.392459.
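For readers who want to see how an estimate of this kind is computed, here is a minimal sketch in Python of the standard estimator for pps sampling with replacement (the Hansen-Hurwitz estimator). It is an assumption that this is the estimator behind the figures above. The function name `pps_wr_mean_per_unit` is hypothetical, the data in the usage lines are made up and are not the survey data, and y is assumed to hold the value of the study variable for each sampled farm.

```python
import math

def pps_wr_mean_per_unit(y, x, x_total, N):
    """Hansen-Hurwitz estimate of the population mean per unit and its standard error.

    y, x    : study variable and size for the n sampled units (repeats allowed)
    x_total : total size over all N population units
    """
    n = len(y)
    p = [xi / x_total for xi in x]             # selection probability of each sampled unit
    z = [yi / pi for yi, pi in zip(y, p)]      # each z_i is an estimate of the population total
    total_hat = sum(z) / n                     # estimated population total
    var_total = sum((zi - total_hat) ** 2 for zi in z) / (n * (n - 1))
    return total_hat / N, math.sqrt(var_total) / N

# Hypothetical sample of 4 farms from a population of 100 farms.
areas = [5.2, 3.1, 8.4, 4.0]
yields = [31.0, 17.5, 52.3, 22.8]
mean_hat, se = pps_wr_mean_per_unit(yields, areas, x_total=484.5, N=100)
print(mean_hat, se)
```

When y is exactly proportional to x, every z_i equals the population total and the standard error is zero, which is the ideal case for pps sampling.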