PROBABILITY PROPORTIONAL TO SIZE (PPS)


Done by: Sindhu D.

2048135


Simple random sampling (SRS) provides a random sample in which every unit in the population has an equal probability of selection. Under certain circumstances, more efficient estimators are obtained by assigning unequal probabilities of selection to the units in the population. This type of sampling is known as a varying probability sampling scheme.

If Y is the variable under study and X is an auxiliary variable related to Y, then in the most commonly used varying probability scheme the units are selected with probability proportional to the value of X, called the size. This is termed probability proportional to a given measure of size (pps) sampling.

If the sampling units vary considerably in size, then SRS does not take into account the possible importance of the larger units in the population. A large unit, i.e., a unit with a large value of Y, contributes more to the population total than units with smaller values. It is therefore natural to expect that a selection scheme which assigns a higher probability of inclusion to the larger units than to the smaller units would provide more efficient estimators than a scheme which gives equal probability to all the units. This is accomplished through pps sampling.
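The efficiency claim above can be checked on a small example. The sketch below is only an illustration under stated assumptions: the five-unit population is made up, the function names are hypothetical, and it uses the standard textbook variance formulas for estimating a population total under SRS with replacement and under pps with replacement.

```python
def srswr_total_variance(y, n):
    """Variance of the SRSWR estimator N * (sample mean) of the population total."""
    N = len(y)
    mean = sum(y) / N
    sigma2 = sum((v - mean) ** 2 for v in y) / N  # population variance (divisor N)
    return N ** 2 * sigma2 / n


def ppswr_total_variance(y, x, n):
    """Variance of the PPSWR estimator (1/n) * sum(y_i / p_i), with p_i = x_i / sum(x)."""
    total_x = sum(x)
    total_y = sum(y)
    p = [xi / total_x for xi in x]
    return sum(pi * (yi / pi - total_y) ** 2 for yi, pi in zip(y, p)) / n


# Hypothetical population (not from the post): y is roughly proportional to x.
x = [2, 4, 6, 8, 20]    # size measure, e.g. area
y = [5, 9, 13, 15, 42]  # study variable, e.g. yield

var_srs = srswr_total_variance(y, n=2)
var_pps = ppswr_total_variance(y, x, n=2)
print("SRSWR variance:", var_srs)
print("PPSWR variance:", var_pps)
```

When Y is close to proportional to X, as in this toy population, the pps variance comes out much smaller than the SRS variance. The gain shrinks as the relationship between X and Y weakens.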

Note that the “size” considered is the value of the auxiliary variable X and not the value of the study variable Y. For example, in an agricultural survey, the yield depends on the area under cultivation. Farms with bigger areas are likely to have larger yields and so contribute more towards the population total, so the area can be taken as the measure of size. The cultivated area for a previous period can also be taken as the size while estimating the yield of the crop. Similarly, in an industrial survey, the number of workers in a factory can be taken as the measure of size when studying the industrial output of that factory.

 ➡️ There are two ways of selecting the sample: PPS with replacement (PPSWR) and PPS without replacement (PPSWOR).

 

PPS sampling with replacement (WR):  

First, we discuss the two methods to draw a sample with PPS and WR.  

1. Cumulative total method:  

The procedure of selecting a simple random sample of size n consists of

- associating the natural numbers from 1 to N with the N units in the population and

- then selecting those n units whose serial numbers correspond to a set of n numbers, each less than or equal to N, drawn from a random number table.

In the selection of a sample with varying probabilities, the procedure is to associate with each unit a set of consecutive natural numbers, the size of the set being proportional to the desired probability.

If $X_1, X_2, \ldots, X_N$ are positive integers proportional to the probabilities assigned to the N units in the population, then a possible way is to associate the cumulative totals with the units. The units are then selected based on the values of the cumulative totals. This is illustrated in the following table:


Units    Size         Cumulative total
1        $X_1$        $T_1 = X_1$
2        $X_2$        $T_2 = X_1 + X_2$
⋮        ⋮            ⋮
i−1      $X_{i-1}$    $T_{i-1} = \sum_{j=1}^{i-1} X_j$
i        $X_i$        $T_i = \sum_{j=1}^{i} X_j$
⋮        ⋮            ⋮
N        $X_N$        $T_N = \sum_{j=1}^{N} X_j$

Select a random number R between 1 and $T_N$ by using the random number table. If $T_{i-1} < R \le T_i$ (with $T_0 = 0$), then the ith unit is selected, with probability $X_i / T_N$, $i = 1, 2, \ldots, N$. Repeat the procedure n times to get a sample of size n.



In this case, the probability of selection of the ith unit is

$$P_i = \frac{T_i - T_{i-1}}{T_N} = \frac{X_i}{T_N} \;\Rightarrow\; P_i \propto X_i.$$

Note that $T_N$ is the population total, which remains constant.
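The cumulative total method can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the post: the function names `select_unit` and `pps_wr_cumulative` are hypothetical, the sizes are assumed to be positive integers, and Python's pseudo-random generator stands in for the random number table.

```python
import bisect
import itertools
import random


def select_unit(cum_totals, r):
    """Return the 1-based unit i whose cumulative totals satisfy T_{i-1} < r <= T_i."""
    return bisect.bisect_left(cum_totals, r) + 1


def pps_wr_cumulative(sizes, n, rng=None):
    """Draw n units with replacement, with probability proportional to size."""
    rng = rng or random.Random()
    cum_totals = list(itertools.accumulate(sizes))  # T_1, T_2, ..., T_N
    total = cum_totals[-1]                          # T_N
    sample = []
    for _ in range(n):
        r = rng.randint(1, total)                   # R uniform on 1..T_N (inclusive)
        sample.append(select_unit(cum_totals, r))
    return sample


# Hypothetical sizes: unit 3 should be drawn about 60% of the time.
print(pps_wr_cumulative([3, 1, 6], n=5, rng=random.Random(1)))
```

With sizes 3, 1 and 6, the random numbers 1 to 3 map to unit 1, the number 4 maps to unit 2, and 5 to 10 map to unit 3, so each unit owns as many random numbers as its size.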

Drawback: This procedure involves writing down the successive cumulative totals. 

This is time consuming and tedious if the number of units in the population is large.  

This problem is overcome in Lahiri’s method.  

2. Lahiri’s method:

Let $M = \max_{i = 1, 2, \ldots, N} X_i$, i.e., the maximum of the sizes of the N units in the population, or some convenient number greater than this maximum.

The sampling procedure has the following steps:  

1. Select a pair of random numbers (i, j) such that $1 \le i \le N$, $1 \le j \le M$.

2. If $j \le X_i$, then the ith unit is selected; otherwise the pair is rejected and another pair of random numbers is chosen.

3. To get a sample of size n, this procedure is repeated till n units are selected.

Now we see how this method ensures that the probabilities of selection of the units vary and are proportional to size.

(Sampling Theory | Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur)

Probability of selection of ith unit at a trial depends on two possible outcomes  

– either it is selected at the first draw  

– or it is selected in the subsequent draws preceded by ineffective draws. Such probability is given by  

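$$P_i^* = P(\text{unit } i \text{ is drawn}) \, P(j \le X_i \mid i) = \frac{1}{N} \cdot \frac{X_i}{M}, \text{ say.}$$

Probability that no unit is selected at a trial

$$= \frac{1}{N} \sum_{i=1}^{N} \left(1 - \frac{X_i}{M}\right) = \frac{1}{N} \left(N - \frac{N \bar{X}}{M}\right) = 1 - \frac{\bar{X}}{M} = Q, \text{ say,}$$

where $\bar{X}$ is the mean of the sizes of the N units.

The probability that unit i is selected

$$= P_i^* + Q P_i^* + Q^2 P_i^* + \cdots = \frac{P_i^*}{1 - Q} = \frac{X_i / (NM)}{\bar{X} / M} = \frac{X_i}{N \bar{X}} = \frac{X_i}{X_{\text{total}}} \propto X_i.$$

Thus the probability of selection of unit i is proportional to the size $X_i$. So this method generates a pps sample.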
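Lahiri's accept/reject procedure can be sketched as follows. This is a minimal illustration under assumptions: the function name `pps_wr_lahiri` is hypothetical, the sizes are positive integers, and the trial counter is added only to show how many pairs (i, j) were needed, including rejected ones.

```python
import random


def pps_wr_lahiri(sizes, n, M=None, rng=None):
    """Draw n units (with replacement) by Lahiri's method.

    Returns the list of selected 1-based unit labels and the number of
    trials used, counting ineffective (rejected) draws.
    """
    rng = rng or random.Random()
    N = len(sizes)
    if M is None:
        M = max(sizes)              # any number >= the maximum size also works
    if M < max(sizes):
        raise ValueError("M must be at least the maximum size")
    sample, trials = [], 0
    while len(sample) < n:
        trials += 1
        i = rng.randint(1, N)       # candidate unit, 1 <= i <= N
        j = rng.randint(1, M)       # second random number, 1 <= j <= M
        if j <= sizes[i - 1]:       # accept unit i only if j <= X_i
            sample.append(i)
    return sample, trials


# Hypothetical sizes: unit 2 is nine times as large as unit 1.
sample, trials = pps_wr_lahiri([1, 9], n=2000, rng=random.Random(0))
print(sample.count(2) / len(sample), trials)
```

In a long run the share of draws going to unit 2 should be close to 9/10, in line with the derivation, and the gap between `trials` and n is the number of ineffective draws.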

Advantages:

1. It does not require writing down the cumulative totals for each unit.

2. The sizes of all the units need not be known beforehand. We need only some number greater than the maximum size, and the sizes of those units that are picked by the first random number (between 1 and N) of each pair.

Disadvantage: Time and effort are wasted whenever a draw is rejected. A draw is ineffective when the second random number j exceeds the size of the unit picked by the first random number, so that no unit is selected.

 

 

R CODE 

A sample survey was conducted to study the yield of wheat in Haryana. A sample of 20 farms out of a total of 100 was drawn with probability proportional to the area under the wheat crop, with replacement. The total area under the wheat crop was 484.5 hectares. The area under the crop (X) and the yield (Y) were recorded in hectares and quintals per hectare, respectively.

i) Estimate the average yield per farm along with its standard error for the given sample.
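An estimate of this kind could be computed along the following lines. This is a sketch only, written in Python rather than R, and it assumes the standard Hansen-Hurwitz estimator for pps sampling with replacement: with $p_i = x_i / X$, the mean per unit is estimated by $\frac{1}{nN} \sum y_i / p_i$, and its standard error by the square root of $\frac{1}{n(n-1)} \sum (z_i - \bar{z})^2$, where $z_i = y_i / (N p_i)$. The function name and the three-farm data are made up for illustration and are not the survey data described above.

```python
import math


def ppswr_mean_estimate(y, x, N, total_x):
    """Hansen-Hurwitz estimate of the population mean per unit and its standard error.

    y: study-variable values of the sampled units (one entry per draw)
    x: size measures of the same units
    N: number of units in the population
    total_x: population total of the size measure
    """
    n = len(y)
    # z_i = y_i / (N * p_i) with p_i = x_i / total_x; each z_i is an unbiased estimate
    z = [yi / (N * (xi / total_x)) for yi, xi in zip(y, x)]
    mean_hat = sum(z) / n
    var_hat = sum((zi - mean_hat) ** 2 for zi in z) / (n * (n - 1))
    return mean_hat, math.sqrt(var_hat)


# Made-up illustration: 3 draws from a population of 10 units with total size 40.
mean_hat, se = ppswr_mean_estimate(y=[60.0, 140.0, 120.0], x=[2.0, 5.0, 4.0],
                                   N=10, total_x=40.0)
print(mean_hat, se)
```

For the survey above, the same function would be called with the 20 sampled values, N = 100 and a total area of 484.5 hectares. Whether Y should first be converted to a per-farm total is a choice the problem statement leaves open.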





Hence the average yield per farm is 29.21369.

CONCLUSION: Hence, for the given data, the average yield per farm is 29.21369, with a standard error of 1.392459.
