PROBABILITY PROPORTIONAL TO SIZE (PPS)


Done by: Sindhu .D

2048135

The simple random sampling (SRS) scheme provides a random sample in which every unit in the population has an equal probability of selection. Under certain circumstances, more efficient estimators are obtained by assigning unequal probabilities of selection to the units in the population. This type of sampling is known as a varying probability sampling scheme.

If Y is the variable under study and X is an auxiliary variable related to Y, then in the most commonly used varying probability scheme the units are selected with probability proportional to the value of X, called the size. This is termed probability proportional to a given measure of size (pps) sampling.

If the sampling units vary considerably in size, then SRS does not take into account the possible importance of the larger units in the population. A large unit, i.e., a unit with a large value of Y, contributes more to the population total than units with smaller values. It is therefore natural to expect that a selection scheme which assigns a higher probability of inclusion in the sample to the larger units than to the smaller units would provide more efficient estimators than a scheme which gives equal probability to all units. This is accomplished through pps sampling.

Note that the “size” considered is the value of the auxiliary variable X and not the value of the study variable Y.

For example, in an agricultural survey, the yield depends on the area under cultivation. Larger areas are likely to produce a larger yield and so contribute more to the population total, so the area can be taken as the measure of size. The cultivated area for a previous period can also be taken as the size when estimating the yield of the crop. Similarly, in an industrial survey, the number of workers in a factory can be taken as the measure of size when studying the industrial output of that factory.

➡️ There are two ways of selecting the sample: PPS with replacement (ppswr) and PPS without replacement (ppswor).

PPS sampling with replacement (WR):

First, we discuss two methods for drawing a PPS sample with replacement.

1. Cumulative total method:

The procedure of selecting a simple random sample of size n consists of

- associating the natural numbers 1 to N with the N units in the population, and

- then selecting those n units whose serial numbers correspond to a set of n numbers, each less than or equal to N, drawn from a random number table.

In the selection of a sample with varying probabilities, the procedure is to associate with each unit a set of consecutive natural numbers, the size of the set being proportional to the desired probability.

If $X_1, X_2, \ldots, X_N$ are positive integers proportional to the probabilities assigned to the N units in the population, then one way to do this is to associate the cumulative totals with the units. The units are then selected based on the values of the cumulative totals. This is illustrated in the following table:

Units      Size           Cumulative total

1          $X_1$          $T_1 = X_1$

2          $X_2$          $T_2 = X_1 + X_2$

...        ...            ...

$i-1$      $X_{i-1}$      $T_{i-1} = \sum_{j=1}^{i-1} X_j$

$i$        $X_i$          $T_i = \sum_{j=1}^{i} X_j$

...        ...            ...

$N$        $X_N$          $T_N = \sum_{j=1}^{N} X_j$

Select a random number R between 1 and $T_N$ by using the random number table.

∙ If $T_{i-1} < R \le T_i$ (with $T_0 = 0$), then the ith unit is selected, with probability $X_i / T_N$, $i = 1, 2, \ldots, N$.

∙ Repeat the procedure n times to get a sample of size n.

In this case, the probability of selection of the ith unit is

$$P_i = \frac{T_i - T_{i-1}}{T_N} = \frac{X_i}{T_N} \;\Rightarrow\; P_i \propto X_i.$$

Note that $T_N$ is the population total, which remains constant.
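The cumulative total method can be sketched in a few lines of Python. This is only an illustrative sketch, not the code of these notes: the function names `select_unit` and `pps_wr_cumulative` and the sizes 5, 1, 4 are made up for the example, and the pseudo-random generator stands in for the random number table. Units are labelled 1 to N, and unit i is chosen when R falls in the interval $T_{i-1} < R \le T_i$.

```python
import bisect
import itertools
import random


def select_unit(totals, r):
    """Return the 1-based label i with T_{i-1} < r <= T_i (T_0 = 0)."""
    # bisect_left finds the first cumulative total that is >= r.
    return bisect.bisect_left(totals, r) + 1


def pps_wr_cumulative(sizes, n, rng=None):
    """Draw n units with replacement, unit i having probability X_i / T_N.

    sizes: positive integers X_1, ..., X_N (hypothetical example input).
    """
    rng = rng or random.Random()
    totals = list(itertools.accumulate(sizes))  # T_1, T_2, ..., T_N
    sample = []
    for _ in range(n):
        r = rng.randint(1, totals[-1])  # R uniform on 1, ..., T_N (inclusive)
        sample.append(select_unit(totals, r))
    return sample


# Example with made-up sizes: T = [5, 6, 10], so R in 1..5 selects unit 1,
# R = 6 selects unit 2, and R in 7..10 selects unit 3.
print(pps_wr_cumulative([5, 1, 4], n=4))
```

Because every value of R from 1 to $T_N$ is equally likely and exactly $X_i$ of them map to unit i, the selection probability is $X_i / T_N$, as in the formula above.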

Drawback: This procedure involves writing down the successive cumulative totals, which is time-consuming and tedious if the number of units in the population is large. This problem is overcome in Lahiri’s method.

2. Lahiri’s method:

Let $M = \max_{i = 1, 2, \ldots, N} X_i$, i.e., the maximum of the sizes of the N units in the population, or let M be some convenient number greater than this maximum.

The sampling procedure has the following steps:

1. Select a pair of random numbers (i, j) such that $1 \le i \le N$ and $1 \le j \le M$.

2. If $j \le X_i$, then the ith unit is selected; otherwise the pair is rejected and another pair of random numbers is chosen.

3. To get a sample of size n, this procedure is repeated until n units are selected.
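The three steps above can be sketched as a simple accept/reject loop. This is a minimal sketch under stated assumptions rather than a definitive implementation: the function name `pps_wr_lahiri`, its parameters and the example sizes are hypothetical, the sizes are assumed to be positive integers, and a pseudo-random generator replaces the random number table. The function also counts the trials, so the number of ineffective draws can be seen.

```python
import random


def pps_wr_lahiri(sizes, n, m=None, rng=None):
    """Draw n units with replacement by Lahiri's method.

    sizes: positive integers X_1, ..., X_N (hypothetical example input).
    m: a number >= max(sizes); defaults to the maximum size itself.
    Returns (sample, trials): 1-based unit labels and the trial count.
    """
    rng = rng or random.Random()
    n_units = len(sizes)
    if m is None:
        m = max(sizes)
    if m < max(sizes):
        raise ValueError("m must be at least the maximum size")
    sample = []
    trials = 0
    while len(sample) < n:
        trials += 1
        i = rng.randint(1, n_units)  # step 1: 1 <= i <= N
        j = rng.randint(1, m)        # step 1: 1 <= j <= M
        if j <= sizes[i - 1]:        # step 2: accept unit i if j <= X_i
            sample.append(i)
        # otherwise the pair is rejected and a new pair is drawn (step 3)
    return sample, trials


sample, trials = pps_wr_lahiri([5, 1, 4], n=4)
print(sample, trials)
```

Choosing `m` much larger than the typical size keeps the method valid but raises the share of rejected pairs.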

Now we see how this method ensures that the probabilities of selection of the units vary and are proportional to size.

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur Page 3

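The probability of selecting the ith unit depends on two possible outcomes:

– either it is selected at the first trial,

– or it is selected at a subsequent trial preceded by ineffective trials.

At any single trial, the ith unit is selected when index i is drawn and $j \le X_i$. This has probability

$$P(\text{index } i \text{ is drawn}) \, P(j \le X_i \mid i) = \frac{1}{N} \cdot \frac{X_i}{M} = P_i^*, \text{ say.}$$

Probability that no unit is selected at a trial

$$= \frac{1}{N} \sum_{i=1}^{N} \left(1 - \frac{X_i}{M}\right) = \frac{1}{N} \left(N - \frac{N \bar{X}}{M}\right) = 1 - \frac{\bar{X}}{M} = Q, \text{ say,}$$

where $\bar{X}$ is the mean of the sizes and $X_{\text{total}} = N \bar{X}$ is their total.

The probability that unit i is selected

$$= P_i^* + Q P_i^* + Q^2 P_i^* + \cdots = \frac{P_i^*}{1 - Q} = \frac{X_i / (NM)}{\bar{X} / M} = \frac{X_i}{N \bar{X}} = \frac{X_i}{X_{\text{total}}} \propto X_i.$$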

Thus the probability of selection of unit i is proportional to the size $X_i$, so this method generates a pps sample.
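The argument can be checked numerically with exact fractions. The sketch below is illustrative only: the helper name `lahiri_selection_probabilities` and the sizes 5, 1, 4 with M = 5 are assumptions made for the example. It computes the single-trial probability $X_i / (NM)$ for each unit, the probability Q that a trial selects no unit, and the geometric-series sum, which should equal $X_i$ divided by the total size.

```python
from fractions import Fraction


def lahiri_selection_probabilities(sizes, m):
    """Return (selection probabilities, Q) for Lahiri's method, exactly."""
    n_units = len(sizes)
    # Single-trial probability that unit i is selected: X_i / (N M).
    p_star = [Fraction(x, n_units * m) for x in sizes]
    # Probability that a trial selects no unit: Q = 1 - Xbar / M.
    q = 1 - sum(p_star)
    # Sum of the geometric series p + Q p + Q^2 p + ... = p / (1 - Q).
    return [p / (1 - q) for p in p_star], q


probs, q = lahiri_selection_probabilities([5, 1, 4], m=5)
print([str(p) for p in probs])  # ['1/2', '1/10', '2/5']
print(q)                        # 1/3
```

With total size 10, the values 1/2, 1/10 and 2/5 are exactly 5/10, 1/10 and 4/10, i.e. proportional to the sizes.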

Advantages:

1. It does not require writing down all cumulative totals for each unit.

2. The sizes of all the units need not be known beforehand. We need only some number greater than the maximum size, and the sizes of those units picked by the first random number (between 1 and N) of each pair.

Disadvantage: It wastes time and effort when units get rejected. A draw is ineffective when the pair (i, j) has $j > X_i$, so that no unit is selected.
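The cost of the rejections can be quantified with a short calculation. This is a sketch that assumes the trials are independent. Here Q denotes the probability that a single trial selects no unit, $Q = 1 - \bar{X}/M$, with $\bar{X}$ the mean size. The number of trials needed to select one unit then follows a geometric distribution, so

$$E[\text{trials per selected unit}] = \sum_{k=1}^{\infty} k \, Q^{k-1} (1 - Q) = \frac{1}{1 - Q} = \frac{M}{\bar{X}}.$$

For instance, with sizes 5, 1, 4 and M = 5 (made-up values), $\bar{X} = 10/3$ and about $5 / (10/3) = 1.5$ trials are needed per selected unit. The waste grows when M is much larger than $\bar{X}$.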

❋R CODE