PPS SYSTEMATIC SAMPLING: HARTLEY-RAO ESTIMATOR

 

          CENSUS 2001 & 2011 (LITERACY RATES)

   SHIVANI AJITH         

INTRODUCTION: A with-replacement sampling scheme in which the sampling units have unequal selection probabilities, each proportional to the size of an auxiliary variable associated with the unit, is called probability proportional to size with replacement (PPSWR) sampling.
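The PPSWR idea can be sketched with the cumulative-total method. This is a minimal Python illustration, not the R analysis behind this post; the function name `ppswr_sample` and the sizes are hypothetical and do not come from the census file.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def ppswr_sample(sizes, n, seed=None):
    """Draw n unit indices with replacement, each draw proportional to size."""
    rng = random.Random(seed)
    cum = list(accumulate(sizes))   # cumulative size totals
    total = cum[-1]
    draws = []
    for _ in range(n):
        r = rng.uniform(0, total)   # random point on the cumulative scale
        # First unit whose cumulative total reaches r; clamp guards the endpoint.
        draws.append(min(bisect_left(cum, r), len(cum) - 1))
    return draws


# Hypothetical auxiliary sizes for three units (illustration only).
sizes = [10, 30, 60]
sample = ppswr_sample(sizes, n=5, seed=1)

# Probability of selecting each unit on any single draw.
draw_probs = [s / sum(sizes) for s in sizes]
print(sample, draw_probs)
```

Because the draws are independent, the same unit can appear more than once in `sample`.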

PPS systematic sampling has the great advantage that it is easy to implement. It also has the property that a unit's inclusion probability πi is proportional to its size, so it is a type of so-called πps sampling. Like simple systematic sampling, the PPS version has the disadvantage that there is no unbiased variance estimator for it. Note that in PPS sampling with replacement (PPSWR), the probability of selecting a given unit on any given draw is proportional to its size, but the overall inclusion probability is not, i.e., it is not a πps sampling scheme (unless the sample size is one).
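A minimal Python sketch of PPS systematic selection follows, assuming every unit's size is smaller than the sampling interval so that no unit can be picked twice. The function names and the sizes are hypothetical, chosen only to illustrate the mechanics.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def pps_systematic_sample(sizes, n, seed=None):
    """Select n units by stepping through the cumulative sizes at a fixed interval."""
    rng = random.Random(seed)
    cum = list(accumulate(sizes))      # cumulative size totals
    step = cum[-1] / n                 # sampling interval
    start = rng.random() * step        # random start within the first interval
    points = [start + k * step for k in range(n)]
    # Each selection point falls in exactly one unit's cumulative range.
    return [min(bisect_left(cum, p), len(cum) - 1) for p in points]


def inclusion_probabilities(sizes, n):
    """Inclusion probability of each unit: n * size / total size."""
    total = sum(sizes)
    return [n * s / total for s in sizes]


# Hypothetical sizes (total 100); with n = 4 the interval is 25,
# and every size is below 25, so all selected units are distinct.
sizes = [12, 8, 20, 15, 5, 10, 18, 12]
n = 4
sample = pps_systematic_sample(sizes, n, seed=7)
pi = inclusion_probabilities(sizes, n)
print(sample, pi)
```

The inclusion probabilities sum to the sample size n, which is the defining feature of a πps design with a fixed sample size.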

The Hartley-Rao variance formula was designed to estimate the randomization variance of a Horvitz-Thompson estimator, given a systematic probability-proportional-to-size sample from a randomly ordered large population. Using an underappreciated formulation of this variance estimator, one can see that the Hartley-Rao variance estimator is unbiased under a model with a particular error structure given any sample.

FORMULA FOR HARTLEY-RAO ESTIMATOR:

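$$v_{HR} = \frac{1}{2 N^{2} (n-1)} \sum_{i \in s} \sum_{j \in s} \left( 1 - \pi_i - \pi_j + \frac{1}{n} \sum_{k} \pi_k^{2} \right) \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^{2}$$

Here N is the population size, n the sample size, s the set of sampled units, y_i the study value of unit i and π_i its inclusion probability.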
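The formula can be turned into a short Python sketch. It rests on several assumptions: the double sum runs over all pairs of sampled units, the garbled terms read as π_i + π_j subtracted from 1 and y_j/π_j, and the sum of squared π_k is taken over the whole population. The function name and the numbers are hypothetical.

```python
def hartley_rao_variance(y, pi_sample, pi_population, N):
    """Hartley-Rao variance estimate for the Horvitz-Thompson mean (assumed reading)."""
    n = len(y)
    # Sum of squared inclusion probabilities, assumed to run over the population.
    sum_pi_sq = sum(p * p for p in pi_population)
    total = 0.0
    for i in range(n):
        for j in range(n):
            weight = 1.0 - pi_sample[i] - pi_sample[j] + sum_pi_sq / n
            diff = y[i] / pi_sample[i] - y[j] / pi_sample[j]
            total += weight * diff ** 2
    # The factor 1/2 compensates for counting each pair (i, j) twice.
    return total / (2.0 * N ** 2 * (n - 1))


# Hypothetical check: with equal inclusion probabilities n/N the estimate
# reduces to the usual SRSWOR formula (1 - n/N) * s^2 / n.
N = 10
y = [2.0, 4.0, 6.0]
pi_sample = [0.3, 0.3, 0.3]
pi_population = [0.3] * N
v = hartley_rao_variance(y, pi_sample, pi_population, N)
print(v)
```

The equal-probability case is a convenient sanity check because the answer is known in closed form; it does not validate the formula for unequal sizes.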

APPLICATION TO A REAL-LIFE EXAMPLE IN R:

CONTEXT

This is the official dataset released by the Government of India, based on the 2001 and 2011 censuses.

CONTENT

The data cover 35 Indian states and union territories.
The literacy rate is reported for three categories: Overall, Rural and Urban.
All figures are percentages of the total population of the state or union territory.

About the dataset

The CSV file contains data from the Government of India website on the literacy rates of the 35 states and union territories.

There are three key fields: overall literacy rate, category, and name of the state/union territory.

 ANALYSIS









INTERPRETATION-

1) The selected samples are

i)1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1  for  the literacy rate of states

ii)2,2,2,2,2,2,2 for the literacy rates of Union Territories

2) The Hartley-Rao estimate of the total literacy rate for 2011 is 3066.689, with a standard error of 32.4589. One standard error on either side of the estimate gives the range 3066.689 ± 32.4589.

3) The bias of the estimate of the total literacy rate for 2011 is 342.0893.

CONCLUSION:

Under mild conditions, the Hartley-Rao variance estimator remains nearly unbiased for a large sample drawn from a relatively larger population. The estimated standard error of the Hartley-Rao estimate of the total literacy rate in 2011 is 32.4589. The smaller the standard error, the more precise the estimate.

Here the bound on the error of estimation, B = 2 × SE = 2 × 32.4589 = 64.9178, gives the interval 3066.689 ± 64.9178 for the Hartley-Rao estimate.
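As a quick check on the arithmetic, the bound can be recomputed from the estimate and standard error reported in the interpretation. This is a small Python sketch; the variable names are illustrative only.

```python
estimate = 3066.689        # Hartley-Rao estimate reported in this post
standard_error = 32.4589   # standard error reported in this post

bound = 2 * standard_error   # bound on the error of estimation, B = 2 * SE
lower = estimate - bound
upper = estimate + bound
print(f"B = {bound:.4f}, interval = [{lower:.4f}, {upper:.4f}]")
```

The interval built from B is twice as wide as the one-standard-error range quoted with the estimate.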

Also, the total literacy rates for 2001 and 2011 are positively correlated.

 

The variance obtained with the Hartley-Rao estimator is smaller than that obtained with the Horvitz-Thompson estimator, which indicates that the Hartley-Rao estimator performs better here. Even with a more general error structure, this variance estimator remains nearly model unbiased for a large sample and relatively larger population under mild conditions.
