Probability Proportional to Size Ordered Sampling – Des Raj Estimator

Karan Singhal 

Introduction: -

Sampling is the process of selecting a predetermined number of observations from a larger population for statistical analysis. The method used depends on the type of analysis being performed. One such method is probability proportional to size sampling without replacement (PPSWOR) with an ordered estimator. Sampling without replacement is generally more efficient than sampling with replacement, and this also holds for PPS sampling. Because the probability of selecting a unit changes from draw to draw and depends on the units already selected, estimators for PPSWOR are divided into ordered and unordered estimators.

To overcome the difficulty of expectations that change with each draw, a new variate is associated with each draw such that its expectation equals the population value of the variate under study. Each variate depends on the units selected at the earlier draws, so the order of the draws matters for unbiasedness; estimators built this way are called ordered estimators. Des Raj (1956) proposed such an ordered estimator, which uses the conditional probability of selecting a unit given the units drawn before it.
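For a sample of n units, the variates described above are usually written as follows. This is a sketch in standard notation: the symbols t_i, \hat{Y}_D and v are labels chosen here, with y_i the value and p_i the initial selection probability of the unit selected at draw i.

$$
t_1 = \frac{y_1}{p_1}, \qquad
t_i = \sum_{k=1}^{i-1} y_k + \frac{y_i}{p_i}\Bigl(1 - \sum_{k=1}^{i-1} p_k\Bigr), \quad i = 2, \dots, n
$$

$$
\hat{Y}_D = \frac{1}{n}\sum_{i=1}^{n} t_i, \qquad
v(\hat{Y}_D) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\bigl(t_i - \hat{Y}_D\bigr)^2
$$

Each t_i has expectation equal to the population total Y, so their average is unbiased for Y, and v is the usual unbiased estimator of its variance.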

Formula:-

Estimator for two draws: y1 and y2 are the values of the units selected at the first and second draws, and p1 and p2 are their initial selection probabilities (pi denotes the initial probability of selecting unit i).
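In standard notation, and assuming the quantity of interest is the population total, the two-draw estimator can be written as:

$$
\hat{Y}_D = \frac{1}{2}\left[\frac{y_1}{p_1} + \left(y_1 + \frac{y_2}{p_2}\,(1 - p_1)\right)\right]
          = \frac{1}{2}\left[(1 + p_1)\,\frac{y_1}{p_1} + (1 - p_1)\,\frac{y_2}{p_2}\right]
$$

The factor (1 - p_1)/p_2 is the inverse of the conditional probability p_2/(1 - p_1) of selecting the second unit given that the first has already been removed.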

 

 

Variance of the estimator for a sample of size 2.
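With Y the population total, and with the sums running over all N population units (so y_i and p_i here denote the value and initial selection probability of population unit i; these symbols are assumptions of this sketch), the variance for two draws is usually given as:

$$
V(\hat{Y}_D) = \left(1 - \frac{1}{2}\sum_{i=1}^{N} p_i^2\right)
               \frac{1}{2}\sum_{i=1}^{N} p_i\left(\frac{y_i}{p_i} - Y\right)^2
             - \frac{1}{4}\sum_{i=1}^{N} p_i^2\left(\frac{y_i}{p_i} - Y\right)^2
$$

An unbiased estimator of this variance from the two sampled units, with subscripts 1 and 2 now referring to the first and second draws, is:

$$
v(\hat{Y}_D) = \frac{1}{4}\,(1 - p_1)^2\left(\frac{y_1}{p_1} - \frac{y_2}{p_2}\right)^2
$$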

 

 



Applications: -

The ordered estimator can be used whenever units are drawn one at a time, without replacement and with unequal selection probabilities. It can be seen as a refinement of SRSWOR: simple random sampling gives every unit in the population an equal probability of selection, but under certain circumstances more efficient estimators are obtained by assigning unequal selection probabilities to the units.

The application is demonstrated on a data set giving the area under lime (in acres) and the number of bearing lime trees in each of the 22 lime-growing villages of one tehsil of Bangalore district. The procedure is to draw a sample of size 5 under the PPSWOR scheme, estimate the total number of bearing lime trees using the Des Raj ordered estimator, and give the bound on the error of estimation.

Analysis: -

library(samplingbook)    #importing the package

## Loading required package: pps

## Loading required package: sampling

## Loading required package: survey

## Loading required package: grid

## Loading required package: Matrix

## Loading required package: survival

##
## Attaching package: 'survival'

## The following objects are masked from 'package:sampling':
##
##     cluster, strata

##
## Attaching package: 'survey'

## The following object is masked from 'package:graphics':
##
##     dotchart

library(readxl)
library(fpest)

## Warning: package 'fpest' was built under R version 4.0.3

data<-read_excel('C:/Users/karan/Desktop/ppswor.xlsx')
data                    # dataset providing information on area under limes and no. of bearing lime trees

## # A tibble: 22 x 3
##    `S.No. of villages` `Area Under lime(in acres)` `No. of bearing lime trees`
##                  <dbl>                       <dbl>                       <dbl>
##  1                   1                       32.8                         2328
##  2                   2                        7.97                         754
##  3                   3                        0.62                         105
##  4                   4                       15.6                          949
##  5                   5                       42.8                         3091
##  6                   6                       40.0                         1736
##  7                   7                        9.39                         840
##  8                   8                        6.33                         311
##  9                   9                        5.05                           0
## 10                  10                       94.6                         3044
## # ... with 12 more rows

X=sum(data$`Area Under lime(in acres)`)
X                   #total of area under lime

## [1] 497.66

pi=data$`Area Under lime(in acres)`/X
pi     # initial selection probability p_i of unit i (i = 1 to N); note that this name masks R's built-in constant pi

##  [1] 0.0658481694 0.0160149500 0.0012458305 0.0313667966 0.0861029619
##  [6] 0.0804364426 0.0188683037 0.0127195274 0.0101474903 0.1899891492
## [11] 0.1079250894 0.0013463007 0.0016477113 0.0043202186 0.0008640437
## [16] 0.2478800788 0.0005827272 0.0060282120 0.0080376160 0.0040188080
## [21] 0.0124783989 0.0921311739

d1<-cbind(data,pi)
d1   # 'data' with the pi column appended

##    S.No. of villages Area Under lime(in acres) No. of bearing lime trees
## 1                  1                     32.77                      2328
## 2                  2                      7.97                       754
## 3                  3                      0.62                       105
## 4                  4                     15.61                       949
## 5                  5                     42.85                      3091
## 6                  6                     40.03                      1736
## 7                  7                      9.39                       840
## 8                  8                      6.33                       311
## 9                  9                      5.05                         0
## 10                10                     94.55                      3044
## 11                11                     53.71                      2483
## 12                12                      0.67                       128
## 13                13                      0.82                       102
## 14                14                      2.15                        60
## 15                15                      0.43                         0
## 16                16                    123.36                     11799
## 17                17                      0.29                        26
## 18                18                      3.00                       317
## 19                19                      4.00                       190
## 20                20                      2.00                       180
## 21                21                      6.21                       752
## 22                22                     45.85                      3091
##              pi
## 1  0.0658481694
## 2  0.0160149500
## 3  0.0012458305
## 4  0.0313667966
## 5  0.0861029619
## 6  0.0804364426
## 7  0.0188683037
## 8  0.0127195274
## 9  0.0101474903
## 10 0.1899891492
## 11 0.1079250894
## 12 0.0013463007
## 13 0.0016477113
## 14 0.0043202186
## 15 0.0008640437
## 16 0.2478800788
## 17 0.0005827272
## 18 0.0060282120
## 19 0.0080376160
## 20 0.0040188080
## 21 0.0124783989
## 22 0.0921311739

set.seed(123)
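pps_wor=d1[sample(1:nrow(d1), 5, replace=FALSE), ]
pps_wor    # sample of size 5 drawn without replacement; no prob= is passed, so every village is equally likely here (a PPSWOR draw would pass prob=d1$pi)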

##    S.No. of villages Area Under lime(in acres) No. of bearing lime trees
## 15                15                      0.43                         0
## 19                19                      4.00                       190
## 14                14                      2.15                        60
## 3                  3                      0.62                       105
## 10                10                     94.55                      3044
##              pi
## 15 0.0008640437
## 19 0.0080376160
## 14 0.0043202186
## 3  0.0012458305
## 10 0.1899891492
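A PPSWOR draw can be sketched in base R by passing the size-based probabilities to `sample()`. The helper `draw_ppswor` and the short `area` vector below are hypothetical and only illustrate the idea; they are not part of the original analysis, and the result will differ from the sample shown above.

```r
# Hypothetical helper (not part of the original analysis): draw n units
# one at a time without replacement, with probability proportional to size.
draw_ppswor <- function(size, n) {
  stopifnot(all(size > 0), n <= length(size))
  p <- size / sum(size)          # initial selection probabilities
  # With replace = FALSE and prob supplied, sample() applies the weights
  # sequentially: each draw is proportional to size among the units that
  # remain, which is the PPSWOR scheme.
  sample(seq_along(size), n, replace = FALSE, prob = p)
}

# Illustrative sizes only: the first ten village areas from the data set.
area <- c(32.77, 7.97, 0.62, 15.61, 42.85, 40.03, 9.39, 6.33, 5.05, 94.55)
set.seed(1)
idx <- draw_ppswor(area, 5)
idx   # row indices in the order in which they were drawn
```

The indices come back in draw order, which matters because the Des Raj estimator depends on that order.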

desraj(pps_wor$`No. of bearing lime trees`, pps_wor$pi)   # Des Raj ordered estimator; values must be given in draw order

## $est
## [1] 27426.98
##
## $estvar
## [1] 210519342
##
## $tvals
## [1]     0.00 23618.43 13954.56 83416.77 16145.17
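The `$tvals` above can be checked by hand. The following is a minimal base-R sketch of the Des Raj computation, assuming the values and probabilities are supplied in draw order; `des_raj_sketch` is a name made up here and is not claimed to be how `desraj()` is implemented internally. The inputs are the five sampled rows as printed above.

```r
# Sketch of the Des Raj ordered estimator of a population total.
# y: sampled values in draw order; p: their initial selection probabilities.
des_raj_sketch <- function(y, p) {
  n <- length(y)
  stopifnot(n >= 2, length(p) == n, all(p > 0))
  cum_y <- c(0, cumsum(y)[-n])   # y_1 + ... + y_(i-1), zero for i = 1
  cum_p <- c(0, cumsum(p)[-n])   # p_1 + ... + p_(i-1), zero for i = 1
  t_vals <- cum_y + y * (1 - cum_p) / p        # one variate per draw
  est <- mean(t_vals)                          # estimated total
  est_var <- sum((t_vals - est)^2) / (n * (n - 1))
  list(est = est, estvar = est_var, tvals = t_vals)
}

# Sampled villages 15, 19, 14, 3, 10 in the order they were drawn.
y <- c(0, 190, 60, 105, 3044)
p <- c(0.0008640437, 0.0080376160, 0.0043202186, 0.0012458305, 0.1899891492)
res <- des_raj_sketch(y, p)
res$est                # estimated total
res$tvals              # per-draw variates
2 * sqrt(res$estvar)   # bound on the error of estimation
```

Up to the rounding of the printed probabilities, this should agree with the `$est`, `$estvar` and `$tvals` reported by `desraj()`.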

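est_var=210519342   # estimated variance ($estvar above); named est_var to avoid masking base R's var()
se<-sqrt(est_var)
se #standard error

## [1] 14509.28

bound_error<-2*se
bound_error     # bound on the error of estimation (2 standard errors)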

## [1] 29018.57

Conclusion/Interpretation: -

The estimated total number of bearing lime trees in the 22 villages is about 27427, with an estimated variance of 210519342 (standard error 14509.28). The bound on the error of estimation is 29018.57, which is larger than the estimate itself, so an estimate based on only 5 villages is very imprecise and should be read as a rough indication of the total number of trees in the lime-growing villages of the tehsil.

 

Limitations:-

At first sight, PPS sampling seems bound to give biased estimators, because larger units are over-represented and smaller units under-represented in the sample. This is indeed the case when the sample mean, which gives equal weight to all units, is used as an estimator of the population mean. If the sample observations are instead weighted at the estimation stage according to their probabilities of selection, unbiased estimators can be obtained.
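The effect of weighting can be verified exactly on a small example by enumerating every ordered sample of size 2. The toy population below is made up for illustration; the loop compares the expectation of the Des Raj estimator with that of an unweighted estimator (N times the sample mean) under PPSWOR.

```r
# Toy population (made-up numbers, for illustration only).
y <- c(10, 40, 25, 5)          # study variable
x <- c(2, 8, 6, 4)             # size measure
p <- x / sum(x)                # initial selection probabilities
N <- length(y)

expected_desraj <- 0           # E[Des Raj estimator] over all ordered samples
expected_naive  <- 0           # E[N * sample mean], which ignores the p_i
for (i in seq_len(N)) {
  for (j in setdiff(seq_len(N), i)) {
    prob_ij <- p[i] * p[j] / (1 - p[i])      # P(draw i first, then j)
    t1 <- y[i] / p[i]
    t2 <- y[i] + y[j] * (1 - p[i]) / p[j]
    expected_desraj <- expected_desraj + prob_ij * (t1 + t2) / 2
    expected_naive  <- expected_naive  + prob_ij * N * (y[i] + y[j]) / 2
  }
}

c(true_total = sum(y), des_raj = expected_desraj, naive = expected_naive)
```

The Des Raj expectation matches the true total, while the unweighted estimator overshoots it because the larger units are selected more often.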
