Probability Proportional to Size Ordered Sampling – Des Raj Estimator
Karan Singhal
Introduction:
Sampling is a process used in statistical analysis in which a predetermined number of observations are taken from a larger population. The method used to sample from the population depends on the type of analysis being performed. One such method is probability proportional to size sampling without replacement (PPSWOR), which is where ordered estimators such as Des Raj's are used. Sampling without replacement is generally more efficient than sampling with replacement, and this also holds for PPS sampling. In PPSWOR, the probability of selecting a unit changes from draw to draw, depending on which units have already been drawn. For this reason, PPSWOR estimators are divided into ordered and unordered estimators.
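A minimal R sketch makes the changing probabilities concrete by drawing an ordered PPSWOR sample one unit at a time. The helper `draw_ppswor()` and the toy sizes are hypothetical and are not part of any package used later. At each draw, a unit not yet selected is chosen with probability p_j / (1 - sum of the p's already drawn), so its chance depends on which units came before.

```r
# Hypothetical helper: draw an ordered PPSWOR sample of size n, one unit at a time.
# p: first-draw selection probabilities (assumed to sum to 1).
draw_ppswor <- function(p, n) {
  chosen <- integer(0)
  for (r in seq_len(n)) {
    remaining <- setdiff(seq_along(p), chosen)
    # conditional probabilities at draw r: p_j / (1 - sum of p already drawn)
    cond_p <- p[remaining] / (1 - sum(p[chosen]))
    pick <- if (length(remaining) == 1) remaining else sample(remaining, 1, prob = cond_p)
    chosen <- c(chosen, pick)  # keep the order in which units were drawn
  }
  chosen
}

# toy size measures (hypothetical), giving p = 0.1, 0.2, 0.3, 0.4
p <- c(10, 20, 30, 40) / 100
# if unit 4 is drawn first, the second-draw probabilities of units 1-3 become
p[1:3] / (1 - p[4])   # 1/6, 1/3, 1/2
set.seed(1)
draw_ppswor(p, 2)
```

R's own `sample(..., replace = FALSE, prob = p)` applies the probabilities in the same sequential way, so this helper is only illustrative.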
To overcome the difficulty that expectations change with each draw, a new variate is associated with each draw, chosen so that its expectation equals the population value (here, the total) of the variable under study. Such estimators take into account the order of the draws and are called ordered estimators. The variate at a given draw depends on the values obtained at the previous draws, so the order of selection must be kept for the estimator to remain unbiased. Des Raj (1956) gave such an ordered estimator, that is, an estimator that takes into account the order in which the units are drawn. It makes use of conditional probabilities.
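Formula:-
Consider the case of two draws. Let y1 and y2 be the values of the units drawn at the first and second draw, and let p1 and p2 be their probabilities of selection at the first draw. Here pi, the probability of selecting unit i, is proportional to its size. Des Raj associates one variate with each draw,
$$t_1 = \frac{y_1}{p_1}, \qquad t_2 = y_1 + \frac{y_2}{p_2}\,(1 - p_1),$$
and estimates the population total Y by their mean,
$$\hat{Y}_{DR} = \frac{t_1 + t_2}{2} = \frac{1}{2}\left[(1 + p_1)\frac{y_1}{p_1} + (1 - p_1)\frac{y_2}{p_2}\right],$$
with the unbiased variance estimator
$$v(\hat{Y}_{DR}) = \frac{(1 - p_1)^2}{4}\left(\frac{y_1}{p_1} - \frac{y_2}{p_2}\right)^2.$$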
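The two-draw case can be written as a minimal R sketch. The inputs y and p are assumed to be given in draw order. The function name `desraj2()` is hypothetical and is not the `desraj()` function used later in the analysis. The comments state the formulas it implements: t1 = y1/p1, t2 = y1 + y2(1 - p1)/p2, estimate (t1 + t2)/2, and variance estimator (1 - p1)^2/4 * (y1/p1 - y2/p2)^2.

```r
# Hypothetical helper: Des Raj ordered estimator of the population total for n = 2.
# y, p: values and first-draw selection probabilities of the two units, in draw order.
desraj2 <- function(y, p) {
  t1 <- y[1] / p[1]                         # variate for the first draw
  t2 <- y[1] + y[2] * (1 - p[1]) / p[2]     # variate for the second draw
  est <- (t1 + t2) / 2                      # estimate of the population total
  estvar <- (1 - p[1])^2 / 4 * (y[1] / p[1] - y[2] / p[2])^2  # variance estimator
  list(est = est, estvar = estvar, tvals = c(t1, t2))
}

# toy numbers (hypothetical): y = (10, 20), p = (0.2, 0.5)
desraj2(c(10, 20), c(0.2, 0.5))
# t1 = 50, t2 = 10 + 20 * 0.8 / 0.5 = 42, est = 46, estvar = 16
```

The variance expression is simply (t1 - t2)^2 / 4, because t1 - t2 = (1 - p1)(y1/p1 - y2/p2).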
Applications: -
The ordered estimator can be applied whenever units are drawn one at a time, without replacement, with probabilities proportional to a size measure. It improves on SRSWOR when that size measure is roughly proportional to the variable under study. The simple random sampling scheme provides a random sample in which every unit in the population has an equal probability of selection. Under certain circumstances, more efficient estimators are obtained by assigning unequal probabilities of selection to the units in the population.
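The efficiency claim can be checked on the 22-village data used in the analysis below. The following R sketch enumerates every ordered pair of villages. For samples of size 2, it computes the exact variance of two estimators: the SRSWOR estimator N * ybar, and the Des Raj estimator with area as the size measure. The vectors are typed in from the printed data. Restricting to size 2 is our simplification to keep the enumeration small. Both estimators are unbiased, so their variances are directly comparable.

```r
# area under lime (acres) and bearing lime trees for the 22 villages (from the data below)
x <- c(32.77, 7.97, 0.62, 15.61, 42.85, 40.03, 9.39, 6.33, 5.05, 94.55, 53.71,
       0.67, 0.82, 2.15, 0.43, 123.36, 0.29, 3.00, 4.00, 2.00, 6.21, 45.85)
y <- c(2328, 754, 105, 949, 3091, 1736, 840, 311, 0, 3044, 2483,
       128, 102, 60, 0, 11799, 26, 317, 190, 180, 752, 3091)
N <- length(y); Y <- sum(y); p <- x / sum(x)

v_srs <- 0; v_dr <- 0
for (i in 1:N) {
  for (j in 1:N) {
    if (i == j) next
    pr_srs <- 1 / (N * (N - 1))           # P(i first, j second) under SRSWOR
    pr_pps <- p[i] * p[j] / (1 - p[i])    # P(i first, j second) under PPSWOR
    est_srs <- N * (y[i] + y[j]) / 2      # expansion estimator N * sample mean
    est_dr <- (y[i] / p[i] + y[i] + y[j] * (1 - p[i]) / p[j]) / 2  # Des Raj, n = 2
    v_srs <- v_srs + pr_srs * (est_srs - Y)^2
    v_dr <- v_dr + pr_pps * (est_dr - Y)^2
  }
}
c(srswor = v_srs, desraj = v_dr)
```

The SRSWOR result agrees with the textbook formula N^2 (1 - n/N) S^2 / n, which is a useful check on the enumeration.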
The application is demonstrated on a dataset giving the area under lime (in acres) and the number of bearing lime trees in each of the 22 villages growing lime in one of the tehsils of Bangalore district. The procedure has three steps:
1. Draw a sample of size 5 with the PPSWOR scheme, using area under lime as the size measure.
2. Estimate the total number of bearing lime trees with the ordered Des Raj estimator.
3. Give a bound on the error of estimation.
Analysis: -
library(samplingbook) # importing the package
## Loading required package: pps
## Loading required package: sampling
## Loading required package: survey
## Loading required package: grid
## Loading required package: Matrix
## Loading required package: survival
##
## Attaching package: 'survival'
## The following objects are masked from 'package:sampling':
##
##     cluster, strata
##
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
##
##     dotchart
library(readxl)
library(fpest)
## Warning: package 'fpest' was built under R version 4.0.3
data <- read_excel('C:/Users/karan/Desktop/ppswor.xlsx')
data # dataset providing information on area under lime and no. of bearing lime trees
## # A tibble: 22 x 3
## `S.No. of villages` `Area Under lime(in acres)` `No. of bearing lime trees`
## <dbl> <dbl> <dbl>
## 1 1 32.8 2328
## 2 2 7.97 754
## 3 3 0.62 105
## 4 4 15.6 949
## 5 5 42.8 3091
## 6 6 40.0 1736
## 7 7 9.39 840
## 8 8 6.33 311
## 9 9 5.05 0
## 10 10 94.6 3044
## # ... with 12 more rows
X = sum(data$`Area Under lime(in acres)`)
X # total area under lime
## [1] 497.66
pi = data$`Area Under lime(in acres)`/X # note: this masks R's built-in constant pi
pi # probability of selection of the i-th unit at the first draw (i = 1 to N)
##  [1] 0.0658481694 0.0160149500 0.0012458305 0.0313667966 0.0861029619
##  [6] 0.0804364426 0.0188683037 0.0127195274 0.0101474903 0.1899891492
## [11] 0.1079250894 0.0013463007 0.0016477113 0.0043202186 0.0008640437
## [16] 0.2478800788 0.0005827272 0.0060282120 0.0080376160 0.0040188080
## [21] 0.0124783989 0.0921311739
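Before drawing the sample, it is worth confirming that these selection probabilities behave as they should. They must sum to one, and the largest should belong to the village with the largest area. Below is a minimal R sketch with the areas typed in from the printed data. The names x and p are ours, chosen to avoid masking pi. In the original session, `all.equal(sum(pi), 1)` does the same check directly.

```r
# areas under lime (acres) for the 22 villages, typed in from the printed data
x <- c(32.77, 7.97, 0.62, 15.61, 42.85, 40.03, 9.39, 6.33, 5.05, 94.55, 53.71,
       0.67, 0.82, 2.15, 0.43, 123.36, 0.29, 3.00, 4.00, 2.00, 6.21, 45.85)
p <- x / sum(x)        # selection probability at the first draw, proportional to area
sum(x)                 # total area X
all.equal(sum(p), 1)   # the probabilities must add up to one
which.max(p)           # village 16 (123.36 acres) has the largest probability
```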
d1 <- cbind(data, pi)
d1 # combining the pi column with 'data'
## S.No. of villages Area Under lime(in acres) No. of bearing lime trees
## 1 1 32.77 2328
## 2 2 7.97 754
## 3 3 0.62 105
## 4 4 15.61 949
## 5 5 42.85 3091
## 6 6 40.03 1736
## 7 7 9.39 840
## 8 8 6.33 311
## 9 9 5.05 0
## 10 10 94.55 3044
## 11 11 53.71 2483
## 12 12 0.67 128
## 13 13 0.82 102
## 14 14 2.15 60
## 15 15 0.43 0
## 16 16 123.36 11799
## 17 17 0.29 26
## 18 18 3.00 317
## 19 19 4.00 190
## 20 20 2.00 180
## 21 21 6.21 752
## 22 22 45.85 3091
## pi
## 1 0.0658481694
## 2 0.0160149500
## 3 0.0012458305
## 4 0.0313667966
## 5 0.0861029619
## 6 0.0804364426
## 7 0.0188683037
## 8 0.0127195274
## 9 0.0101474903
## 10 0.1899891492
## 11 0.1079250894
## 12 0.0013463007
## 13 0.0016477113
## 14 0.0043202186
## 15 0.0008640437
## 16 0.2478800788
## 17 0.0005827272
## 18 0.0060282120
## 19 0.0080376160
## 20 0.0040188080
## 21 0.0124783989
## 22 0.0921311739
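set.seed(123)
pps_wor = d1[sample(1:nrow(d1), 5, replace=FALSE), ]
# note: without a prob argument, sample() gives every village the same chance,
# so this is an equal-probability (SRSWOR) draw; a PPS draw needs prob = d1$pi
pps_wor # the sample of size 5, rows in draw order
## S.No. of villages Area Under lime(in acres) No. of bearing lime trees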
## 15 15 0.43 0
## 19 19 4.00 190
## 14 14 2.15 60
## 3 3 0.62 105
## 10 10 94.55 3044
## pi
## 15 0.0008640437
## 19 0.0080376160
## 14 0.0043202186
## 3 0.0012458305
## 10 0.1899891492
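The call above gives every village the same chance of selection. A genuine PPSWOR draw passes the size-based probabilities to `sample()` through its `prob` argument. With `replace = FALSE`, R applies these probabilities sequentially among the villages not yet drawn, which is the PPSWOR scheme. In the minimal sketch below, the data frame is rebuilt from the printed data with shortened, hypothetical column names.

```r
# data typed in from the printed d1; column names are shortened (hypothetical)
x <- c(32.77, 7.97, 0.62, 15.61, 42.85, 40.03, 9.39, 6.33, 5.05, 94.55, 53.71,
       0.67, 0.82, 2.15, 0.43, 123.36, 0.29, 3.00, 4.00, 2.00, 6.21, 45.85)
y <- c(2328, 754, 105, 949, 3091, 1736, 840, 311, 0, 3044, 2483,
       128, 102, 60, 0, 11799, 26, 317, 190, 180, 752, 3091)
d1_demo <- data.frame(village = 1:22, area = x, trees = y, pi = x / sum(x))

set.seed(123)
# prob = pi makes each draw proportional to area; with replace = FALSE the
# probabilities are applied sequentially among the villages not yet drawn
idx <- sample(seq_len(nrow(d1_demo)), size = 5, replace = FALSE, prob = d1_demo$pi)
pps_sample <- d1_demo[idx, ]  # rows kept in draw order, as the ordered estimator needs
pps_sample
```

In the original session, the equivalent call is `d1[sample(1:nrow(d1), 5, replace = FALSE, prob = d1$pi), ]`. Its sample, and therefore the Des Raj estimate, would differ from the one reported in this post.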
desraj(pps_wor$`No. of bearing lime trees`, pps_wor$pi) # Des Raj ordered estimator
## $est
## [1] 27426.98
##
## $estvar
## [1] 210519342
##
## $tvals
## [1]     0.00 23618.43 13954.56 83416.77 16145.17
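The following R sketch rebuilds the t-values to show what `desraj()` computes. It uses the general form of Des Raj's estimator: t_1 = y_1 / p_1 and, for r = 2, ..., n, t_r = y_1 + ... + y_(r-1) + y_r (1 - p_1 - ... - p_(r-1)) / p_r. The estimate is the mean of the t_r. We infer this form because it reproduces the `$tvals` printed above; it is not taken from the fpest source. The function name `desraj_tvals()` is hypothetical. The inputs are the five sampled villages (15, 19, 14, 3, 10) in draw order, copied from the output above.

```r
# Hypothetical helper: Des Raj t-values for an ordered sample of any size n.
desraj_tvals <- function(y, p) {
  n <- length(y)
  t <- numeric(n)
  for (r in seq_len(n)) {
    prev <- seq_len(r - 1)   # units drawn before draw r (empty when r = 1)
    t[r] <- sum(y[prev]) + y[r] * (1 - sum(p[prev])) / p[r]
  }
  t
}

y <- c(0, 190, 60, 105, 3044)   # bearing lime trees, in draw order
p <- c(0.0008640437, 0.0080376160, 0.0043202186, 0.0012458305, 0.1899891492)
tv <- desraj_tvals(y, p)
tv          # should match $tvals above up to rounding
mean(tv)    # about 27426.98, the $est above
```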
var=210519342
se<-sqrt(var)
se # standard error
## [1] 14509.28
bound_error<-2*sqrt(var)
bound_error # bound on the error of estimation
## [1] 29018.57
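Rather than copying the variance by hand, it can be computed from the printed t-values. Des Raj's variance estimator is v = sum((t_r - tbar)^2) / (n(n - 1)). Below is a minimal R sketch using the `$tvals` output above. The variable names are ours.

```r
tv <- c(0.00, 23618.43, 13954.56, 83416.77, 16145.17)   # $tvals from desraj() above
n <- length(tv)
est <- mean(tv)                          # Des Raj estimate of the total
v <- sum((tv - est)^2) / (n * (n - 1))   # unbiased variance estimator
se <- sqrt(v)                            # standard error
bound <- 2 * se                          # bound on the error of estimation
c(est = est, var = v, se = se, bound = bound)
```

Small differences from the printed `$estvar` come only from the t-values being rounded to two decimals. In the session itself, you can store the result, for example `res <- desraj(...)`, and use `res$estvar` (`res` is a hypothetical name). This avoids retyping the variance.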
Conclusion/Interpretation: -
The analysis estimates the total number of bearing lime trees in the population as 27427. The estimated variance is 210519342, giving a standard error of 14509.28. The bound on the error of estimation is 29018.57, which is larger than the estimate itself, so this sample of 5 villages gives only a rough figure. On this basis, the 22 villages growing lime in one of the tehsils of Bangalore district have roughly 27427 bearing lime trees in total.
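As a rough interpretation aid, the estimate plus or minus the bound gives an approximate range for the total. Because all 22 villages are listed in the data, the realised estimate can also be compared with the actual column total. The minimal sketch below uses numbers copied from the output and data above. A negative lower limit is not meaningful for a count; it only reflects how wide the bound is for a sample of 5.

```r
est <- 27426.98      # Des Raj estimate of the total (output above)
bound <- 29018.57    # bound on the error of estimation (output above)
c(lower = est - bound, upper = est + bound)   # about -1591.59 and 56445.55

# No. of bearing lime trees for all 22 villages, typed in from the printed data
trees <- c(2328, 754, 105, 949, 3091, 1736, 840, 311, 0, 3044, 2483,
           128, 102, 60, 0, 11799, 26, 317, 190, 180, 752, 3091)
sum(trees)           # 32286, the actual population total
```

The actual total lies inside the range, though well above this particular estimate.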
Limitations:-
In PPS sampling, larger units are over-represented and smaller units under-represented in the sample, so an estimator that ignores the selection probabilities will be biased. This happens, for example, when the sample mean, which gives all units equal weight, is used as an estimator of the population mean. If instead the sample observations are suitably weighted at the estimation stage, taking the probabilities of selection into account, unbiased estimators can be obtained.
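This can be checked exactly on the lime data. The R sketch below enumerates every ordered pair of villages under PPSWOR with area as the size measure. It computes the expected value of the unweighted expansion estimator N * ybar and of the Des Raj estimator. The vectors are typed in from the printed data. Using samples of size 2 rather than 5 is our simplification to keep the enumeration small. The unweighted estimator's expectation overshoots the true total because large villages are drawn more often. Des Raj's expectation equals the true total.

```r
x <- c(32.77, 7.97, 0.62, 15.61, 42.85, 40.03, 9.39, 6.33, 5.05, 94.55, 53.71,
       0.67, 0.82, 2.15, 0.43, 123.36, 0.29, 3.00, 4.00, 2.00, 6.21, 45.85)
y <- c(2328, 754, 105, 949, 3091, 1736, 840, 311, 0, 3044, 2483,
       128, 102, 60, 0, 11799, 26, 317, 190, 180, 752, 3091)
N <- length(y); p <- x / sum(x)

e_unweighted <- 0; e_desraj <- 0
for (i in 1:N) {
  for (j in 1:N) {
    if (i == j) next
    pr <- p[i] * p[j] / (1 - p[i])   # P(i first, j second) under PPSWOR
    e_unweighted <- e_unweighted + pr * N * (y[i] + y[j]) / 2          # equal weights
    e_desraj <- e_desraj + pr * (y[i] / p[i] + y[i] + y[j] * (1 - p[i]) / p[j]) / 2
  }
}
c(true_total = sum(y), unweighted = e_unweighted, desraj = e_desraj)
```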