MULTI-STAGE SAMPLING
-CERIN THERESA DOMINIC 2048120
INTRODUCTION:-
In data analysis, we draw samples from a given population in order to summarise the characteristic features of that population. But is it practical to list and examine every unit, even within each selected group? No, it is not. Doing so is expensive, time-consuming and labour-intensive. Under these circumstances, multi-stage sampling is often the better choice. Here we divide the population into groups, or clusters. We then choose some of these clusters at random, using any preferred sampling technique, and draw samples from within each selected cluster. The randomly selected clusters are regarded as the first-stage (primary) sampling units, and the units selected within each of these clusters are called the second-stage (secondary) sampling units.
Multi-stage sampling can therefore be seen as a more complex form of cluster sampling. It can be used when we do not have a complete list of the population, because a list of secondary units is needed only for the selected clusters. At each stage of selection, the sampling units become smaller and smaller. One advantage of multi-stage sampling is that, for a given number of sampled elements, spreading them over a larger number of clusters can increase the efficiency of the estimator. The technique is also flexible, since different selection procedures can be used at different stages.
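The two-stage selection described above can be sketched in R as follows. This is only a minimal illustration on a made-up sampling frame: the cluster sizes, the seed, the 10% second-stage fraction and all object names (`stage1`, `stage2`) are assumptions, not part of any real survey.

```r
# Hypothetical sampling frame: 20 clusters (primary units) of varying size.
set.seed(1)                                # for a reproducible illustration
N  <- 20                                   # number of primary units
Mi <- sample(30:60, N, replace = TRUE)     # secondary units in each primary unit

# Stage 1: simple random sample of n primary units, without replacement.
n <- 4
stage1 <- sample(seq_len(N), n)

# Stage 2: within each selected primary unit, sample about 10% of its
# secondary units, again without replacement.
stage2 <- lapply(stage1, function(i) {
  m_i <- max(1, round(0.10 * Mi[i]))       # second-stage sample size
  sample(seq_len(Mi[i]), m_i)              # indices of the chosen secondary units
})
names(stage2) <- paste0("cluster_", stage1)
stage2
```

Only the clusters chosen at stage 1 need a list of their secondary units, which is why the method works without a complete population list.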
Some important notations of multi-stage sampling :-
- $N$ : number of primary units in the population
- $M_i$ : number of secondary units in the $i$th primary unit
- $y_{ij}$ : the response of the $j$th secondary unit within the $i$th primary unit
- $y_i = \sum_{j=1}^{M_i} y_{ij}$ : total of the $i$th primary unit
- $n$ : number of primary units selected in the first stage
- $m_i$ : number of secondary units selected in the second stage from the $i$th selected primary unit
- population total : $\tau = \sum_{i=1}^{N} \sum_{j=1}^{M_i} y_{ij}$
- population mean per secondary unit : $\mu = \tau / M$, where $M = \sum_{i=1}^{N} M_i$
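With this notation, the estimators used in the R example later in this post can be written out explicitly. The formulas below are the usual forms for simple random sampling without replacement at both stages, which is an assumption that matches the highway example; $s_u^2$ and $s_i^2$ are symbols introduced here to mirror the `su2` and `si2` objects in that code.

```latex
\hat{y}_i = \frac{M_i}{m_i} \sum_{j=1}^{m_i} y_{ij},
\qquad
\hat{\tau} = \frac{N}{n} \sum_{i=1}^{n} \hat{y}_i,
\qquad
\hat{\mu} = \frac{\hat{\tau}}{M}

\widehat{\operatorname{var}}(\hat{\tau})
  = N (N - n) \frac{s_u^2}{n}
  + \frac{N}{n} \sum_{i=1}^{n} M_i (M_i - m_i) \frac{s_i^2}{m_i},
\qquad
s_u^2 = \frac{1}{n-1} \sum_{i=1}^{n}
        \Bigl( \hat{y}_i - \frac{1}{n} \sum_{k=1}^{n} \hat{y}_k \Bigr)^2
```

Here the inner sum in $\hat{y}_i$ runs over the $m_i$ sampled secondary units only, and $s_i^2$ is the sample variance of the $y_{ij}$ observed within the $i$th selected primary unit. The first variance term reflects variation between primary units and the second reflects variation within them.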
Some real life applications:-
- The Census Bureau uses multistage sampling for the U.S. National Center for Health Statistics’ National Health Interview Survey (NHIS). The survey takes a multistage probability sample of 42,000 households in 376 primary sampling units, with households chosen in groups of around four adjacent households.
- The Gallup poll uses multistage sampling. For example, they might randomly choose a certain number of area codes, then randomly sample a number of phone numbers from within each area code.
- Johnston et al.’s survey on drug use in high schools used three-stage sampling: geographic areas, followed by high schools within those areas, followed by senior students in those schools.
- The Australian Bureau of Statistics divides cities into “collection districts”, then blocks, then households. Each stage uses random sampling, so specific households need to be listed only at the final stage of sampling.
Tips for efficient multistage sampling:
Here are some tips to keep in mind when conducting multistage sampling research.
- Plan carefully: it is good practice to think through how the multistage approach will be implemented before you start.
- Keep in mind that multistage sampling has no single exact definition, so there is no standard recipe for combining the sampling methods (such as cluster, stratified, and simple random sampling).
- Design the process to be both cost-effective and time-efficient.
- Retain randomness at every stage and maintain the sample size.
- Consult an experienced and skilled expert when you apply this method for the first time.
Analysing multi-stage sampling using R code:-
In order to
estimate the condition of highways under its jurisdiction and the cost of
urgent repairs, the state Department of Transportation selected a number of
“highway miles” in two stages. In the first stage, a number of highways were
selected at random and without replacement from the list of all highways
maintained by the Department. In the second stage, a number of one-mile
segments were selected at random and without replacement from the total length
of each selected highway; for example, if the length of highway 101 is 73
miles, it is seen as consisting of 73 one-mile segments (“highway miles”), from
which a number are selected at random. Highway engineers then visit the
selected segments, inspect the pavement condition, rate the condition of the
segment, and estimate the cost of urgently needed repairs. For the purpose of
this problem, assume there are 352 highways in the state, with a total length
of 28,950 miles. A simple random sample of five highways was selected without
replacement. From each selected highway, approximately 10% of its one-mile
segments were then selected. The inspection results were as follows:
Highway Number | Length (miles) | Selected One-Mile Segments | Number Rated Excellent | Cost of Urgent Repairs (in $1,000)
155 | 85 | 10 | 2 | 90
489 | 120 | 15 | 1 | 110
283 | 47 | 5 | 0 | 60
698 | 98 | 10 | 0 | 100
311 | 34 | 5 | 1 | 30
R code to compute the estimated total for the highway example
follows:
> N <- 352; n <- 5; M <- 28950
> Mi <- c(85,120,47,98,34) # total no. segments on the highways sampled
> mi <- c(10,15,5,10,5) # no. of segments sampled
> yi <- c(2,1,0,0,1) # no. of excellent segments
# Unbiased estimation of total number of segments rated Excellent
> yhati <- (Mi/mi)*yi
> yhati # estimated no. excellent segments on each highway
[1] 17.0 8.0 0.0 0.0 6.8
> tauhat <- (N/n)*sum(yhati) # estimated total no. excellent
> tauhat
[1] 2238.72
So an unbiased estimate of the total number of highway segments rated as Excellent is $\hat{\tau} = 2238.7$.
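The same estimator can be applied to the repair-cost column of the table. The sketch below assumes that each cost figure is the total cost over the inspected segments of that highway; the problem statement does not say this explicitly, so treat the result as illustrative. The names `cost`, `cost_hat_i` and `cost_total` are introduced here and do not appear in the original analysis.

```r
N <- 352; n <- 5
Mi   <- c(85, 120, 47, 98, 34)    # total segments on each sampled highway
mi   <- c(10, 15, 5, 10, 5)       # segments inspected on each highway
# ASSUMPTION: cost is the total over the inspected segments, in $1,000
cost <- c(90, 110, 60, 100, 30)

cost_hat_i <- (Mi / mi) * cost    # estimated repair cost for each sampled highway
cost_total <- (N / n) * sum(cost_hat_i)   # estimated statewide cost, in $1,000
cost_total
```

Under that assumption the point estimate is 238867.2, in units of $1,000. A standard error cannot be computed from the table alone, because the within-highway variance requires the cost of each individual segment.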
So, to finish answering part (a) of the highway problem (where we already estimated the total number of segments in Excellent condition to be $\hat{\tau} = 2238.7$), the estimated proportion of highway miles in Excellent condition, as well as standard errors for both this proportion and the total, are given via the R code below:
> su2 <- var(yhati)
> su2
[1] 49.248
> pi <- yi/mi # Proportion of segments rated excellent on each highway (note: this masks R's built-in constant pi)
> pi
[1] 0.20000000 0.06666667 0.00000000 0.00000000 0.20000000
> si2 <- (mi/(mi-1))*pi*(1-pi) # Estimated variance within each primary unit
> si2
[1] 0.17777778 0.06666667 0.00000000 0.00000000 0.20000000
> var1 <- (N*(N-n)*su2)/n # Term 1 of variance
> var2 <- (N/n)*sum((Mi*(Mi-mi)*si2)/mi) # Term 2 of variance
> c(var1,var2)
[1] 1203069.54 14697.64
> var.tauhat <- var1 + var2
> SE.tauhat <- sqrt(var.tauhat) # SE of estimate of total
> SE.tauhat
[1] 1103.525
> c(tauhat-qt(.975,n-1)*SE.tauhat,tauhat+qt(.975,n-1)*SE.tauhat) # 95% CI
[1] -825.1563 5302.5963
> phat <- tauhat/M
> phat # estimate of proportion Excellent
[1] 0.07733057
> SE.phat <- SE.tauhat/M # SE of estimate of proportion
> SE.phat
[1] 0.0381183
> c(phat-qt(.975,n-1)*SE.phat,phat+qt(.975,n-1)*SE.phat) # 95% CI
[1] -0.02850281 0.18316395
Note that both confidence intervals extend below 0, which is impossible for a count or a proportion. Since the estimated proportions within each highway are near 0 and only five highways were sampled, our sample sizes are too small to assume a normal sampling distribution for $\hat{p}$.
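The steps of the calculation above can be collected into one reusable function. This is a minimal sketch rather than a standard library routine: `two_stage_total` and its argument names are hypothetical, and it assumes simple random sampling without replacement at both stages, as in the highway example.

```r
# Hypothetical helper (not from any package): two-stage estimate of a
# population total, with its standard error and a t-based interval.
two_stage_total <- function(N, n, Mi, mi, yhat, s2, level = 0.95) {
  tauhat <- (N / n) * sum(yhat)                       # estimated total
  su2    <- var(yhat)                                 # variance between primary units
  var1   <- N * (N - n) * su2 / n                     # first-stage variance term
  var2   <- (N / n) * sum(Mi * (Mi - mi) * s2 / mi)   # second-stage variance term
  se     <- sqrt(var1 + var2)
  tcrit  <- qt(1 - (1 - level) / 2, df = n - 1)       # t quantile on n - 1 df
  list(total = tauhat, se = se,
       ci = c(tauhat - tcrit * se, tauhat + tcrit * se))
}

# Highway data from the example above
Mi <- c(85, 120, 47, 98, 34)
mi <- c(10, 15, 5, 10, 5)
yi <- c(2, 1, 0, 0, 1)
p  <- yi / mi                                         # within-highway proportions
res <- two_stage_total(N = 352, n = 5, Mi = Mi, mi = mi,
                       yhat = (Mi / mi) * yi,
                       s2   = (mi / (mi - 1)) * p * (1 - p))
res$total   # 2238.72
res$se      # about 1103.5
```

Dividing `res$total`, `res$se` and `res$ci` by M = 28950 gives the corresponding results for the proportion. The function keeps the t-based interval used above, so it will still produce a negative lower limit for these data.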
Multistage sampling refers to sampling plans where the
sampling is carried out in stages using smaller and smaller sampling units at
each stage. In a two-stage sampling design, a sample of primary units is
selected and then a sample of secondary units is selected within each primary
unit. This handout outlines the development of estimators under the general
setting of two-stage sampling, considers the allocation question under the
setting of equal sized primary and secondary units, and briefly examines
three-stage sampling.