MULTI-STAGE SAMPLING

                                                       -CERIN THERESA DOMINIC 2048120

INTRODUCTION:-

In data analysis, we draw samples from a population in order to summarise its characteristic features. But is it always practical to list and measure every unit in the population, or even every unit within a chosen group? Usually not: it is expensive, time-consuming, and laborious. In such circumstances multi-stage sampling is often the better choice. Here we divide the population into groups, or clusters. From these we choose some clusters at random, using any preferred sampling technique, and then draw a sample of units from within each selected cluster. The randomly selected clusters are the first-stage (primary) sampling units, and the units selected within each of them are the second-stage (secondary) sampling units. Multi-stage sampling can therefore be regarded as a more complex form of cluster sampling.

Multi-stage sampling can be used when a complete list of the population is not available, since only the selected clusters need to be listed. At each stage of selection the sampling units become smaller and smaller. One advantage is that, for a given number of sampled elements, distributing them over a larger number of clusters increases the efficiency of the estimator. The technique is also flexible, because different selection procedures can be used at different stages.
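The two-stage selection described above can be sketched in R. This is only an illustration on a made-up population: the cluster sizes, the seed, the 10% second-stage fraction, and the names `population`, `stage1`, and `stage2` are all assumptions, not part of any real survey.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# Hypothetical population: 20 clusters (primary units) of varying size,
# each holding a vector of responses for its secondary units.
N <- 20
sizes <- sample(30:60, N, replace = TRUE)
population <- lapply(sizes, function(M_i) rnorm(M_i, mean = 50, sd = 10))

# Stage 1: simple random sample of n clusters, without replacement.
n <- 4
stage1 <- sample(seq_len(N), n)

# Stage 2: within each selected cluster, sample about 10% of its units
# (at least 2, so that a within-cluster variance can be computed).
frac <- 0.10
stage2 <- lapply(stage1, function(i) {
  units <- population[[i]]
  m_i <- max(2, round(frac * length(units)))
  units[sample.int(length(units), m_i)]
})
names(stage2) <- paste0("cluster_", stage1)

sapply(stage2, length)  # number of secondary units drawn per selected cluster
```

Only the selected clusters need a full list of their units, which is why no complete population frame is required.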


Some important notations of multi-stage sampling:-

  • $N$ : number of primary units in the population
  • $M_i$ : number of secondary units in the $i$th primary unit
  • $y_i = \sum_{j=1}^{M_i} y_{ij}$ : total of the $i$th primary unit
  • $n$ : number of primary units selected in the first stage
  • $m_i$ : number of secondary units selected in the second stage from the $i$th selected primary unit
  • population total: $\tau = \sum_{i=1}^{N} \sum_{j=1}^{M_i} y_{ij}$
  • population mean per secondary unit: $\mu = \tau / M$, where $M = \sum_{i=1}^{N} M_i$
  • $y_{ij}$ : the response of the $j$th secondary unit within the $i$th primary unit
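To make the notation concrete, here is a minimal sketch that computes these quantities for a tiny, made-up population of three primary units. The list `y` and its values are hypothetical; `y[[i]][j]` plays the role of the response of the j-th secondary unit in the i-th primary unit.

```r
# Hypothetical population with N = 3 primary units.
y <- list(c(2, 4, 6), c(1, 3), c(5, 5, 5, 5))

N   <- length(y)        # number of primary units
Mi  <- lengths(y)       # number of secondary units in each primary unit
yi  <- sapply(y, sum)   # total of each primary unit
tau <- sum(yi)          # population total
M   <- sum(Mi)          # total number of secondary units
mu  <- tau / M          # population mean per secondary unit

c(N = N, M = M, tau = tau, mu = mu)
```

In a real survey these population quantities are unknown; the sample values from the selected primary and secondary units are used to estimate them.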

Some real life applications:-

·         The Census Bureau uses multistage sampling for the U.S. National Center for Health Statistics’ National Health Interview Survey (NHIS): a multistage probability sample of 42,000 households in 376 primary sampling units, with households chosen in groups of around four adjacent households.

·         The Gallup poll uses multistage sampling. For example, they might randomly choose a certain number of area codes then randomly sample a number of phone numbers from within each area code.

·         Johnston et al.’s survey on drug use in high schools used three-stage sampling: geographic areas, followed by high schools within those areas, followed by senior students in those schools.

 

·         The Australian Bureau of Statistics divides cities into “collection districts”, then blocks, then households. Each stage uses random sampling, creating a need to list specific households only after the final stage of sampling.

Tips for efficient multistage sampling:

Here are some tips to keep in mind when conducting multistage sampling research.

  1. Plan carefully: it is good practice to think through how the multistage approach will be implemented before starting.
  2. Keep in mind that, as there is no exact definition of multistage sampling, there is no conventional rule for how to combine the sampling methods (such as cluster, stratified, and simple random sampling) across stages.
  3. Design the process so that it is both cost-effective and time-effective.
  4. Preserve randomness at every stage and maintain an adequate sample size.
  5. Consult an experienced and skilled expert when you apply this method for the first time.

Analysing multi-stage sampling using R code:-

In order to estimate the condition of highways under its jurisdiction and the cost of urgent repairs, the state Department of Transportation selected a number of “highway miles” in two stages. In the first stage, a number of highways were selected at random and without replacement from the list of all highways maintained by the Department. In the second stage, a number of one-mile segments were selected at random and without replacement from the total length of each selected highway; for example, if the length of highway 101 is 73 miles, it is seen as consisting of 73 one-mile segments (“highway miles”), from which a number are selected at random. Highway engineers then visit the selected segments, inspect the pavement condition, rate the condition of the segment, and estimate the cost of urgently needed repairs. For the purpose of this problem, assume there are 352 highways in the state, with a total length of 28,950 miles. A simple random sample of five highways was selected without replacement. From each selected highway, approximately 10% of its one-mile segments were then selected. The inspection results were as follows:

| Highway Number | Length (miles) | Selected One-Mile Segments | Number Rated Excellent | Cost of Urgent Repairs (in $1,000) |
| 155            | 85             | 10                         | 2                      | 90                                 |
| 489            | 120            | 15                         | 1                      | 110                                |
| 283            | 47             | 5                          | 0                      | 60                                 |
| 698            | 98             | 10                         | 0                      | 100                                |
| 311            | 34             | 5                          | 1                      | 30                                 |

(a) Estimate the proportion and number of state highway miles that are in Excellent condition.

R code to compute the estimated total for the highway example follows:

> N <- 352; n <- 5; M <- 28950
> Mi <- c(85,120,47,98,34) # total no. of segments on the highways sampled
> mi <- c(10,15,5,10,5) # no. of segments sampled
> yi <- c(2,1,0,0,1) # no. of excellent segments among those sampled
> # Unbiased estimation of total number of segments rated Excellent
> yhati <- (Mi/mi)*yi
> yhati # estimated no. of excellent segments on each highway
[1] 17.0  8.0  0.0  0.0  6.8
> tauhat <- (N/n)*sum(yhati) # estimated total no. excellent
> tauhat
[1] 2238.72

So an unbiased estimate of the total number of highway segments rated as Excellent is $\hat{\tau} = 2238.7$.
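Written out as a formula, the computation performed by the code is the following. This is a restatement of the code rather than a new result; note that here $y_i$ denotes, as in the code, the number of Excellent segments among the $m_i$ segments sampled on highway $i$, not the full total of the primary unit.

$$\hat{y}_i = \frac{M_i}{m_i}\, y_i, \qquad \hat{\tau} = \frac{N}{n} \sum_{i=1}^{n} \hat{y}_i = \frac{352}{5}\,(17 + 8 + 0 + 0 + 6.8) = 70.4 \times 31.8 = 2238.72$$

Each sampled highway's count is first scaled up to the whole highway by $M_i/m_i$, and the sum over the five sampled highways is then scaled up to all 352 highways by $N/n$.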

To finish answering part (a) of the highway problem (where we already estimated the total number of segments in Excellent condition to be $\hat{\tau} = 2238.7$), the estimated proportion of highway miles in Excellent condition, together with standard errors for both this proportion and the total, are given by the R code below:

> su2 <- var(yhati) # sample variance of the estimated highway totals
> su2
[1] 49.248
> pi <- yi/mi # Proportion of segments rated excellent on each highway (note: this masks R's built-in constant pi)
> pi
[1] 0.20000000 0.06666667 0.00000000 0.00000000 0.20000000
> si2 <- (mi/(mi-1))*pi*(1-pi) # Estimated variance within each primary unit
> si2
[1] 0.17777778 0.06666667 0.00000000 0.00000000 0.20000000
> var1 <- (N*(N-n)*su2)/n # Term 1 of variance (between highways)
> var2 <- (N/n)*sum((Mi*(Mi-mi)*si2)/mi) # Term 2 of variance (within highways)
> c(var1,var2)
[1] 1203069.54   14697.64
> var.tauhat <- var1 + var2
> SE.tauhat <- sqrt(var.tauhat) # SE of estimate of total
> SE.tauhat
[1] 1103.525
> c(tauhat-qt(.975,n-1)*SE.tauhat,tauhat+qt(.975,n-1)*SE.tauhat) # 95% CI
[1] -825.1563 5302.5963
> phat <- tauhat/M
> phat # estimate of proportion Excellent
[1] 0.07733057
> SE.phat <- SE.tauhat/M # SE of estimate of proportion
> SE.phat
[1] 0.0381183
> c(phat-qt(.975,n-1)*SE.phat,phat+qt(.975,n-1)*SE.phat) # 95% CI
[1] -0.02850281 0.18316395

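Note that the confidence interval extends below 0, which is impossible for a proportion or a count. Because the estimated proportions within each highway are near 0 and only five highways were sampled, the sample is too small to assume an approximately normal sampling distribution for $\hat{p}$.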
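For reference, the two variance terms `var1` and `var2` in the transcript above correspond to the usual two-stage variance estimator for simple random sampling without replacement at both stages. The formula below simply restates what the code computes, where $\bar{\hat{y}}$ denotes the mean of the $\hat{y}_i$ (a symbol introduced here for illustration) and $\hat{p}_i$ is the sample proportion on highway $i$:

$$\widehat{\mathrm{Var}}(\hat{\tau}) = \frac{N(N-n)}{n}\, s_u^2 + \frac{N}{n} \sum_{i=1}^{n} \frac{M_i (M_i - m_i)}{m_i}\, s_i^2, \qquad s_u^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(\hat{y}_i - \bar{\hat{y}}\right)^2, \qquad s_i^2 = \frac{m_i}{m_i - 1}\, \hat{p}_i (1 - \hat{p}_i)$$

With the highway data this gives $1203069.54 + 14697.64 = 1217767.18$, so $\mathrm{SE}(\hat{\tau}) = \sqrt{1217767.18} \approx 1103.5$. Almost all of the variance comes from the first term, that is, from the variation between highways, which suggests that sampling more highways would help more than sampling more segments per highway.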

 

To summarise, multistage sampling refers to sampling plans where the sampling is carried out in stages, using smaller and smaller sampling units at each stage. In a two-stage sampling design, a sample of primary units is selected and then a sample of secondary units is selected within each selected primary unit.
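As a compact recap, the steps of the highway analysis can be wrapped in a single function. This is a sketch under stated assumptions: `two_stage_total` is a hypothetical helper written for this post (it is not from any R package), it assumes simple random sampling without replacement at both stages, and it assumes a 0/1 response so that the within-unit variance can be computed from the sample proportions.

```r
# Hypothetical helper (not from any package): two-stage estimate of a total
# for a 0/1 response, assuming SRSWOR at both stages.
two_stage_total <- function(N, Mi, mi, yi, level = 0.95) {
  n      <- length(Mi)                          # primary units sampled
  yhati  <- (Mi / mi) * yi                      # estimated total per sampled primary unit
  tauhat <- (N / n) * sum(yhati)                # estimated population total
  p_i    <- yi / mi                             # within-unit sample proportions
  si2    <- (mi / (mi - 1)) * p_i * (1 - p_i)   # within-unit sample variances
  var1   <- N * (N - n) * var(yhati) / n        # between-primary-unit term
  var2   <- (N / n) * sum(Mi * (Mi - mi) * si2 / mi)  # within-primary-unit term
  se     <- sqrt(var1 + var2)
  tcrit  <- qt(1 - (1 - level) / 2, df = n - 1)
  list(estimate = tauhat, se = se,
       ci = c(tauhat - tcrit * se, tauhat + tcrit * se))
}

# Highway data from the example above
res <- two_stage_total(N = 352,
                       Mi = c(85, 120, 47, 98, 34),
                       mi = c(10, 15, 5, 10, 5),
                       yi = c(2, 1, 0, 0, 1))
res$estimate   # 2238.72, matching the transcript
res$se         # about 1103.5
```

Dividing `res$estimate`, `res$se`, and `res$ci` by the total number of segments (28,950 in the example) gives the corresponding results for the proportion.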


