Random Group Method

-Ruchika Lalla

OVERVIEW OF RANDOM GROUP METHOD

Sample survey design is an integral part of statistical analysis. When designing a sample survey, it is equally important to choose a suitable method of variance estimation.

The Random Group Method is a form of replicate variance estimation. It consists of drawing a number of samples (sub-samples) from the population, estimating the parameter of interest from each sub-sample, and assessing the variance from the deviations of these sub-sample estimates about the corresponding estimate derived from the union of all the sub-samples. [1]
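The recipe just described can be sketched in R, the language used in the example later in this post. This is a minimal illustration that assumes the sample has already been split into k random groups; the function `rg_variance` and the data `y` and `group` are hypothetical. Following the description above, it measures the spread of the group estimates about the estimate from the union of all groups, scaled by 1/(k(k − 1)):

```r
# Random group variance estimation: a minimal sketch.
# y:         vector of sample observations (hypothetical)
# group:     random-group label for each observation (hypothetical)
# estimator: function turning a vector of observations into an estimate
rg_variance <- function(y, group, estimator = mean) {
  sub_samples <- split(y, group)                          # one sub-sample per random group
  k <- length(sub_samples)
  stopifnot(k >= 2)
  theta_r <- vapply(sub_samples, estimator, numeric(1))   # estimate from each group
  theta_union <- estimator(y)                             # estimate from the union of the groups
  v <- sum((theta_r - theta_union)^2) / (k * (k - 1))     # spread of group estimates about it
  list(group_estimates = theta_r, union_estimate = theta_union, variance = v)
}

# Hypothetical sample of 12 observations, randomly split into k = 3 groups of 4
y <- c(4, 7, 5, 6, 9, 3, 8, 5, 6, 7, 4, 6)
set.seed(1)
group <- sample(rep(1:3, each = 4))
res <- rg_variance(y, group)
res$union_estimate   # full-sample mean
sqrt(res$variance)   # estimated standard error
```

With equal-sized groups and the sample mean as the estimator, the mean of the group estimates equals the union estimate, so it makes no difference which one the deviations are measured from; the two can differ for nonlinear estimators such as ratios.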

Historically, this was one of the first techniques developed to simplify variance estimation for complex sample surveys. It was introduced in jute acreage surveys in Bengal by Mahalanobis, who called the various samples interpenetrating samples. [2]

ADVANTAGES AND DISADVANTAGES OF RANDOM GROUP METHOD

The advantages of this method are:

  • no separate theoretical derivation of a variance formula is needed for each problem, which can be difficult or messy;
  • it is easy to program in complex situations;
  • one unified recipe applies to many different problems;
  • it offers some degree of robustness against violations of models or assumptions.

Its main drawback is that it is computationally intensive; especially in large-scale surveys, the time it requires may be prohibitive. Moreover, its theoretical validity holds exactly only for linear statistics, and otherwise only asymptotically. [1]

PROCEDURE OF RANDOM GROUP METHOD

Random groups must be formed so that the original sampling design is reflected within each group [2]. For example, if the original sampling design is one-stage stratified sampling with H strata, then each group should contain all H strata. If cluster sampling (either one-stage or multi-stage) is used, then clusters, rather than individual units, should be treated as the units when forming random groups, and the random group variance estimator should be used when the number of clusters is large. [3]
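As a rough illustration of the stratified case, the R sketch below takes a hypothetical stratified sample (the data frame `smp`, with H = 3 strata) and, within each stratum, randomly deals the sampled units out to k groups, so that every random group contains all H strata. For a cluster design the same dealing would be applied to whole clusters instead of individual units. Groups formed by splitting a single sample in this way are not independent of one another, so the resulting variance estimate should be treated as an approximation.

```r
# Hypothetical stratified sample: H = 3 strata with 6 sampled units each
set.seed(42)
smp <- data.frame(
  stratum = rep(c("A", "B", "C"), each = 6),
  y = round(rnorm(18, mean = 50, sd = 10))
)
k <- 3  # number of random groups (here it divides each stratum's sample size)

# Within each stratum, randomly permute the group labels 1..k (recycled to the
# stratum's sample size), so every random group receives units from all H strata.
smp$group <- NA_integer_
for (h in unique(smp$stratum)) {
  idx <- which(smp$stratum == h)
  smp$group[idx] <- sample(rep_len(seq_len(k), length(idx)))
}

# Cross-tabulation: each stratum contributes 2 units to each of the 3 groups
table(smp$stratum, smp$group)
```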

The procedure for drawing the samples is as follows:

  • A sample, s1, is drawn from the finite population according to a well-defined sampling design (S, P); no restrictions are placed on the design.
  • After s1 has been selected, it is replaced into the population, and a second sample, s2, is drawn according to the same sampling design (S, P).
  • This process is repeated until k ≥ 2 samples, s1, ..., sk, have been obtained, each selected according to the common sampling design and each replaced before the next sample is selected. We shall call these k samples random groups. [2]
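A minimal R sketch of this procedure, assuming for illustration that the common design (S, P) is simple random sampling without replacement of size n from a hypothetical population of N = 1000 values; any other design could be substituted in the sampling step. The names `population`, `n`, `k` and `random_groups` are illustrative only.

```r
# Hypothetical finite population of N = 1000 values
set.seed(2024)
population <- rgamma(1000, shape = 2, scale = 10)

n <- 50   # size of each random group
k <- 10   # number of random groups (k >= 2)

# Each random group is a simple random sample without replacement of size n drawn
# from the full population. The population vector is never modified, so every
# sample is effectively replaced before the next one is selected.
random_groups <- replicate(k, sample(population, n), simplify = FALSE)

theta_r <- vapply(random_groups, mean, numeric(1))   # estimate from each group
theta_bar <- mean(theta_r)                            # overall estimate
v_theta_bar <- var(theta_r) / k                       # = sum((theta_r - theta_bar)^2) / (k(k - 1))
c(estimate = theta_bar, se = sqrt(v_theta_bar), true_mean = mean(population))
```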

APPLICATIONS OF RANDOM GROUP METHOD

This method is a popular replication method used in many economic surveys and by agencies such as the U.S. Census Bureau and the U.S. Bureau of Labor Statistics. [3]

The random group (RG) method of variance estimation was also used in the Medical Expenditure Panel Survey – Insurance Component (MEPS-IC) from the start of the survey in 1996 through 2013. During the sequential sample selection process in the MEPS-IC, each selected establishment is assigned a number corresponding to its place in the order of selection. These selection numbers are converted into ten groups, numbered 0 to 9, by assigning each establishment to the group given by the last digit of its selection number; for example, an establishment with selection number 73 is assigned to group 3. Each group can then be thought of as a subsample similar to the full sample, in which each unit's chance of selection is one-tenth of its chance of selection into the full sample. [4]
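The last-digit grouping is easy to reproduce. The R sketch below uses hypothetical selection numbers and full-sample weights; only the rule "group = last digit of the selection number" and the one-tenth selection chance come from the description above. The weight line is an interpretation of that one-tenth chance, assuming weights are inverse selection probabilities.

```r
# Hypothetical selection numbers: each establishment's place in the selection order
selection_number <- c(1, 2, 3, 10, 11, 73, 100, 259)

# Random group = last digit of the selection number, giving groups 0 to 9
group <- selection_number %% 10

# Hypothetical full-sample weights (inverse selection probabilities). Within its group,
# a unit's selection chance is one-tenth of its full-sample chance, so its group-level
# weight is ten times its full-sample weight.
full_weight <- c(50, 50, 120, 80, 80, 200, 35, 60)
group_weight <- 10 * full_weight

data.frame(selection_number, group, full_weight, group_weight)
```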

IMPORTANT FORMULAE USED IN THE RANDOM GROUP METHOD

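The overall random group estimator is the mean of the k group estimates, and its variance is estimated from their spread. If $\hat{\theta}_r$ denotes the estimate of θ computed from the r-th random group (r = 1, ..., k), then

$$\bar{\hat{\theta}} = \frac{1}{k}\sum_{r=1}^{k}\hat{\theta}_r, \qquad v(\bar{\hat{\theta}}) = \frac{1}{k(k-1)}\sum_{r=1}^{k}\left(\hat{\theta}_r - \bar{\hat{\theta}}\right)^2.$$

When the full-sample estimator $\hat{\theta}$, computed from the union of all the groups, is the estimator of interest, its variance can be estimated by the same expression with $\hat{\theta}$ in place of $\bar{\hat{\theta}}$ inside the sum.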


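Inference about θ is usually based on normal (z) theory or Student's t theory.

If the variance of the random group estimator $\bar{\hat{\theta}}$ (the mean of the k group estimates) is essentially known without error, or if k is very large, then a (1 − α)100% confidence interval for θ is

$$\left(\bar{\hat{\theta}} - z_{\alpha/2}\,v^{1/2}(\bar{\hat{\theta}}),\; \bar{\hat{\theta}} + z_{\alpha/2}\,v^{1/2}(\bar{\hat{\theta}})\right),$$

where $z_{\alpha/2}$ is the upper α/2 point of the standard normal distribution and $v(\bar{\hat{\theta}})$ is the random group variance estimator. When k is small, $z_{\alpha/2}$ is replaced by $t_{k-1,\,\alpha/2}$, the upper α/2 point of Student's t distribution with k − 1 degrees of freedom.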
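The random group point estimate (the mean of the k group estimates), its variance estimate sum((θ̂_r − θ̄)²)/(k(k − 1)) and the resulting confidence intervals can be computed with a short R function. This is a sketch, not code from the cited sources: `rg_confint` is a hypothetical helper that takes one estimate per random group and returns both the normal-theory interval and the Student's t interval with k − 1 degrees of freedom.

```r
# Random group point estimate, variance estimate and confidence intervals.
# theta_r: vector of estimates, one per random group (k >= 2).
rg_confint <- function(theta_r, level = 0.95) {
  k <- length(theta_r)
  stopifnot(k >= 2)
  theta_bar <- mean(theta_r)
  v <- sum((theta_r - theta_bar)^2) / (k * (k - 1))
  alpha <- 1 - level
  z_mult <- qnorm(1 - alpha / 2)              # normal-theory multiplier
  t_mult <- qt(1 - alpha / 2, df = k - 1)     # Student's t multiplier, k - 1 df
  list(
    estimate   = theta_bar,
    variance   = v,
    z_interval = theta_bar + c(-1, 1) * z_mult * sqrt(v),
    t_interval = theta_bar + c(-1, 1) * t_mult * sqrt(v)
  )
}

# Hypothetical estimates from k = 5 random groups
ci <- rg_confint(c(10.2, 9.8, 10.5, 10.1, 9.9))
ci
```

Because the t multiplier with k − 1 degrees of freedom exceeds the normal one, the t interval is the wider of the two; with only five groups the difference is noticeable.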


APPLICATION IN R

# Suppose we have 10 drawers with 10 cards per drawer. The cards belong to 8 different categories, and we want to draw random groups.
# Step 1: import the data

data <- read.csv("carddata.csv", header = TRUE)
head(data)

##   Drawer Card
## 1      1    6
## 2      1    7
## 3      1    3
## 4      1    3
## 5      1    2
## 6      1    0

# Step 2: stratify the data by drawer and draw 10% of the cards (one card) from each drawer
library(splitstackshape)  # provides stratified()
set.seed(100)
stratified(data, "Drawer", 0.1)

##     Drawer Card
##  1:      1    7
##  2:      2    6
##  3:      3    1
##  4:      4    1
##  5:      5    2
##  6:      6    1
##  7:      7    5
##  8:      8    0
##  9:      9    5
## 10:     10    7

# The sample drawn above is one random group. The draw can be repeated as many times as needed, and the estimates are computed for each group separately.

# Random group 2
set.seed(101)
stratified(data, "Drawer", 0.1)

##     Drawer Card
##  1:      1    0
##  2:      2    4
##  3:      3    6
##  4:      4    0
##  5:      5    6
##  6:      6    7
##  7:      7    7
##  8:      8    6
##  9:      9    2
## 10:     10    6

Thus, this example combines stratified sampling (with drawers as strata) and the random group method. After each group is drawn, the required estimates can be computed from it, and the group is returned to the population before the next one is drawn.
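The drawing and estimation steps can be combined into a loop. The R sketch below is hypothetical in several ways: since `carddata.csv` is not available here, it simulates a stand-in data frame `cards` with the same layout (10 drawers of 10 cards, categories 0 to 7); it draws one card per drawer with base R rather than `splitstackshape::stratified()`; and it uses the proportion of cards in category 7 as an example statistic. Each of the k groups yields its own estimate, and the random group formulas then give an overall estimate and standard error.

```r
# Hypothetical stand-in for carddata.csv: 10 drawers x 10 cards, categories 0-7
set.seed(7)
cards <- data.frame(
  Drawer = rep(1:10, each = 10),
  Card = sample(0:7, 100, replace = TRUE)
)

# One random group: a stratified sample of one card from every drawer
# (a base-R stand-in for stratified(cards, "Drawer", 0.1) with 10 cards per drawer).
draw_group <- function(df) {
  rows <- unlist(lapply(split(seq_len(nrow(df)), df$Drawer),
                        function(idx) idx[sample.int(length(idx), 1)]))
  df[rows, ]
}

k <- 5  # number of random groups
# `cards` is never modified, so each group is returned to the population
# before the next one is drawn.
groups <- replicate(k, draw_group(cards), simplify = FALSE)

# Per-group estimate: proportion of category-7 cards (drawers are equal-sized,
# so the plain sample proportion is the stratified estimate)
theta_r <- vapply(groups, function(g) mean(g$Card == 7), numeric(1))
theta_bar <- mean(theta_r)
se <- sqrt(sum((theta_r - theta_bar)^2) / (k * (k - 1)))
c(estimate = theta_bar, se = se)
```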

References

[1] "Variance estimation methods in the European Union," Monographs of Official Statistics, Office for Official Publications of the European Communities, Luxembourg, 2002.

[2] K. M. Wolter, "The Method of Random Groups," in Introduction to Variance Estimation, Statistics for Social and Behavioral Sciences. New York, NY: Springer, 2007, pp. 21-106.

[3] J. Shao and Q. Tang, "Random Group Variance Estimators for Survey Data with Random Hot Deck Imputation," Journal of Official Statistics, vol. 27, no. 3, pp. 507-526, 2011.

[4] S. R. Chowdhury and D. Kashihara, "A Comparison of Variance Estimates Using Random Group and Taylor Series Methods for a Large National Survey of Businesses," Agency for Healthcare Research and Quality, Rockville, 2017.

 
