Linear Systematic Sampling Technique
Done by:
Hari Prasad
2048102
Introduction
The world we live in today produces enormous volumes of data. This data has become a significant source of information, and extracting useful results from it could benefit many sectors and people around the world. The problem faced by professionals in data modelling and analytics is to extract the useful content from these vast databases and fit appropriate models to it. Over the years, statisticians and other professionals have formulated different sampling techniques that help the user extract a part of the population that best represents the data. In this blog, I cover the linear systematic sampling technique.
Systematic Sampling
Systematic sampling is a type of probability sampling in which the researcher chooses a random start from the target population and then selects further sampling units at a fixed sampling interval. The technique is similar to simple random sampling but easier to conduct, which makes it useful when the researcher has budget constraints. The entire sampling frame must be available to perform systematic sampling on a population. It is also crucial that the ordering of the units does not follow any pattern or classification, because a pattern that lines up with the sampling interval would distort the sample.
Basic terminologies and notations
· N --> total number of observations in the data, or population size
· n --> total number of observations in the sample, or sample size
· k --> sampling interval (k = N/n)
The observations in systematic sampling are arranged as in the following table:
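One common layout, assumed here, places the N = nk serial numbers in n rows and k columns, so that each column is one of the k possible samples. The sketch below prints that layout in Python for a hypothetical N = 12 and k = 4; the function name `systematic_layout` is illustrative only.

```python
def systematic_layout(N, k):
    """Return n rows of k serial numbers.

    Column r (counting from 1) holds the units r, r + k, r + 2k, ...,
    i.e. the systematic sample obtained with random start r.
    """
    if N % k != 0:
        raise ValueError("this sketch assumes N is a multiple of k")
    n = N // k
    return [[r + j * k for r in range(1, k + 1)] for j in range(n)]


layout = systematic_layout(N=12, k=4)  # hypothetical sizes
for row in layout:
    print(row)
```

Reading down any one column gives the serial numbers of one possible systematic sample.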
Steps involved in systematic sampling
1. Select a random number r between 1 and k.
2. Select the unit with serial number r as the first sample unit.
3. Select every kth unit after the rth unit, that is, the units with serial numbers r + k, r + 2k, and so on.
4. Continue until n units have been chosen; the last unit has serial number r + (n-1)k.
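The steps above can be sketched in a few lines of Python. This is a minimal illustration that assumes N is an exact multiple of n, so that k = N/n is a whole number; the function name `linear_systematic_sample` and the optional `rng` argument are hypothetical choices made for the example.

```python
import random


def linear_systematic_sample(N, n, rng=None):
    """Return the serial numbers (1-based) of a linear systematic sample."""
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch assumes N is a positive multiple of n")
    if rng is None:
        rng = random.Random()
    k = N // n                 # sampling interval
    r = rng.randint(1, k)      # random start between 1 and k, inclusive
    # Units r, r + k, r + 2k, ..., r + (n - 1)k
    return [r + j * k for j in range(n)]


sample = linear_systematic_sample(N=20, n=5, rng=random.Random(0))
print(sample)
```

Passing a seeded `random.Random` makes the draw reproducible; leaving `rng` unset gives a fresh random start on each call.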
Types of Systematic Sampling
· Linear Systematic Sampling
· Circular Systematic Sampling
Where to use Systematic Sampling?
The main situations suited to systematic sampling are:
1. When there is a budget restriction
2. When the population is large, or when responses are collected from individuals
3. When the units do not follow a particular pattern
4. When the risk of data manipulation or bias must be kept to a minimum
Linear Systematic Sampling
In linear systematic sampling there are k possible samples, each with an equal probability 1/k of being selected. The first unit of the sample is selected at random, while the other (n-1) units are determined systematically by the sampling interval k.
This blog focuses on linear systematic sampling and some real-life applications of it.
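Because each of the k samples is equally likely, the average of the k possible sample means equals the population mean when N = nk. The Python sketch below checks this on a small made-up population of 12 values with k = 4; the values and the function name `all_systematic_samples` are hypothetical.

```python
from statistics import mean


def all_systematic_samples(values, k):
    """Return the k possible linear systematic samples of `values`."""
    if len(values) % k != 0:
        raise ValueError("this sketch assumes N is a multiple of k")
    # Python slices are 0-based: start index r - 1 corresponds to random start r.
    return [values[start::k] for start in range(k)]


# Hypothetical population of N = 12 values (not real data)
population = [3, 7, 1, 9, 4, 6, 8, 2, 5, 10, 12, 11]

samples = all_systematic_samples(population, k=4)
sample_means = [mean(s) for s in samples]

# Each sample has probability 1/k, so the expected sample mean
# is the plain average of the k sample means.
expected_sample_mean = mean(sample_means)
population_mean = mean(population)

print(sample_means)
print(expected_sample_mean, population_mean)
```

The individual sample means differ from one another, which is the sampling variability; only their average matches the population mean.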
Application of Linear Systematic Sampling
To understand how systematic sampling works and how to estimate the population parameters, I have used a dataset from kaggle.com. The data was collected by a health insurance company and is ordered by client id number. It contains the following information about 1000 clients:
· Age: age of the client
· Sex: gender of the client
· BMI: body mass index
· Children: number of children
· Smoker: whether the client is a smoker or not (logical entries)
· Region: part of the country they are from
· Charges: maximum insurable amount
Since the data does not follow any pattern or classification, it is feasible to apply linear systematic sampling to obtain the required sample. To select the systematic sample, we will use the R programming language.
A researcher would like to select n = 100 sampling units from the population of N = 1000. The sampling interval is therefore k = N/n = 10, so we select a 1-in-10 systematic sample from the population.
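As a minimal sketch of the 1-in-10 selection, written in Python rather than R, the example below draws the sample and estimates the mean of the charges. The `charges` values are synthetic stand-ins generated for illustration only; they are not the kaggle data, and the range used to generate them is an arbitrary assumption.

```python
import random
from statistics import mean

rng = random.Random(42)   # fixed seed so the sketch is reproducible

N, n = 1000, 100
k = N // n                # sampling interval: 10

# Synthetic stand-in for the Charges column (NOT the kaggle data).
# Position i - 1 in the list corresponds to client id i.
charges = [round(rng.uniform(1000, 50000), 2) for _ in range(N)]

r = rng.randint(1, k)                  # random start between 1 and 10
sample_ids = list(range(r, N + 1, k))  # r, r + 10, ..., r + 990
sample_charges = [charges[i - 1] for i in sample_ids]

# The sample mean serves as the estimate of the population mean.
estimate = mean(sample_charges)
print(r, len(sample_ids), estimate)
```

With the real data, the same selection would be applied to the rows in id order and the estimate compared with the mean of all 1000 clients.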
Advantages of Systematic Sampling
· Convenient and straightforward
· Represents the population well and can be carried out quickly
· Free from favouritism and personal bias
· Involves minimal risk
· Cost-efficient
Limitations of Systematic Sampling
· In certain cases the sample units have unequal chances of being selected. Moreover, only k distinct samples are possible, so most combinations of units can never be selected together.
· If the population units follow some pattern after ordering, systematic sampling may not provide the best representatives of the data.