SAMPLING FOR PROPORTIONS

M.VINAIKA - 2048127

INTRODUCTION:

How do we sample when the data are qualitative?

Sometimes we wish to estimate the total number, the proportion, or the percentage of units in the population that possess some characteristic or attribute, or that fall into some defined class.
In many situations, the characteristic under study, on which the observations are collected, is qualitative in nature.
For example, the responses of customers in many marketing surveys are replies like 'yes' or 'no', 'agree' or 'disagree', etc. Sometimes the objective of the survey is to estimate the proportion or the percentage of brown-eyed persons, unemployed persons, graduates, or persons favoring a proposal.


Sampling procedure:

The sampling procedures used to draw a sample for a quantitative characteristic can be used unchanged for a qualitative characteristic: the procedure does not depend on the nature of the characteristic under study.

For example, simple random sampling without replacement (SRSWOR) and with replacement (SRSWR) draw the sample in the same way in both cases. Other sampling schemes, such as stratified sampling and two-stage sampling, also remain the same.
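
As a rough illustration of this point, the sketch below draws an SRSWOR and an SRSWR sample from a population of 'yes'/'no' responses, using exactly the calls one would use for numeric data. It is written in Python; the population, the sample size and the seed are made-up values, not taken from any data set in this post.

```python
import random

# Hypothetical population of N = 100 survey responses (made-up values).
population = ["yes"] * 30 + ["no"] * 70
N = len(population)
n = 10

rng = random.Random(42)  # fixed seed so the draw is repeatable

# SRSWOR: each unit can enter the sample at most once.
srswor_idx = rng.sample(range(N), n)

# SRSWR: units are drawn independently, so the same unit may repeat.
srswr_idx = [rng.randrange(N) for _ in range(n)]

srswor = [population[i] for i in srswor_idx]
srswr = [population[i] for i in srswr_idx]

print("SRSWOR sample:", srswor)
print("SRSWR sample: ", srswr)
```

Nothing in the drawing step looks at the values themselves, only at the unit labels, which is why the same procedure serves both kinds of characteristic.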

Estimation of population proportion:

Sometimes, the units in the population are classified into two groups:

  • Having a particular characteristic
  • Not having a particular characteristic.
This exhibits a characteristic of a binomial experiment: an observation either belongs or does not belong to the category of interest.
For example, a crop field may be irrigated or not irrigated. If it is irrigated, we say that it possesses the characteristic ‘irrigation’. If it is not irrigated, we say that it does not possess this characteristic.
Using the qualitative characteristic 'irrigated or not irrigated', the population of fields can be divided into two mutually exclusive classes, say C and C*.
C is the part of the population in which the fields are irrigated, and C* is the other part, in which the fields are not irrigated.
Let A be the number of units in C and (N - A) the number of units in C*, in a population of size N. Then the proportion of units in C is
P = A/N
and the proportion of units in C* is
Q = (N - A)/N = 1 - P
An indicator variable Y can be associated with the characteristic under study: for i = 1, 2, ..., N,
Yi = 1 if the ith unit belongs to C
Yi = 0 if the ith unit belongs to C*
The population total is then
Ytotal = ∑ Yi = A
and the population mean is
Y bar = A/N = P
Suppose a sample of size n is drawn from a population of size N by simple random sampling. Let a be the number of units in the sample that fall into class C, so that (n - a) units fall into class C*. Then the sample proportion of units in C is
p = a/n = y bar
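
The indicator-variable idea can be sketched in a few lines of Python. The eight fields below are hypothetical, and the sample size and seed are arbitrary; the point is only that the mean of the 0/1 indicator is the proportion.

```python
import random

# Hypothetical population of N = 8 fields: True means irrigated (class C).
irrigated = [True, False, True, True, False, False, True, False]

# Indicator variable: Y_i = 1 if the ith field is in C, else 0.
Y = [1 if field else 0 for field in irrigated]

N = len(Y)
A = sum(Y)            # population total = number of units in C
P = A / N             # proportion of units in C
Q = 1 - P             # proportion of units in C*
Y_bar = sum(Y) / N    # population mean of the indicator

# Simple random sample (without replacement) of n = 4 indicators.
rng = random.Random(0)
n = 4
sample = rng.sample(Y, n)
a = sum(sample)       # number of sampled units in C
p = a / n             # sample proportion = sample mean of the indicator

print("P =", P, " Y_bar =", Y_bar, " p =", p)
```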
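
Since each Yi is either 0 or 1, Yi² = Yi, so ∑ Yi² = A = NP. The population mean square is therefore
S² = [1/(N-1)] ∑ (Yi - Y bar)² = [1/(N-1)] (NP - NP²) = [N/(N-1)] PQ
Similarly, for the sample, with q = 1 - p,
s² = [n/(n-1)] pq
The quantities S², s², Y bar and y bar have thus been expressed as functions of the population and sample proportions.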
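
A quick numerical check of the S² identity, as a Python sketch on a made-up population of 10 units with 3 in class C. It compares the variance of the indicator computed directly (divisor N - 1) with N/(N - 1) times PQ.

```python
from math import isclose
from statistics import variance

# Hypothetical population: A = 3 units in C, N - A = 7 units in C*.
Y = [1] * 3 + [0] * 7

N = len(Y)
P = sum(Y) / N
Q = 1 - P

# statistics.variance uses the divisor N - 1, matching the definition of S^2.
S2_direct = variance(Y)
S2_formula = N / (N - 1) * P * Q

print(S2_direct, S2_formula)
```

The same check applies to s² if Y is replaced by a sample and N by n.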







Confidence interval estimation of P:

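Under SRSWOR, the variance of p is Var(p) = [(N-n)/(N-1)] (PQ/n), which is estimated from the sample by [(N-n)/(N(n-1))] pq. Under SRSWR, Var(p) = PQ/n, which is estimated by pq/(n-1).

If N and n are large, then (p - P)/√Var(p) approximately follows N(0, 1). With this approximation, we can write

P[-Zα/2 ≤ (p - P)/√Var(p) ≤ Zα/2] = 1 - α

and the 100(1 - α)% confidence interval for P is

[p - Zα/2 √Var(p), p + Zα/2 √Var(p)]

In practice, Var(p) is unknown and is replaced by its sample estimate.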
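
The interval above can be sketched as a small Python function. The function name `proportion_ci` and the survey numbers are hypothetical; the variance estimates are the usual ones, pq/(n - 1) for SRSWR and the same quantity multiplied by (N - n)/N for SRSWOR.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(a, n, N=None, conf=0.95):
    """Normal-approximation confidence interval for P.

    a    : number of sampled units in class C
    n    : sample size
    N    : population size; if given, the SRSWOR estimate with the
           finite population correction is used, otherwise SRSWR
    conf : confidence level, e.g. 0.95
    """
    p = a / n
    q = 1 - p
    var_hat = p * q / (n - 1)          # estimated Var(p) under SRSWR
    if N is not None:
        var_hat *= (N - n) / N         # finite population correction
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # Z_{alpha/2}
    half_width = z * sqrt(var_hat)
    return p - half_width, p + half_width


# Made-up survey: 120 of 400 sampled units are in C, population of 5000.
low, high = proportion_ci(a=120, n=400, N=5000)
print(f"95% CI for P: [{low:.4f}, {high:.4f}]")
```

The sketch does not clip the limits to [0, 1]; for small n or p near 0 or 1 the normal approximation itself is doubtful and an exact interval is usually preferred.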


Applications of sampling proportions:

1) It is applied in statistical quality control (SQC) to estimate the proportion of defectives in a production process.

2) A congressional leader investigating the merits of an 18-year-old voting age may want to estimate the proportion of the potential voters in the district between the ages of 18 and 21.

3) A marketing research group may be interested in the proportion of the total sales market in diet preparations that is attributable to a particular product. To estimate what percentage of sales is accounted for by that product, we use sampling for proportions.

4) A forest manager may be interested in the proportion of trees with a diameter of 12 inches or more.

5) Television ratings are often determined by estimating the proportion of the viewing public that watches a particular program.

6) To estimate, from a student data set, the proportion of students selecting a particular interview.

7) To estimate the percentage of men in a population who drink alcohol.

8) To estimate the percentage of men in the TV sector who use cosmetics.


EXAMPLE: 

To estimate the proportion of patients with High cholesterol, we use sampling for proportions.

Here we consider a data set and analyze it using sampling for proportions.

AIM:

To perform the analysis of sampling for proportions on the drugs data set and to interpret the results.

ANALYSIS:

R code:







The results of the analysis are obtained as p=0.5

INTERPRETATION:

  • From the above results, the estimated proportion of patients whose cholesterol is ‘High’ is p = 0.5, with standard error SE(p) = 0.1624.
  • p = 0.5 means that about 50 percent of the patients have High cholesterol and about 50 percent have Normal cholesterol.
  • The 95% confidence interval for the population proportion P is [0.1816, 0.8184].
  • The 95% confidence interval for the population total is [38, 162], which means that the number of patients in the population whose cholesterol is High is estimated to lie between 38 and 162.
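
The standard error and the interval for P can be checked by hand. The sample and population sizes are not stated in this post, so the Python sketch below assumes n = 10 sampled patients, a = 5 of them with High cholesterol, and N = 200 patients in the population; these assumed values are chosen only because they are consistent with the reported figures.

```python
from math import sqrt

# ASSUMED values, not given in the post: n, a and N below are guesses
# that are consistent with the reported p, SE(p) and interval.
n, a, N = 10, 5, 200

p = a / n
q = 1 - p

# Estimated standard error of p under SRSWOR (finite population correction).
se = sqrt((N - n) / N * p * q / (n - 1))

# Normal-approximation 95% interval, using z = 1.96.
z = 1.96
low, high = p - z * se, p + z * se

print(f"p = {p}, SE(p) = {se:.4f}, 95% CI = [{low:.4f}, {high:.4f}]")
```

Under these assumptions the sketch gives SE(p) = 0.1624 and the interval [0.1816, 0.8184], matching the values reported above.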

CONCLUSION:

  • Two confidence intervals are given in the output, but we consider only the 95% approximate confidence interval.
  • This interval is calculated by assuming that the sample proportion p is approximately normally distributed.
  • We would consider the second (hypergeometric) confidence interval only if this normality assumption is not made.
  • Hence, we conclude that the estimated proportion of patients with High cholesterol is p = 0.5 and the estimated proportion of patients with Normal cholesterol is q = 0.5.
  • That is, about half of the population is estimated to have High cholesterol and the other half Normal cholesterol.


