Statistics revolves around the study of two kinds of data sets: populations and samples.
A population is any complete group with at least one characteristic in common. Populations are not just people. Populations may consist of, but are not limited to, people, animals, businesses, buildings, motor vehicles, farms, objects or events. The population needs to be clearly identified at the beginning of a study. The study should be based on a clear understanding of who or what is of interest, as well as the type of information required from that population.
In everyday usage the term often refers to a group of people, but, as the following examples show, a population can be any well-defined set of units or measurements:
- All registered voters in Crawford County
- All members of the International Machinists Union
- All Americans who played golf at least once in the past year
- All widgets produced last Tuesday by the Acme Widget Company
- All daily maximum temperatures in July for major U.S. cities
- All basal ganglia cells from a particular rhesus monkey
A parameter is a characteristic of a population. A statistic is a characteristic of a sample. Inferential statistics enables you to make an educated guess about a population parameter based on a statistic computed from a sample randomly drawn from that population (see Figure 1).
For example, say you want to know the mean income of the subscribers to a particular magazine, which is a parameter of a population. You draw a random sample of 100 subscribers and determine that their mean income is $27,500 (a statistic). You conclude that the population mean income μ is likely to be close to $27,500 as well. This is an example of statistical inference.
Different symbols are used to denote statistics and parameters, as Table 1 shows.
Measure | Sample statistic | Population parameter
---|---|---
Mean | x̄ | μ
Standard deviation | s | σ
Variance | s² | σ²
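For reference, the symbols in Table 1 are usually computed as follows. This assumes a sample x_1, ..., x_n drawn from a population of N values X_1, ..., X_N; the population notation X_i is introduced here only for illustration. The sample variance divides by n - 1, which is the common convention for s².

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, &
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, &
s &= \sqrt{s^2} \\
\mu &= \frac{1}{N}\sum_{i=1}^{N} X_i, &
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2, &
\sigma &= \sqrt{\sigma^2}
\end{aligned}
```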
Probability Sampling Techniques
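Probability sampling is a sampling technique in which samples are gathered by a process that gives every individual in the population a known, nonzero chance of being selected. In the simplest designs, such as the simple random sample, every individual has the same chance.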
Simple Random Sample
The simple random sample is the basic sampling method assumed in statistical methods and computations. To collect a simple random sample, each unit of the target population is assigned a number. A set of random numbers is then generated and the units having those numbers are included in the sample.
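The numbering-and-random-numbers procedure above can be sketched in a SAS DATA step. This is a minimal illustration under stated assumptions: the 25-unit population `pop`, the seed 12345, and the sample size of 10 are hypothetical choices, not part of the original example. Each unit receives a uniform random number, the list is sorted by that number, and the first 10 units form the sample.

```sas
/* Hypothetical population: 25 units numbered 1 to 25 */
data pop;
  do id = 1 to 25;
    output;
  end;
run;

/* Attach a uniform random number to each unit (seed chosen arbitrarily) */
data numbered;
  if _n_ = 1 then call streaminit(12345);
  set pop;
  u = rand('uniform');
run;

/* Sorting by the random number puts the list in random order */
proc sort data = numbered;
  by u;
run;

/* Keep the first 10 units: a simple random sample without replacement */
data srs;
  set numbered(obs = 10);
  drop u;
run;
```

Because each unit appears in the sorted list exactly once, no unit can be selected twice.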
Random Sampling with Replacement
Sampling is called with replacement when a unit selected at random from the population is returned to the population before the next unit is selected at random. Each time a unit is selected, the population contains all of the original units, so a unit may be selected more than once. Because the size of the population never changes, a sample of any size can be drawn from a population of any size; the sample size n may even exceed the population size N.
The number of possible ordered samples of size n is N^n (N raised to the power n).
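As a small worked example, assume a hypothetical population of three units A, B and C and a sample of size 2. The possible ordered samples with replacement are AA, AB, AC, BA, BB, BC, CA, CB and CC, which matches the formula:

```latex
\text{ordered samples with replacement} = N^{n},
\qquad N = 3,\; n = 2:\quad 3^{2} = 9
```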
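proc surveyselect data = hsb25 method = URS rep = 1
sampsize = 10 seed = 12345 out = hsbs1 outhits;
id _all_; /* include all columns in the random sample */
run;
where,
URS - unrestricted random sampling, i.e. simple random sampling with replacement
sampsize - size of the random sample
outhits - writes one observation per selection, so a unit drawn twice appears twice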
Random Sampling without Replacement
Sampling is called without replacement when a unit is selected at random from the population and is not returned to the main lot. The first unit is selected from a population of size N, the second unit from the remaining N-1 units, and so on. The remaining population therefore shrinks by one unit with each draw, and the sample size n cannot exceed the population size N. A unit, once selected, cannot appear again in the same sample, so all the units of the sample are distinct from one another. The number of possible samples without replacement can be counted using either permutations (when the order of selection matters) or combinations (when it does not).
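The two counts mentioned above are given by the permutation and combination formulas. As a worked example, assume a hypothetical population of N = 5 units and a sample of size n = 2: there are 20 ordered samples, and because each unordered pair can be drawn in 2! = 2 orders, there are 10 distinct unordered samples.

```latex
\begin{aligned}
{}^{N}P_{n} &= \frac{N!}{(N-n)!}, &
\binom{N}{n} &= \frac{N!}{n!\,(N-n)!} \\
{}^{5}P_{2} &= \frac{5!}{3!} = 20, &
\binom{5}{2} &= \frac{5!}{2!\,3!} = 10
\end{aligned}
```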
proc surveyselect data = hsb25 method = SRS rep = 1
sampsize = 10 seed = 12345 out = hsbs1;
id col1 col2 col3; /* col1 col2 col3 ... are placeholders: list only the columns to keep in the sample */
run;
where,
SRS - simple random sampling, i.e. random sampling without replacement
Systematic Sample
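In a systematic sample, the elements of the population are put into a list, a starting element is chosen at random from among the first k elements, and then every kth element in the list is chosen (systematically) for inclusion in the sample. For a sample of size n from a population of size N, the interval is typically k = N/n.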
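A systematic sample can be drawn with the METHOD=SYS option of PROC SURVEYSELECT. This is a sketch under stated assumptions: the 25-unit list `pop`, the sample size of 5, and the seed are hypothetical. With N = 25 and n = 5 the interval is k = 5, so after a random start the selected ids are 5 apart.

```sas
/* Hypothetical population list: 25 units in id order */
data pop;
  do id = 1 to 25;
    output;
  end;
run;

/* METHOD=SYS: the interval is k = N/n = 25/5 = 5; after a random
   start, every 5th unit in list order is selected */
proc surveyselect data = pop method = SYS sampsize = 5
  seed = 12345 out = sys1;
run;
```

Systematic selection follows the order of the input data set, so the list should be sorted the way you intend before the procedure runs.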
Stratified Sample
A stratified sample is drawn by dividing the entire target population into non-overlapping subgroups, or strata, and then randomly selecting the final subjects from each stratum, typically in proportion to that stratum's share of the population.
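A proportional stratified sample can be sketched with the STRATA statement of PROC SURVEYSELECT and its ALLOC=PROP option. The population, the `region` strata, the total sample size of 10, and the seed are hypothetical. With 15 units in North and 10 in South, proportional allocation draws 6 and 4 units respectively, each by simple random sampling within its stratum.

```sas
/* Hypothetical population: 15 units in stratum North, 10 in South */
data pop;
  length region $ 5;
  do id = 1 to 25;
    if id <= 15 then region = 'North';
    else region = 'South';
    output;
  end;
run;

/* PROC SURVEYSELECT requires the input sorted by the strata variable */
proc sort data = pop;
  by region;
run;

/* SRS within each stratum; ALLOC=PROP splits SAMPSIZE=10 by stratum
   size: 10*15/25 = 6 from North and 10*10/25 = 4 from South */
proc surveyselect data = pop method = SRS sampsize = 10
  seed = 12345 out = strat1;
  strata region / alloc = prop;
run;
```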
References:
http://www.cliffsnotes.com/study_guide/Populations-Samples-Parameters-and-Statistics.topicArticleId-267532,articleId-267478.html