Friday 13 January 2017

One Way ANOVA (Welch Test)

Introduction
A one-way analysis of variance (ANOVA) is an extension of the independent-groups t-test to situations where there are more than two groups.
Generally speaking, ANOVA can be used under the same conditions as the two-sample t-test. When the independent variable has two levels, either the two-sample t-test or ANOVA can be used; when it has three or more levels, only ANOVA can be used.

The one-way analysis of variance (ANOVA) is used to determine whether there are any significant differences between the means of two or more independent (unrelated) groups.

For example, you could use a one-way ANOVA to understand whether exam performance differed based on test anxiety levels amongst students, dividing students into three independent groups (e.g., low, medium and high-stressed students). It is important to realize that the one-way ANOVA is an omnibus test statistic and cannot tell you which specific groups were significantly different from each other; it only tells you that at least two groups were different.

The question here is whether there is "any difference among the 3 stress-level groups"; this is known as the global test.

After the global effect is confirmed, further tests are needed to find where the differences lie, i.e., which of the 3 stress levels differ from each other. These tests are known as multiple comparisons and are demonstrated later on this page.

Assumptions for the Global Test
  • Your dependent variable should be measured at the interval or ratio level (i.e., they are continuous). Examples of variables that meet this criterion include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth.
  • Your independent variable should consist of two or more categorical, independent groups. Typically, a one-way ANOVA is used when you have three or more categorical, independent groups, but it can be used for just two groups (although an independent-samples t-test is more commonly used for two groups). Example independent variables that meet this criterion include ethnicity (e.g., 3 groups: Caucasian, African American and Hispanic), physical activity level (e.g., 4 groups: sedentary, low, moderate and high), profession (e.g., 5 groups: surgeon, doctor, nurse, dentist, therapist), and so forth.
  • Random and independent experimental design. You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves. For example, there must be different participants in each group, with no participant being in more than one group. This is more of a study design issue than something you can test for, but it is an important assumption of the one-way ANOVA. If your study fails this assumption, you will need to use another statistical test instead of the one-way ANOVA (e.g., a repeated-measures ANOVA).
  • There should be no significant outliers. Outliers are single data points that do not follow the usual pattern of your data (e.g., in a study of 100 students' IQ scores, where the mean score was 108 with only a small variation between students, one student had a score of 156, which is very unusual and may even put her in the top 1% of IQ scores globally). Outliers can have a negative effect on the one-way ANOVA, reducing the validity of your results. Possible outliers can be spotted by examining a box plot of each group.
  • Samples are normally distributed. Your dependent variable should be approximately normally distributed for each category of the independent variable. The one-way ANOVA only requires approximately normal data because it is quite "robust" to violations of normality, meaning that the assumption can be somewhat violated while the test still provides valid results. You can test for normality using the Shapiro-Wilk test of normality.
  • Samples have similar standard deviations (σ1 = σ2 = ... = σk), i.e., there is homogeneity of variances. You can test this assumption using Levene's test for homogeneity of variances. If your data fail this assumption, you will need not only to carry out a Welch ANOVA instead of the standard one-way ANOVA, but also to use a different post-hoc test.
  • Sample sizes between groups do not have to be equal, but large differences in sample sizes for the groups may affect the outcome of some multiple comparisons tests.
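To make the Welch ANOVA mentioned above more concrete: instead of pooling the group variances, it weights each group mean by w_i = n_i / s_i². The following is a minimal, standard-library-only Python sketch (not SAS's implementation) of Welch's F statistic and its approximate denominator degrees of freedom. The function name welch_anova is hypothetical, and the sketch is applied to the headache-relief data from the example later on this page. It stops short of a p-value, which would need an F-distribution routine (for example from SciPy).

```python
from statistics import mean, variance


def welch_anova(groups):
    """Welch's one-way ANOVA for unequal variances.

    groups: a list of lists, one list of observations per group.
    Returns (F, df1, df2); df2 is generally not an integer.
    """
    k = len(groups)
    n = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    w = [ni / variance(g) for ni, g in zip(n, groups)]         # weight = n_i / s_i^2 (sample variance)
    w_total = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, means)) / w_total  # variance-weighted grand mean
    between = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, means)) / (k - 1)
    lam = sum((1 - wi / w_total) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    f = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)
    return f, k - 1, df2


# Headache-relief data (minutes) from the example below
brand1 = [24.5, 23.5, 26.4, 27.1, 29.9]
brand2 = [28.4, 34.2, 29.5, 32.2, 30.1]
brand3 = [26.1, 28.3, 24.3, 26.2, 27.8]

f, df1, df2 = welch_anova([brand1, brand2, brand3])
print(f"Welch F = {f:.2f} on {df1} and {df2:.2f} df")
```

For these data the sketch gives a Welch F of roughly 6.3 on 2 and about 7.65 degrees of freedom; a fractional denominator df is expected, because Welch's method does not assume equal variances.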
Here is a question: how can we check normality?
Normality can be checked with PROC UNIVARIATE. Note that ANOVA is relatively robust even when the data are not normally distributed. The assumption of equal variances (homogeneity of variances) can be checked with the HOVTEST option, and Welch's ANOVA requested with the WELCH option, in a SAS statement such as "MEANS METHOD /HOVTEST WELCH;", where METHOD is the grouping (CLASS) variable.
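As a rough illustration of what a homogeneity-of-variance check does: Levene's test is simply a one-way ANOVA run on each observation's absolute deviation from its group mean. The Python sketch below (standard library only; the helper names one_way_f and levene are hypothetical) implements that classic absolute-deviation form. SAS's HOVTEST=LEVENE option can use squared deviations instead, so its F value may not match this one exactly.

```python
from statistics import mean


def one_way_f(groups):
    """Classic one-way ANOVA F statistic with a pooled error term: (F, df1, df2)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k)), k - 1, n_total - k


def levene(groups):
    """Levene's test: one-way ANOVA on |x - group mean| (absolute-deviation form)."""
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    return one_way_f(deviations)


# Headache-relief data (minutes) from the example later on this page
brand1 = [24.5, 23.5, 26.4, 27.1, 29.9]
brand2 = [28.4, 34.2, 29.5, 32.2, 30.1]
brand3 = [26.1, 28.3, 24.3, 26.2, 27.8]

f, df1, df2 = levene([brand1, brand2, brand3])
print(f"Levene F = {f:.2f} on {df1} and {df2} df")
```

On the headache-relief data this gives an F of about 0.53 on 2 and 12 df, a small value that gives no sign of unequal variances.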
 
Null Hypotheses
The hypotheses for the comparison of independent groups are:
 
H0: μ1 = μ2 = ... = μk   (the means of all groups are equal)
Ha: μi ≠ μj for at least one pair i, j   (the means of at least two groups are not equal)
where k is the number of groups.
 
The test statistic reported is an F test with k-1 and N-k degrees of freedom, where N is the total number of subjects. A low p-value for the F test is evidence against the null hypothesis; in other words, there is evidence that at least one pair of means is not equal.
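Written out, with y_ij the j-th of the n_i observations in group i, ȳ_i the group mean and ȳ the grand mean (notation introduced here for illustration), the F statistic is the ratio of the between-group mean square to the within-group mean square. In the SAS output below these correspond to the Model and Error mean squares.

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{\sum_{i=1}^{k} n_i\,(\bar{y}_i - \bar{y})^2 \,/\, (k-1)}
         {\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \,/\, (N-k)},
\qquad N = \sum_{i=1}^{k} n_i
```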
 
Consider the following example: suppose you are interested in comparing WEIGHT (gain) across the 4 levels of a GROUP variable, to determine whether weight gain differs significantly across groups.
 
The following SAS code can perform the test:
 
PROC ANOVA DATA=ANOVA;
CLASS GROUP;
MODEL WEIGHT=GROUP;
TITLE 'Compare WEIGHT across GROUPS';
RUN;
 
GROUP is the "CLASS" or grouping variable (containing four levels), and WEIGHT is the continuous variable, whose means across groups are to be compared. The MODEL statement can be thought of as
 
DEPENDENT VARIABLE = INDEPENDENT VARIABLE(S);
 
where the DEPENDENT variable is the "response" variable (the one you measured), and the INDEPENDENT variable(s) are the explanatory or grouping variables (here, GROUP). The MODEL statement indicates that, given the information on the right side of the equal sign, you can predict something about the value of the information on the left side. (Under the null hypothesis there is no relationship.)
 
Since the rejection of the null hypothesis does not specifically tell you which means are different, a multiple comparison test is often performed following a significant finding in the One‑Way ANOVA. To request multiple comparisons in PROC ANOVA, include a MEANS statement with a multiple comparison option. The syntax for this statement is
 
MEANS GROUP /testname;
 
where testname is a multiple comparison test. Some of the tests available in SAS include:
 
BON               - Performs Bonferroni t-tests of differences
DUNCAN            - Duncan’s multiple range test
SCHEFFE           - Scheffe multiple comparison procedure
SNK               - Student Newman Keuls multiple range test
LSD               - Fisher’s Least Significant Difference test
TUKEY             - Tukey’s studentized range test
DUNNETT ('x')     - Dunnett’s test: compare to a single control
 
You may also specify
 
ALPHA = p   - selects level of significance for comparisons    (default is 0.05)
 
For example, to select the TUKEY test, you would use the statement
 
MEANS GROUP /TUKEY;
 
Graphical comparison: A graphical comparison allows you to see the distribution of the groups. If the p-value is low, the groups are likely to overlap little; if the p-value is not low, there will usually be a fair amount of overlap among all of the groups. A simple graph for this analysis can be created using the PROC PLOT or PROC GPLOT procedure.
 
For example: 
PROC GPLOT; PLOT GROUP*WEIGHT;
 
will produce a plot showing WEIGHT by group.
 
Thus, the code for the complete analysis becomes:
 
PROC ANOVA;
CLASS GROUP;
MODEL WEIGHT=GROUP;
MEANS GROUP /TUKEY;
TITLE 'Compare WEIGHT across GROUPS';
PROC GPLOT; PLOT GROUP*WEIGHT;
      RUN;
 
Following is a SAS job that performs a one-way ANOVA and produces a plot.

  

One-Way ANOVA Example

 
Suppose you are comparing the time to relief of three headache medicines: brands 1, 2, and 3. The time to relief is reported in minutes. For this experiment, 15 subjects were randomly assigned to one of the three medicines (five per brand). Which medicine (if any) is the most effective? The data for this example are as follows:
 
Brand 1     Brand 2    Brand 3
24.5        28.4        26.1
23.5        34.2        28.3
26.4        29.5        24.3
27.1        32.2        26.2
29.9        30.1        27.8
 
Notice that SAS expects the data to be entered as two variables, a group and an observation.
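The reshaping described above, from one column per brand to one (group, observation) pair per row, can be sketched in Python as follows. The names wide and to_long are illustrative only; printing the pairs reproduces the BRAND RELIEF lines used in the CARDS section of the SAS program.

```python
# Wide layout, one list per brand, as in the table above (names are illustrative)
wide = {
    1: [24.5, 23.5, 26.4, 27.1, 29.9],
    2: [28.4, 34.2, 29.5, 32.2, 30.1],
    3: [26.1, 28.3, 24.3, 26.2, 27.8],
}


def to_long(table):
    """Return one (group, observation) pair per data value, in group order."""
    return [(group, value) for group, values in table.items() for value in values]


long_rows = to_long(wide)
for brand, relief in long_rows:
    print(brand, relief)   # prints "1 24.5", "1 23.5", ... like the CARDS lines
```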
 
Here is the SAS code to analyze these data. (AANOVA EXAMPLE2.SAS)
 
DATA ACHE;
INPUT BRAND RELIEF;
CARDS;
1 24.5
1 23.5
1 26.4
1 27.1
1 29.9
2 28.4
2 34.2
2 29.5
2 32.2
2 30.1
3 26.1
3 28.3
3 24.3
3 26.2
3 27.8
;
ODS RTF;ODS LISTING CLOSE;
PROC ANOVA DATA=ACHE;
    CLASS BRAND;
    MODEL RELIEF=BRAND;
    MEANS BRAND/TUKEY CLDIFF;
TITLE 'COMPARE RELIEF ACROSS MEDICINES  - ANOVA EXAMPLE';
PROC GPLOT;
       PLOT RELIEF*BRAND;
PROC BOXPLOT;
    PLOT RELIEF*BRAND;
       TITLE 'ANOVA RESULTS';
RUN;
QUIT;
ODS RTF close;
ODS LISTING;
 
Following is the (partial) output for the headache relief study: 
 
ANOVA Procedure
Dependent Variable: Relief

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2       66.7720000    33.3860000      7.14   0.0091
Error              12       56.1280000     4.6773333
Corrected Total    14      122.9000000

R-Square   Coeff Var   Root MSE   RELIEF Mean
0.543303    7.751664   2.162714      27.90000

Source   DF      Anova SS   Mean Square   F Value   Pr > F
BRAND     2   66.77200000   33.38600000      7.14   0.0091
 
The initial table in this listing is the Analysis of Variance table. The most important line to observe in this table is the “Model” line. At the right of this line is the p-value for the overall ANOVA test, listed as “Pr > F”: p = 0.0091. This tests the overall model to determine whether there is a difference in mean relief time between brands. Since the p-value is small, you can conclude that there is a statistically significant difference among the brands.
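As a cross-check on the ANOVA table, the sums of squares, F value, R-Square, Root MSE and Coeff Var can be recomputed by hand. The following standard-library Python sketch does that arithmetic for the headache-relief data (the dictionary name relief is a local choice, not a SAS name). It does not compute Pr > F, which requires the F distribution.

```python
from math import sqrt
from statistics import mean

relief = {
    1: [24.5, 23.5, 26.4, 27.1, 29.9],
    2: [28.4, 34.2, 29.5, 32.2, 30.1],
    3: [26.1, 28.3, 24.3, 26.2, 27.8],
}

all_obs = [x for g in relief.values() for x in g]
N, k = len(all_obs), len(relief)
grand_mean = mean(all_obs)

# Partition the total variation into between-brand (Model) and within-brand (Error) parts
ss_model = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in relief.values())
ss_error = sum((x - mean(g)) ** 2 for g in relief.values() for x in g)
ss_total = sum((x - grand_mean) ** 2 for x in all_obs)

ms_model = ss_model / (k - 1)
ms_error = ss_error / (N - k)
f_value = ms_model / ms_error
r_square = ss_model / ss_total
root_mse = sqrt(ms_error)
coeff_var = 100 * root_mse / grand_mean   # Root MSE as a percentage of the mean

print(f"Model  DF={k - 1}  SS={ss_model:.4f}  MS={ms_model:.4f}  F={f_value:.2f}")
print(f"Error  DF={N - k}  SS={ss_error:.4f}  MS={ms_error:.4f}")
print(f"R-Square={r_square:.6f}  Coeff Var={coeff_var:.6f}  Root MSE={root_mse:.6f}")
```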
 
Now that you know that there are differences among brands, you need to determine where the differences lie. In this case, that comparison is performed by the Tukey studentized range test (at the alpha = 0.05 level). See the tables below.
 
The Tukey Grouping table displays those differences. Notice the grouping labels “A” and “B” in this table.  There is only one mean associated with the “A” group, and that is brand 2. This indicates that the mean for brand 2 is significantly larger than the means of all other groups. There are two means associated with the “B” group – brands 1 and 3.  Since these two means are grouped, it tells you that they were not found to be significantly different.
 
Tukey's Studentized Range (HSD) Test for RELIEF

Alpha                                      0.05
Error Degrees of Freedom                     12
Error Mean Square                      4.677333
Critical Value of Studentized Range     3.77278
Minimum Significant Difference            3.649

Means with the same letter are not significantly different.

Tukey Grouping      Mean   N   BRAND
A                 30.880   5   2
B                 26.540   5   3
B                 26.280   5   1
Thus, the Tukey comparison concludes that the mean for brand 2 is significantly higher than the means of brands 1 and 3, and that there is no significant difference between brands 1 and 3. Another way to express the differences is to use the CLDIFF option with TUKEY (same results, different presentation). For example:
 
MEANS BRAND/TUKEY CLDIFF;
 
Using this option produces this version of the comparison table:

Comparisons significant at the 0.05 level are indicated by ***.

BRAND        Difference Between   Simultaneous 95%
Comparison   Means                Confidence Limits
2 - 3          4.340               0.691    7.989   ***
2 - 1          4.600               0.951    8.249   ***
3 - 2         -4.340              -7.989   -0.691   ***
3 - 1          0.260              -3.389    3.909
1 - 2         -4.600              -8.249   -0.951   ***
1 - 3         -0.260              -3.909    3.389
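The CLDIFF limits follow directly from the Tukey output: with equal group sizes, the half-width of every interval is the Minimum Significant Difference, q × sqrt(MSE / n), where q is the critical value of the studentized range. The Python sketch below reproduces the table from the printed means, the Error Mean Square and SAS's critical value 3.77278, which is copied in as a constant because the standard library has no studentized-range distribution. It assumes equal group sizes, as in this example.

```python
from itertools import permutations
from math import sqrt

# Values read from the SAS output above
means = {1: 26.28, 2: 30.88, 3: 26.54}   # brand means (minutes)
n = 5                                     # observations per brand
mse = 4.677333                            # Error Mean Square (12 df)
q_crit = 3.77278                          # Critical Value of Studentized Range, alpha = 0.05

# With equal group sizes, every Tukey interval has the same half-width (the MSD).
msd = q_crit * sqrt(mse / n)

results = {}
for a, b in permutations(means, 2):
    diff = means[a] - means[b]
    lower, upper = diff - msd, diff + msd
    results[(a, b)] = (diff, lower, upper)
    flag = "***" if lower > 0 or upper < 0 else ""   # interval excludes zero
    print(f"{a} - {b}  {diff:7.3f}  {lower:7.3f}  {upper:7.3f}  {flag}")

print(f"Minimum Significant Difference = {msd:.3f}")
```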
 
Visual Comparisons: Two graphs of RELIEF by BRAND show the distribution of relief times across brands and visually confirm the ANOVA results. The first is a “dot” plot produced by PROC GPLOT that shows each data point by group. The second is a box-and-whiskers plot created with PROC BOXPLOT. Note that brand 2 relief times tend to be longer (higher values) than those for brands 1 and 3.
[Figure: dot plot of RELIEF by BRAND (PROC GPLOT)]
[Figure: box plot of RELIEF by BRAND (PROC BOXPLOT)]
 
Hands-on exercise:
Modify the PROC ANOVA program to perform the Scheffe, LSD and Dunnett tests by adding the following statements, and compare the results.
 
      MEANS BRAND/SCHEFFE;
      MEANS BRAND/LSD;
      MEANS BRAND/DUNNETT ('1');
 
 

One-Way ANOVA using GLM

 
PROC GLM produces essentially the same results as PROC ANOVA, with the addition of a few more options. For example, you can include an OUTPUT statement to save residuals that can then be examined. (PROCGLM1.SAS)
 
ODS RTF; ODS GRAPHICS ON;
PROC GLM DATA=ACHE;
    CLASS BRAND;
    MODEL RELIEF=BRAND;
    MEANS BRAND/TUKEY CLDIFF;
    OUTPUT OUT=FITDATA P=YHAT R=RESID;
* Now plot the residuals;
 PROC GPLOT;
   plot resid*BRAND;
   plot resid*yhat;
run;
ODS RTF CLOSE;
ODS GRAPHICS OFF;
 
Notice also the statements ODS GRAPHICS ON and ODS GRAPHICS OFF. With ODS Graphics turned on, PROC GLM produces better-looking plots than we were able to get using PROC GPLOT in conjunction with PROC ANOVA, including the more detailed box-and-whiskers plot shown here:
 
[Figure: ODS Graphics box plot of RELIEF by BRAND (PROC GLM)]
 
 
However, there are still a couple of other plots that might be of interest. These are requested using the code:
 
PROC GPLOT;
   plot resid*BRAND;
   plot resid*yhat;
run;
 
 
The resulting plots (below) are an analysis of the residuals. The first plots the residuals by brand. Typically, you want the residuals to be randomly scattered within each group (which looks okay in this plot).
 
[Figure: residuals (RESID) by BRAND]
 
 
 
The second plots the residuals against YHAT (the predicted RELIEF). There are three predicted values, one for each brand, and for each predicted value the residuals appear randomly scattered.
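The residuals in FITDATA are easy to reproduce: in a one-way model each predicted value (YHAT) is simply the mean of the observation's group, and each residual (RESID) is the observation minus that mean. The Python sketch below builds an analogous list of rows for the headache data (the row layout only mimics the SAS data set; it is not read from it) and checks that the residual sum of squares equals the Error SS of 56.128.

```python
from statistics import mean

relief = {
    1: [24.5, 23.5, 26.4, 27.1, 29.9],
    2: [28.4, 34.2, 29.5, 32.2, 30.1],
    3: [26.1, 28.3, 24.3, 26.2, 27.8],
}

# Python stand-in for OUTPUT OUT=FITDATA P=YHAT R=RESID (a list of dicts, not a SAS data set)
fitdata = []
for brand, values in relief.items():
    yhat = mean(values)                    # one-way model: prediction = group mean
    for y in values:
        fitdata.append({"BRAND": brand, "RELIEF": y, "YHAT": yhat, "RESID": y - yhat})

for row in fitdata:
    print(row["BRAND"], row["RELIEF"], round(row["YHAT"], 2), round(row["RESID"], 2))

# The residual sum of squares should equal the Error SS in the ANOVA table (56.128).
resid_ss = sum(row["RESID"] ** 2 for row in fitdata)
print(f"Residual SS = {resid_ss:.3f}")
```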
 
[Figure: residuals (RESID) versus predicted values (YHAT)]


References

http://www.stattutorials.com/SAS/TUTORIAL-PROC-GLM.htm

http://www.stat.purdue.edu/~tqin/system101/method/method_one_way_ANOVA_sas.htm#analyze