Univariate.OneWayAnovaTest Method (IReadOnlyCollection<Double>[])

Performs a one-way analysis of variance (ANOVA).

Namespace:  Meta.Numerics.Statistics
Assembly:  Meta.Numerics (in Meta.Numerics.dll) Version: 4.1.4
Syntax
public static OneWayAnovaResult OneWayAnovaTest(
	params IReadOnlyCollection<double>[] samples
)

Parameters

samples
Type: System.Collections.Generic.IReadOnlyCollection<Double>[]
The samples to compare.

Return Value

Type: OneWayAnovaResult
ANOVA data, including an F-test comparing the between-group variance to the within-group variance.
Exceptions

Exception              Condition
ArgumentNullException  samples is null, or one of the samples in it is null.
ArgumentException      samples contains fewer than two samples.
Remarks

The one-way ANOVA is an extension of the Student t-test (StudentTTest(IReadOnlyCollection<Double>, IReadOnlyCollection<Double>)) to more than two groups. The test's null hypothesis is that all the groups' data are drawn from the same distribution. If the null hypothesis is rejected, it indicates that at least one of the groups differs significantly from the others.

Given more than two groups, you should use an ANOVA to test for differences in the means of the groups rather than perform multiple t-tests. The reason is that each t-test incurs a small risk of a false positive, so multiple t-tests increase the total risk of a false positive. For example, given a 95% confidence requirement, there is only a 5% chance that an individual t-test will incorrectly diagnose a significant difference. But given 5 samples, there are 5 * 4 / 2 = 10 t-tests to be performed, giving about a 40% chance that at least one of them will incorrectly diagnose a significant difference! The ANOVA avoids the accumulation of risk by performing a single test at the required confidence level to test for any significant differences between the groups.
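
The 40% figure can be checked with a short calculation. This is a sketch that assumes the 10 pairwise tests are independent, which is only approximately true because they share samples:

```latex
\binom{5}{2} = \frac{5 \cdot 4}{2} = 10
\qquad
P(\text{at least one false positive}) = 1 - (1 - 0.05)^{10} = 1 - 0.95^{10} \approx 0.40
```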

A one-way ANOVA performed on just two samples is equivalent to a t-test (StudentTTest(Sample, Sample)).
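
The equivalence can be stated as follows, assuming the pooled-variance (equal-variance) form of the Student t-test and a two-sided alternative. Here N denotes the total number of observations across both samples, a symbol introduced only for this illustration:

```latex
F_{1,\,N-2} = t_{N-2}^{2}
\qquad\Longrightarrow\qquad
P\left(F \ge t_{\mathrm{obs}}^{2}\right) = P\left(|t| \ge |t_{\mathrm{obs}}|\right)
```

The F statistic is the square of the t statistic, so both tests report the same probability.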

ANOVA is an acronym for "Analysis of Variance". Do not be confused by the name and by the use of a ratio-of-variances test statistic: an ANOVA is primarily (although not exclusively) sensitive to changes in the mean between samples. The variances being compared by the test are not the variances of the individual samples; instead the test compares the variance of all samples considered together as one single, large sample to the variances of the samples considered individually. If the means of some groups differ significantly, then the variance of the unified sample will be much larger than the variances of the individual samples, and the test will signal a significant difference. Thus the test uses variance as a tool to detect shifts in mean, not because we are interested in the individual sample variances per se.
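
The comparison described above is usually written as a decomposition of sums of squares. The following is the standard textbook formulation, offered as an illustration rather than as a description of this library's internal computation. The symbols are introduced here: k groups, n_i observations in group i, N observations in total, group means \bar{x}_i, and grand mean \bar{x}:

```latex
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2}_{SS_{\text{total}}}
= \underbrace{\sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})^2}_{SS_{\text{between}}}
+ \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}_{SS_{\text{within}}}
\qquad
F = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
```

If the group means differ, the between-group term grows while the within-group term does not, which makes F large.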

ANOVA is most appropriate when the sample data are continuous and approximately normal, and the samples are distinguished by a nominal variable. For example, given a random sampling of the heights of members of five different political parties, a one-way ANOVA would be an appropriate test of whether the different parties tend to attract people with different heights.
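
A call for data of this kind might look like the sketch below. The height values are made up for illustration, only three groups are shown for brevity, and the members read from the result (Result, Statistic.Value, Probability) are assumptions about the 4.x API that should be checked against the OneWayAnovaResult and TestResult reference pages:

```csharp
using System;
using Meta.Numerics.Statistics;

public static class AnovaExample
{
    public static void Main()
    {
        // Hypothetical heights in cm for three groups (invented data).
        double[] partyA = { 171.2, 168.5, 175.9, 180.1, 169.4 };
        double[] partyB = { 165.3, 172.8, 170.0, 177.6, 174.2 };
        double[] partyC = { 178.4, 169.9, 173.1, 166.7, 171.5 };

        // double[] implements IReadOnlyCollection<double>, so the arrays
        // can be passed directly to the params argument.
        OneWayAnovaResult anova = Univariate.OneWayAnovaTest(partyA, partyB, partyC);

        // Assumed members: Result holds the F-test; Statistic.Value is F
        // and Probability is the P-value.
        Console.WriteLine($"F = {anova.Result.Statistic.Value}");
        Console.WriteLine($"P = {anova.Result.Probability}");
    }
}
```

A small probability would indicate that at least one group differs significantly from the others; a large one gives no grounds to reject the null hypothesis.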

Given a continuous independent variable, binning in order to define groups and perform an ANOVA is generally not appropriate. For example, given the incomes and heights of a large number of people, dividing these people into low-height, medium-height, and high-height groups and performing an ANOVA of the income of people in each group is not a good way to test whether height influences income. First, the result will be sensitive to the arbitrary boundaries you have chosen for the bins. Second, the ANOVA has no notion of bin ordering. In a case like this, it would be better to put the data into a BivariateSample and perform a test of association between the two variables, such as a PearsonRTest, SpearmanRhoTest, or KendallTauTest. If you have measurements of additional variables for each individual, a LinearRegression(Int32) analysis would allow you to adjust for the confounding effects of the other variables.

See Also