
Mad Scientist (Statistics)

F-test

The F-test is a significance test which compares the standard deviations of two samples in order to determine whether the standard deviations of their parent populations are equal.

We give the standard deviations of the populations the symbols σ1 and σ2.

We give the standard deviations of the samples the symbols s1 and s2.

The hypotheses in an F-test are:
H0: σ1 = σ2
Ha: σ1 ≠ σ2

The test statistic (F) is given by:

F = s1² / s2²

where the sample with the larger standard deviation is labelled sample 1, so that F ≥ 1.

The F-statistic has a unique distribution with two separate parameters for its degrees of freedom. This is because each of the sample standard deviations has its own degrees of freedom value.

The first parameter is the degrees of freedom in the numerator (df1), which = (the number of observations in the sample with the larger s) - 1.
The second parameter is the degrees of freedom in the denominator (df2), which = (the number of observations in the sample with the smaller s) - 1.

To find the level at which your data is significant, you can look at a table of F distribution critical values. Find the highest significance level (α) for which your F-statistic is larger than the F critical value, then double that α (because Ha is two-sided).
This is super-inconvenient though, so if possible use software to find the P-value exactly.
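For example, the F-statistic and its degrees of freedom can be computed with a quick Python sketch (the sample data here is made up):

```python
import statistics

# Hypothetical samples; any two numeric lists work here.
sample1 = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9]
sample2 = [5.0, 5.2, 4.8, 5.1, 4.9]

s1 = statistics.stdev(sample1)  # sample standard deviations
s2 = statistics.stdev(sample2)

# Put the larger s in the numerator so that F >= 1.
if s1 >= s2:
    F = (s1 / s2) ** 2
    df_num, df_den = len(sample1) - 1, len(sample2) - 1
else:
    F = (s2 / s1) ** 2
    df_num, df_den = len(sample2) - 1, len(sample1) - 1

print(F, df_num, df_den)   # F ≈ 26.47 with df (5, 4) for this data
```

Then compare F to the critical values in the (df_num, df_den) entry of an F table, or feed it to statistical software for an exact P-value.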

Wednesday, November 28, 2007

Two-sample t-procedures: Comparing Means

Two-Sample Inference
We can use statistical inference to draw conclusions about a population by looking at a sample of that population. In that same way, we can use statistical inference to draw conclusions about the difference between populations by looking at the difference between samples of those populations.

Comparing Means
If the means of our two populations are µ1 and µ2, then the difference between them is (obviously) µ1 - µ2. We can make confidence intervals and perform significance tests for µ1 - µ2 using the same t-procedures as with the mean of a single sample; however, the standard deviation and degrees of freedom used in two-sample procedures are somewhat different from those in one-sample procedures.

Conditions
Two-sample t-procedures require that our two samples are SRS's from two independent populations. This means, for example, that the samples can't be from before and after/matched pairs type experiments.

Preparation
Just as the means of the two populations are labeled µ1 and µ2, the other symbols used should also be numbered to distinguish them.

The variables in the samples are called x1 and x2.
The means of each sample are called x̄1 and x̄2.
The standard deviations of the populations are called σ1 and σ2.
The standard deviations of the samples are called s1 and s2.
And the number of observations in the samples are called n1 and n2.

The standard deviation for x̄1 - x̄2 is given by the formula:

√(σ1²/n1 + σ2²/n2)

Since we usually don't know σ1 and σ2, we substitute s1 and s2, which gives the standard error (SE):

SE = √(s1²/n1 + s2²/n2)


Degrees Of Freedom
The degrees of freedom for two-sample t-procedures are different from those for one-sample t-procedures. Software will calculate the degrees of freedom automatically, but if you don't have access to such software, simply use the smaller of n1-1 and n2-1. This is a conservative estimate of the true degrees of freedom, so while it is inaccurate, it errs on the side of caution.
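A quick Python sketch of the standard error and the conservative degrees of freedom (the data here is made up):

```python
import math
import statistics

# Hypothetical raw samples.
sample1 = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
sample2 = [11.2, 11.5, 11.0, 11.4]

s1, s2 = statistics.stdev(sample1), statistics.stdev(sample2)
n1, n2 = len(sample1), len(sample2)

se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of x̄1 - x̄2
df = min(n1 - 1, n2 - 1)                  # conservative degrees of freedom

print(se, df)   # se ≈ 0.154, df = 3 for this data
```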

Confidence Interval
The confidence interval for µ1 - µ2 is given by:

(x̄1 - x̄2) ± t* × SE

where t* is the t critical value for the chosen confidence level, with the degrees of freedom described above.


Significance Test
The null hypothesis for comparing two means is usually that there is no difference between them, that is:
H0: µ1 = µ2
The alternative hypothesis can be one-sided or two-sided.
The overall procedure is the same as for a single sample, however now we're using x̄1 - x̄2 instead of x̄ and µ1 - µ2 instead of µ.

Standardising x̄1 - x̄2 gives the two-sample t-statistic:

t = ((x̄1 - x̄2) - (µ1 - µ2)) / SE

And if H0: µ1 = µ2 then µ1 - µ2 = 0, so the statistic is simply:

t = (x̄1 - x̄2) / SE
From this point on we simply follow the same routine as with the one-sample test (but don't forget the different degrees of freedom.)
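Putting the whole two-sample calculation together in a Python sketch (made-up data again):

```python
import math
import statistics

# Hypothetical samples for testing H0: µ1 = µ2.
sample1 = [27.1, 29.4, 28.2, 30.0, 27.8]
sample2 = [25.3, 26.1, 24.8, 25.9, 26.5, 25.0]

x1, x2 = statistics.mean(sample1), statistics.mean(sample2)
s1, s2 = statistics.stdev(sample1), statistics.stdev(sample2)
n1, n2 = len(sample1), len(sample2)

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
t = (x1 - x2) / se           # under H0, µ1 - µ2 = 0
df = min(n1 - 1, n2 - 1)     # conservative degrees of freedom

print(t, df)   # t ≈ 4.87, df = 4 for this data
```

From here, compare |t| to the df = 4 row of a t table (doubling P if Ha is two-sided), just as in the one-sample test.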

Robustness
Two-sample t-procedures are more robust than one-sample t-procedures, and they become more robust as the sample sizes get larger and more similar to each other.
As a rough guide for when it's safe to use these procedures, use the guidelines for one-sample t-procedures, but replace each "n" with "n1 + n2".

Tuesday, November 27, 2007

t-procedures

t distributions are similar to normal distributions, however they have a greater spread, with more area in the tails and less in the centre. This difference in shape is due to the fact that substituting s for σ in the formula for the standard deviation of x̄ introduces more variability into the standardised values.
A t(k) distribution becomes more and more similar to a normal distribution as k increases. For this reason, z can be used for very large samples even when σ is unknown.

Robustness of t-procedures
t-procedures are more robust for samples with more observations.
They are not robust against outliers, as outliers have the potential to greatly influence x̄.
They are somewhat robust against skewness.

The general guidelines for one-sample t-procedure robustness are:
- if n < 15, use the t-procedures only if the sample's distribution is very close to normal.
- if n ≥ 15, use the t-procedures if there are no outliers and the sample's distribution is not strongly skewed.
- if n ≥ 40, the t-procedures can be used even for clearly skewed data, as long as there are no extremely influential outliers.

Friday, November 23, 2007

Matched Pairs Inference

One-sample t-procedures can be used for inference on matched pairs experiments.
The one-sample t-procedures are used on the differences between the subjects of the pairs. This effectively turns the matched pairs data into a single sample.

Two-sample t-procedures cannot be used for inference on matched pairs experiments because the samples are not independent.
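A Python sketch of the matched pairs approach, with made-up before/after measurements on the same five subjects:

```python
import math
import statistics

# Hypothetical before/after measurements (same subjects, so the
# samples are NOT independent).
before = [140, 132, 128, 150, 145]
after  = [135, 130, 129, 142, 140]

# Reduce the pairs to a single sample of differences.
diffs = [b - a for b, a in zip(before, after)]

d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
n = len(diffs)

# One-sample t on the differences, testing H0: mean difference = 0.
t = d_bar / (s_d / math.sqrt(n))
print(t, n - 1)   # t ≈ 2.48 with df = 4 for this data
```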

Errors in Significance Tests

Type 1 Error
A type 1 error is when we wrongly reject a true H0. The probability of this occurring is equal to α.

Type 2 Error
A type 2 error is when we wrongly accept H0 when a specific Ha is true. The probability of this occurring is equal to 1 minus the power of the test against that Ha.

Power of a Test

The power of a significance test against a particular Ha is the probability of it rejecting H0 when that Ha is true. To find this probability we do the following: (for a one-sample t)

1.
Using the normal steps of a significance test find out which values of t will lead us to reject H0.
e.g. reject H0 when t > 3.

2.
Rearrange the one-sample test statistic to work out what values x̄ needs to take for us to reject H0.
e.g. if t > 3 then:

(x̄ - µ0) / (s/√n) > 3, so x̄ > µ0 + 3 × s/√n
3.
Find the probability of x̄ taking such values if Ha is true. To do this we standardise using µa.
e.g. continuing from above:

P(x̄ > µ0 + 3 × s/√n when µ = µa) = P(Z > (µ0 + 3 × s/√n - µa) / (s/√n))

Then we are simply looking at areas under a normal curve, which we can calculate easily.
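A Python sketch of the power calculation, using made-up numbers and treating s as if it were the known σ (so the normal distribution applies, as an approximation of the t-based answer):

```python
from statistics import NormalDist

# Hypothetical numbers for a one-sided test of H0: µ = 5.
mu0, mu_a = 5.0, 6.0   # H0 mean and the specific alternative
s, n = 2.0, 25         # sample standard deviation and size
t_star = 1.711         # reject H0 when t > t_star (df = 24, α = 0.05, one-sided)

se = s / n ** 0.5
cutoff = mu0 + t_star * se   # reject H0 when x̄ > cutoff

# Probability of x̄ exceeding the cutoff when the true mean is mu_a:
power = 1 - NormalDist(mu_a, se).cdf(cutoff)
print(round(power, 3))   # ≈ 0.785
```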

Cautions For Confidence Intervals and Significance Tests

-When using data from a sample to perform a significance test or create a confidence interval, the data must be from an SRS, or from a sample chosen close enough to randomly that we can regard it as an SRS.

-x̄ is strongly influenced by outliers. The sample should be examined beforehand to determine whether any outliers are present, and if so, whether they can be corrected or removed.

-Small samples must have roughly normal distributions for the sampling distribution of x̄ to be normal, so confidence intervals and significance tests won't be accurate for small non-normal samples.

-Multiple analyses greatly increase the chance of an error. For example, if we create 20 90% confidence intervals for a parameter from 20 independent samples, the probability is quite high (1 - 0.9^20 ≈ 0.88) that at least one of our confidence intervals will fail to capture that parameter.

-The margin of error in confidence intervals and the level of significance of significance tests do not take into account any errors in the data itself. (e.g. bias, undercoverage, nonresponse etc.)
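The multiple-analyses caution above is easy to check (assuming the 20 samples are independent):

```python
# Chance that at least one of 20 independent 90% confidence intervals
# fails to capture the parameter: 1 - 0.9^20.
p_at_least_one_miss = 1 - 0.90 ** 20
print(round(p_at_least_one_miss, 3))   # ≈ 0.878
```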

Significance Tests for Population Means

Significance tests are a type of statistical inference where information from a sample of a population is used to assess the validity of a claim made about that population. Significance tests look at the properties of the sample, and work out the probability of obtaining such a sample if a given claim about the population is true. This probability is called the "P-value" of the test.

Significance tests for a population mean(µ) are performed according to the following procedure:


1. Null Hypothesis
We state the claim we are testing. This claim is called the null hypothesis and is represented by H0.
e.g. I believe that on average I stay awake for the first 5 minutes of my lectures (µ = 5). This null hypothesis is written as:
H0: µ = 5

2. Alternative Hypothesis
We state an alternative hypothesis (Ha) which we suspect is the case.
e.g. Ha: µ ≠ 5
That is, I don't stay awake for 5 minutes on average.
Hypotheses can be two-sided as in the example above, or one-sided.
e.g. Ha: µ > 5
That is, I stay awake for more than five minutes on average.
Or Ha: µ < 5 (I stay awake for less than five minutes on average.)

3. Sample
Obtain a sample of size n from the population. Calculate the mean of the sample (x̄) and its standard deviation (s).

4. Standardise x̄
If we know σ
We know that the sampling distribution of x̄ has mean µ, and if we assume H0 is true we know that µ = 5, so we have enough information to standardise x̄. This tells us how far x̄ is from µ in units of standard deviations, assuming H0 is true. We can then determine, from a table of standard normal probabilities, the probability of obtaining such an x̄ if H0 is true. If the probability is very low, our assumption that H0 is true is probably wrong.

To standardise x̄ we use the formula:

z = (x̄ - µ0) / (σ/√n)

where µ0 is the mean assuming H0 is true.
z is called the "one-sample z statistic".

If we don't know σ
Usually we don't know σ, so we must instead use s as an estimate. When we standardise x̄ using s we get a t distribution with n-1 degrees of freedom, just as with confidence intervals. The probability of obtaining such a t-value can be found exactly using software, or bracketed using a table of t critical values.

To standardise x̄ here we use the formula:

t = (x̄ - µ0) / (s/√n)

t is called the "one-sample t statistic"

The one-sample z and t statistics fall under the general category "test statistics."
I'll ignore z for the rest of this post as in reality we never know σ.

5. P-Value
The probability of getting a result at least as far from µ0 as our x̄, assuming H0 is true, is called the P-value, or just P.
(Note that P is not the probability that H0 is true; it is the probability of getting such data if H0 is true.)
P can be calculated exactly using software, or more roughly using a table of t critical values.
To find P using tables of t critical values, we find |t| (the absolute value of t) and compare it to the t critical values (the t values for certain probabilities) in the row df = n - 1. If n - 1 is not one of the dfs in the table, take the next df below n - 1.
If Ha is two-sided we now double P.
e.g. if 0.01<P<0.02 when Ha is one sided, then 0.02<P<0.04 when Ha is two-sided.
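A Python sketch of computing the one-sample t statistic for the lecture example (the data here is made up):

```python
import math
import statistics

# Hypothetical minutes-awake data for H0: µ = 5 against Ha: µ ≠ 5.
minutes_awake = [4.1, 3.8, 5.2, 4.0, 3.5, 4.6, 3.9, 4.3]

x_bar = statistics.mean(minutes_awake)
s = statistics.stdev(minutes_awake)
n = len(minutes_awake)

t = (x_bar - 5) / (s / math.sqrt(n))
print(t, n - 1)   # t ≈ -4.42 with df = 7 for this data
```

Then compare |t| to the df = 7 row of a t table, and double P because Ha is two-sided.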

In a table of t critical values, the P-values are the numbers at the top of each column.

The results of significance tests can be described as significant or not significant at certain levels. The result of a test is significant at level α if P < α.
e.g. the result of a test is significant at level 0.05 if P < 0.05.
The smaller P is, the more significant the results, and the more evidence we have against H0.

As a rough guide for when it's safe to use this t-procedure, use the robustness guidelines for one-sample t-procedures.

Saturday, November 17, 2007