Let’s look back at the data from the sorting experiment presented in the Numerical Data section:
Run | Bubble | Quick | Selection | Insertion | Merge |
1 | 17384 | 24 | 3258 | 3 | 30 |
2 | 17559 | 21 | 3386 | 3 | 27 |
3 | 17795 | 19 | 3344 | 4 | 28 |
4 | 17484 | 20 | 3417 | 3 | 28 |
5 | 17642 | 19 | 3358 | 3 | 30 |
Average | 17572.8 | 20.6 | 3352.6 | 3.2 | 28.6 |
In this example, it is quite clear that bubble sort is much slower than quicksort, but is quicksort faster than merge sort? This is less clear. The average values suggest that quicksort is faster, but the difference is quite small.
Statistical Test
To be more confident, we must perform a statistical test. When comparing two average values, the most common test is a t-test. If we do a t-test between the execution times of quicksort and merge sort, we get a P-value of 0.001. If the P-value is lower than or equal to 0.05, the difference is statistically significant. Since this is true in our example, we can conclude that quicksort is faster than merge sort on this data.
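As a rough illustration of the calculation behind such a test, here is a minimal sketch using only the Python standard library. It assumes a paired t-test, where the two algorithms are matched by run number; that pairing is an assumption, since the text does not say which variant of the t-test was used. The function name `paired_t_statistic` and the constant `T_CRITICAL_DF4` are made up for this example. Instead of computing the P-value, the sketch compares the t statistic with the two-sided critical value for 4 degrees of freedom at the 0.05 level, taken from a standard t table.

```python
import math
import statistics

# Execution times from the table above, listed by run 1..5.
quick = [24, 21, 19, 20, 19]
merge = [30, 27, 28, 28, 30]


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for a paired t-test on two samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("samples must have the same length (at least 2)")
    # Work on the per-run differences (assumes run i of a matches run i of b).
    diffs = [y - x for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference, using the sample standard deviation.
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_err, n - 1


t, df = paired_t_statistic(quick, merge)

# Two-sided critical value of the t distribution for df = 4 at the 0.05 level
# (value from a standard t table; only valid for df = 4).
T_CRITICAL_DF4 = 2.776
significant = abs(t) > T_CRITICAL_DF4

print(f"t = {t:.2f}, df = {df}, significant at 0.05: {significant}")
```

The t statistic comes out at roughly 8.4, well above the critical value, which agrees with the small P-value reported above. A statistics library such as SciPy (`scipy.stats.ttest_rel`) would return the P-value directly.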
If we want to compare three or more average values, we must use another test called ANOVA. Note that both the t-test and ANOVA require that your data is normally distributed (approximately follows the normal distribution). If it is not normally distributed, a Wilcoxon test should be used for comparing two average values, and the Kruskal-Wallis or Friedman test for three or more average values. Typical examples of data that is not normally distributed are the Likert and rating scales used in questionnaires. Ask your supervisor if you are unsure which test to use for your data.
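The rule of thumb in this paragraph can be written down as a small lookup. This is only a sketch of the decision described here, not a complete guide to choosing a test; the function name `suggest_test` and its parameters are invented for the example.

```python
def suggest_test(num_groups: int, normally_distributed: bool) -> str:
    """Suggest a test for comparing average values, following the text above."""
    if num_groups < 2:
        raise ValueError("need at least two groups to compare")
    if num_groups == 2:
        return "t-test" if normally_distributed else "Wilcoxon test"
    # Three or more groups.
    if normally_distributed:
        return "ANOVA"
    return "Kruskal-Wallis or Friedman test"


# Example: questionnaire ratings (typically not normal) for three systems.
print(suggest_test(3, normally_distributed=False))
```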
All tests output a P-value. As said before, if the P-value is lower than or equal to 0.05, the difference is statistically significant and we can say that there is a difference between the average values. If you have three or more averages, the test, usually through a follow-up post-hoc comparison, will also tell you which pairwise differences are statistically significant and which are not. If you, for example, have average execution times of three algorithms A, B and C, the difference between A and B can be statistically significant but not the difference between B and C.
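For three or more averages, the core of an ANOVA can be sketched in the same way. The example below computes the F statistic of a one-way ANOVA for quicksort, insertion sort and merge sort from the table, using only the Python standard library. It treats the three groups as independent samples, which is an assumption, and the names `one_way_anova_f` and `F_CRITICAL_2_12` are made up for this example. The critical value is the 0.05-level entry for 2 and 12 degrees of freedom from a standard F table. Note that this gives only the overall result; it does not say which pairwise differences are significant.

```python
import statistics

# Execution times from the table above (five runs each).
groups = {
    "quick": [24, 21, 19, 20, 19],
    "insertion": [3, 3, 4, 3, 3],
    "merge": [30, 27, 28, 28, 30],
}


def one_way_anova_f(samples):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(s) * (statistics.mean(s) - grand_mean) ** 2 for s in samples
    )
    # Variation of the individual values around their own group mean.
    ss_within = sum(
        sum((x - statistics.mean(s)) ** 2 for x in s) for s in samples
    )
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


f_stat, df_b, df_w = one_way_anova_f(list(groups.values()))

# Critical value of the F distribution for (2, 12) degrees of freedom at the
# 0.05 level (value from a standard F table; only valid for these df).
F_CRITICAL_2_12 = 3.89
anova_significant = f_stat > F_CRITICAL_2_12

print(f"F = {f_stat:.1f}, df = ({df_b}, {df_w}), significant: {anova_significant}")
```

A library such as SciPy (`scipy.stats.f_oneway`) would also return the P-value for the same data.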
If the P-value is above 0.05, the difference is so small that we cannot rule out that it might be caused by chance. In this case, we say that there is no statistically significant difference between the average values.
How to conduct a test?
EZ Statistics is a web service that supports most of the common statistical tests and is easy to use.
The t-test can also be performed in Excel. You can read a guide about it here.