Mastery requires practice. Having completed Statistical Thinking I and II, you developed your probabilistic mindset and the hacker stats skills to extract actionable insights from your data. Your foundation is in place, and now it is time practice your craft. In this course, you will apply your statistical thinking skills, exploratory data analysis, parameter estimation, and hypothesis testing, to two new real-world data sets. First, you will explore data from the 2013 and 2015 FINA World Aquatics Championships, where you will quantify the relative speeds and variability among swimmers. You will then perform a statistical analysis to assess the "current controversy" of the 2013 Worlds in which swimmers claimed that a slight current in the pool was affecting result. Second, you will study the frequency and magnitudes of earthquakes around the world. Finally, you will analyze the changes in seismicity in the US state of Oklahoma after the practice of high pressure waste water injection at oil extraction sites became commonplace in the last decade. As you work with these data sets, you will take vital steps toward mastery as you cement your existing knowledge and broaden your abilities to use statistics and Python to make sense of your data.
To begin, you'll use two data sets from Caltech researchers to rehash the key points of Statistical Thinking I and II to prepare you for the following case studies!
In this chapter, you will practice your EDA, parameter estimation, and hypothesis testing skills on the results of the 2015 FINA World Swimming Championships.
Some swimmers said that they felt it was easier to swim in one direction versus another in the 2013 World Championships. Some analysts have posited that there was a swirling current in the pool. In this chapter, you'll investigate this claim! References - Quartz Media, Washington Post, SwimSwam (and also here), and Cornett, et al.
Herein, you'll use your statistical thinking skills to study the frequency and magnitudes of earthquakes. Along the way, you'll learn some basic statistical seismology, including the Gutenberg-Richter law. This exercise exposes two key ideas about data science: 1) As a data scientist, you wander into all sorts of domain specific analyses, which is very exciting. You constantly get to learn. 2) You are sometimes faced with limited data, which is also the case for many of these earthquake studies. You can still make good progress!
Of course, earthquakes have a big impact on society, and recently are connected to human activity. In this final chapter, you'll investigate the effect that increased injection of saline wastewater due to oil mining in Oklahoma has had on the seismicity of the region.
Generally to understand some characteristic of the general population we take a random sample and study the corresponding property of the sample. We then determine whether any conclusions we reach about the sample are representative of the population.
This is done by choosing an estimator function for the characteristic (of the population) we want to study and then applying this function to the sample to obtain an estimate. By using the appropriate statistical test we then determine whether this estimate is based solely on chance.
The hypothesis that the estimate is based solely on chance is called the null hypothesis. Thus, the null hypothesis is true if the observed data (in the sample) do not differ from what would be expected on the basis of chance alone. The complement of the null hypothesis is called the alternative hypothesis.
The null hypothesis is typically abbreviated as H0 and the alternative hypothesis as H1. Since the two are complementary (i.e. H0 is true if and only if H1 is false), it is sufficient to define the null hypothesis.
Since our sample usually only contains a subset of the data in the population, we cannot be absolutely certain as to whether the null hypothesis is true or not. We can merely gather information (via statistical tests) to determine whether it is likely or not. We therefore speak about rejecting or not rejecting (aka retaining) the null hypothesis on the basis of some test, but not of accepting the null hypothesis or the alternative hypothesis. Often in an experiment we are actually testing the validity of the alternative hypothesis by testing whether to reject the null hypothesis.
When performing such tests, there is some chance that we will reach the wrong conclusion. There are two types of errors:
- Type I – H0 is rejected even though it is true (false positive)
- Type II – H0 is not rejected even though it is false (false negative)
The acceptable level of a Type I error is designated by alpha (α), while the acceptable level of a Type II error is designated beta (β).
We use the following terminology:
Significance level is the acceptable level of type I error, denoted α. Typically, a significance level of α = .05 is used (although sometimes other levels such as α = .01 may be employed). This means that we are willing to tolerate up to 5% of type I errors, i.e. we are willing to accept the fact that in 1 out of every 20 samples we reject the null hypothesis even though it is true.
P-value (the probability value) is the value p of the statistic used to test the null hypothesis. If p < α then we reject the null hypothesis.
Critical region is the part of the sample space that corresponds to the rejection of the null hypothesis, i.e. the set of possible values of the test statistic which are better explained by the alternative hypothesis. The significance level is the probability that the test statistic will fall within the critical region when the null hypothesis is assumed.
Usually the critical region is depicted as a region under a curve for continuous distributions (or a portion of a bar chart for discrete distributions).
The typical approach for testing a null hypothesis is to select a statistic based on a sample of fixed size, calculate the value of the statistic for the sample and then reject the null hypothesis if and only if the statistic falls in the critical region.
One-tailed hypothesis testing specifies a direction of the statistical test. For example to test whether cloud seeding increases the average annual rainfall in an area which usually has an average annual rainfall of 20 cm, we define the null and alternative hypotheses as follows, where μ represents the average rainfall after cloud seeding.
H0: µ ≤ 20 (i.e. average rainfall does not increase after cloud seeding)
H1: µ > 20 (i.e. average rainfall increases after cloud seeding
Here the experimenters are quite sure that the cloud seeding will not significantly reduce rainfall, and so a one-tailed test is used where the critical region is as in the shaded area in Figure 1. The null hypothesis is rejected only if the test statistic falls in the critical region, i.e. the test statistic has a value larger than the critical value.
Figure 1 – Critical region is the right tail
The critical value here is the right (or upper) tail. It is quite possible to have one sided tests where the critical value is the left (or lower) tail. For example, suppose the cloud seeding is expected to decrease rainfall. Then the null hypothesis could be as follows:
H0: µ ≥ 20 (i.e. average rainfall does not decrease after cloud seeding)
H1: µ < 20 (i.e. average rain decreases after cloud seeding)
Figure 2 – Critical region is the left tail
Two-tailed hypothesis testing doesn’t specify a direction of the test. For the cloud seeding example, it is more common to use a two-tailed test. Here the null and alternative hypotheses are as follows.
H0: µ = 20
H1: µ ≠ 20
The reasons for using a two-tailed test is that even though the experimenters expect cloud seeding to increase rainfall, it is possible that the reverse occurs and, in fact, a significant decrease in rainfall results. To take care of this possibility, a two tailed test is used with the critical region consisting of both the upper and lower tails.
Figure 3 – Two-tailed hypothesis testing
In this case we reject the null hypothesis if the test statistic falls in either side of the critical region. To achieve a significance level of α, the critical region in each tail must have size α/2.
Statistical power is 1 – β. Thus power is the probability that you find an effect when one exists, i.e. the probability of correctly rejecting a false null hypothesis. While a significance level for type I error of α = .05 is typically used, generally the target for β is .20 or .10, and so .80 or .90 is used as the target value for power.
The general procedure for null hypothesis testing is as follows:
- State the null and alternative hypotheses
- Specify α and the sample size
- Select an appropriate statistical test
- Collect data (note that the previous steps should be done prior to collecting data)
- Compute the test statistic based on the sample data
- Determine the p-value associated with the statistic
- Decide whether to reject the null hypothesis by comparing the p-value to α (i.e. reject the null hypothesis if p < α)
- Report your results, including effect sizes (as described in Effect Size)
Observation: Suppose we perform a statistical test of the null hypothesis with α = .05 and obtain a p-value of p = .04, thereby rejecting the null hypothesis. This does not mean that there is a 4% probability of the null hypothesis being true, i.e. P(H0) =.04. What we have shown instead is that assuming the null hypothesis is true, the conditional probability that the sample data exhibits the obtained test statistic is 0.04; i.e. P(D|H0) =.04 where D = the event that the sample data exhibits the observed test statistic.