Which one is more serious? Type I Error or Type II Error?

My thoughts on the setup behind the hypothesis test and more.

Chia-Yun Chiang
7 min read · Nov 26, 2020

Which do you feel is more serious: putting an innocent person in jail, or letting a criminal walk free?

Photo by Elena Mozhvilo on Unsplash

This is a classic question regarding the “two types of errors” in the hypothesis test. The two types of errors are Type I error and Type II error. A quick review of this concept:

Type I error: we reject the null hypothesis (H0) when it is true.
Type II error: we fail to reject the null hypothesis (H0) when it is not true.

Based on my observation of many examples, I would say that a Type I error is more serious than a Type II error, given the reasoning behind how the hypothesis test is set up.

** Please note that this article is based on my observation, reading, and knowledge from previous learning, NOT the formal statistical definition. **

The Setup behind Hypothesis Test

In a hypothesis test, we have two statements — Null Hypothesis (H0) and Alternative Hypothesis (H1). We can describe these two hypotheses as below:

Null Hypothesis (H0): This statement is the default setting. We assume it is true before running any tests. In short, this statement is protected.

Alternative Hypothesis (H1): This statement is the research claim, which we want to prove.

The hypothesis test is set up this way because we want to be stricter toward “what we want to prove” (the alternative hypothesis H1). That is, we cannot support our research claim (H1) unless we have enough evidence.

Remember that a hypothesis test has only two possible results? We either “reject the null hypothesis (H0)” or “fail to reject the null hypothesis (H0)”, but we never say “reject the alternative hypothesis (H1)” or “fail to reject the alternative hypothesis (H1)”. Why?

Because we want to protect our default setting (null hypothesis H0).

Only if we have enough evidence to reject the null hypothesis are we able to say that we indirectly support the alternative hypothesis (research claim).

If we don’t have evidence to reject the null hypothesis, we keep the null hypothesis, since it is protected. This does not mean the null hypothesis is true; it just indicates that our data are not strong enough to support the alternative hypothesis (research claim) at this time.

But again, if we cannot reject the null hypothesis, we keep the null hypothesis.

To sum up, we tend to protect the null hypothesis (H0) and be stricter toward the alternative hypothesis (H1).

By the way, how strict are we toward the alternative hypothesis (H1)? It depends on how we set our significance level (α). The smaller α is, the stricter we are toward the alternative hypothesis.

Note: the significance level (α) indicates how much risk of a Type I error we are willing to accept. For example, if I set my significance level to 0.05, that means I accept a 5% chance of making a Type I error.
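A quick sketch of what that 5% means in practice (the numbers and test here are purely illustrative, not from the article): the simulation below repeatedly generates data for which H0 is actually true, runs a two-sided z-test with α = 0.05, and counts how often H0 is rejected anyway. That rejection rate is the Type I error rate, and it comes out close to α.

```python
import math
import random

random.seed(42)

def z_test_rejects(n=30, alpha_critical=1.96):
    """Simulate one experiment where H0 (mean = 0) is actually true,
    and report whether a two-sided z-test rejects H0 anyway."""
    sample = [random.gauss(0, 1) for _ in range(n)]  # data drawn under H0
    z = (sum(sample) / n) / (1 / math.sqrt(n))       # known sigma = 1
    return abs(z) > alpha_critical                   # 1.96 is the cutoff for alpha = 0.05

trials = 10_000
rejections = sum(z_test_rejects() for _ in range(trials))
print(rejections / trials)  # close to alpha = 0.05: the Type I error rate
```

Raising or lowering the critical value (i.e., changing α) directly changes how often this false rejection happens, which is what “being stricter toward H1” means.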

Parachute Testing Example

To get a better understanding of the hypothesis test setup, let’s take parachute testing as an example.

Photo by Kamil Pietrzak on Unsplash

Let’s say I am a parachute tester. My job is to test the safety of each parachute. There are only two testing results:

  1. parachute opens
  2. parachute doesn’t open

After my testing, if the parachute doesn’t open, I will throw it away; if it opens, I will provide it to a parachutist.

Okay, let’s transform this situation into a hypothesis test.

First, I set up the null hypothesis (H0) and the alternative hypothesis (H1). Following the reasoning behind the hypothesis test, the null hypothesis is the statement I want to protect.

So, if I believe human life is much more important than anything else, I would set up my null hypothesis as “parachute will not open”.

That is, I assume every parachute will not open (is unsafe) unless I have enough data telling me that we can reject this default setting.

Null Hypothesis (H0): parachute will not open (protected)
Alternative Hypothesis (H1): parachute will open (need data to support it)
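One way to make this setup concrete is a one-sided binomial test on a batch of parachutes. This is my own sketch, not a procedure from the article: the safety threshold of 0.99 and the counts are made-up numbers. The protected H0 says the open rate is no better than the unsafe threshold, and we reject it only if the observed opens would be very unlikely under H0.

```python
import math

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical batch test: 1000 deployments, 999 parachutes opened.
n, k = 1000, 999
alpha = 0.05

# H0 (protected):        open rate <= 0.99 (not safe enough)
# H1 (needs evidence):   open rate >  0.99
p_value = binom_sf(k, n, 0.99)  # chance of >= 999 opens if the batch were borderline unsafe
reject_h0 = p_value < alpha

print(p_value, reject_h0)
```

Note which side carries the burden of proof: because H0 sits on the “unsafe” side, a Type I error here means certifying a batch that is in fact unsafe, which is exactly the scenario the article treats as the serious one.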

Now, we can see what the two types of errors look like in this setting (Parachute Testing — Type I Error vs. Type II Error):

Type I error (reject H0 when it is true): we conclude the parachute will open when in fact it will not, and an unsafe parachute is handed to a parachutist.
Type II error (fail to reject H0 when it is false): we conclude the parachute will not open when in fact it would, and a working parachute is thrown away.

From this comparison, we can clearly see that the Type I error is much more serious than the Type II error, because we fail to protect our protected statement (null hypothesis H0) when it is true.

What is your presumption?

So now, let’s go back to our original question. Which one is worse: putting an innocent person in jail, or letting a criminal walk free?

It all depends on your null hypothesis, a.k.a. your presumption.

Photo by Tingey Injury Law Firm on Unsplash

If the criminal justice system of country A follows the principle “innocent until proven guilty”, this means country A has the presumption that the defendant is innocent. The hypothesis statements in country A are set up as follows:

Null Hypothesis (H0): The person is innocent (protected)
Alternative Hypothesis (H1): The person is guilty (need data to support it)

In country A, putting innocent people into jail is more serious under the presumption of their criminal justice system.

Innocent until proven guilty

However, if the criminal justice system of country B follows the principle “guilty until proven innocent”, then this is a different story. In this case, country B has the presumption that the defendant is guilty. Hence the hypothesis statements are set up as follows:

Null Hypothesis (H0): The person is guilty (protected)
Alternative Hypothesis (H1): The person is innocent (need data to support it)

In country B, letting a criminal walk free is more serious under the presumption of their criminal justice system.

Guilty until proven innocent

What’s more serious?

If we set up the null hypothesis as a protected statement, then I would say a Type I error is more serious than a Type II error, because we already have a presumption that we believe is true. Once we incorrectly reject that presumption, we fail to protect the statement we recognize as more important than the others.


We need to be careful about our biases when setting the null hypothesis and the significance level, because these choices are subjective. How do we choose our default belief (or action)? Are we even aware of our default setting? How much risk of a Type I error are we willing to accept? All of these choices involve bias.

One of my favorite dramas, 99.9 Criminal Lawyer, depicts the fact that Japan has a conviction rate of more than 99% for criminal cases. In other words, once you have been prosecuted, it is hard to prove you are innocent.

Photo is from TBS website

Although the criminal justice system in Japan follows the principle of the presumption of innocence, its high conviction rate still raises a lot of discussion[1][2][3].

Japan’s government says that the high conviction rate (>99%) is related to the low indictment rate (37%): prosecutors indict a person only when they have high confidence. Others argue that the high conviction rate is due to judges ignoring the presumption of innocence: judges hold biases about defendants and are more likely to presume a defendant is guilty before seeing any evidence.

The same situation happens everywhere. How likely are police to assume a person is a thief solely because of their skin color? How likely are we to judge a person at first glance by their gender?

How likely are we to decide [anything…] because of [age, religion, ethnicity, gender, nationality, disability, previous experience, job category, socioeconomic status, hierarchy… any reasoning]? These are all our biases (presumptions), which are inevitable, but which we can also try to be aware of.

Last but not least…

In this article, I organized my thoughts on the two types of errors by working out the reasoning behind the hypothesis test. However, I am not sure whether this reasoning is acceptable from a classical statistician’s perspective.

Also, even though I state that a Type I error is more serious than a Type II error under this setup of the null hypothesis, in most cases a Type I error may not be as dramatically worse than a Type II error as it is in the parachute testing example.

Another way to apply Type I and Type II errors to real-world scenarios is to quantify the cost of each type of error, set a threshold for the acceptable total cost, and then determine the maximum acceptable rate for each type of error. This helps us understand the cost of errors and make further decisions.
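As a minimal sketch of that idea (every number below is hypothetical, chosen only to illustrate the arithmetic): assign a cost to each error type, weight each by how often that error can actually occur, and compare the expected cost against a budget.

```python
# Hypothetical costs, in the spirit of the parachute example.
cost_type1 = 1_000_000  # cost of a Type I error (an unsafe parachute in use)
cost_type2 = 500        # cost of a Type II error (a good parachute discarded)

alpha = 0.05       # P(Type I error | H0 true)
beta = 0.20        # P(Type II error | H0 false)
p_h0_true = 0.01   # assumed prior share of cases where H0 is true (faulty parachutes)

# A Type I error can only happen when H0 is true; a Type II error only when it is false.
expected_cost = (p_h0_true * alpha * cost_type1
                 + (1 - p_h0_true) * beta * cost_type2)

budget = 1_000  # maximum acceptable expected cost per decision
print(expected_cost, expected_cost <= budget)
```

If the expected cost exceeds the budget, we can tighten α (at the price of a larger β) or vice versa, which makes the trade-off between the two error types explicit instead of implicit.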
