Hypothesis Testing

Refer to R Tutorial and Exercise Solution

 

Researchers retain or reject a hypothesis based on measurements of observed samples. The decision is often based on a statistical mechanism called hypothesis testing.

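Hypothesis testing is a method in mathematical statistics for drawing inferences about a population from a sample under certain assumptions. The procedure is as follows. According to the needs of the problem, make an assumption about the population under study, denoted H0. Choose a suitable test statistic whose distribution is known when H0 holds. Compute the value of the statistic from the observed sample and, at a significance level fixed in advance, decide whether to reject or accept H0. Commonly used tests include the u-test (z-test), the t-test, the χ² test, the F-test and the rank-sum test.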

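The basic idea of hypothesis testing is proof by contradiction built on the small-probability principle. The small-probability principle says that an event with small probability (P < 0.01 or P < 0.05) essentially never happens in a single trial. The proof by contradiction works like this: first state a hypothesis (the hypothesis under test, H0), then use an appropriate statistical method to determine how likely the observed result is if the hypothesis holds. If that probability is small, we conclude that the hypothesis does not hold; if it is large, we still cannot conclude that the hypothesis holds, only that we have no grounds to reject it.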

 

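A Type I error is rejecting the null hypothesis H0 when it is actually true; its probability is the significance level α. A Type II error is failing to reject H0 when it is actually false; its probability is denoted β. In short, a Type I error discards a truth and a Type II error retains a falsehood.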
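One way to see what the type I error rate means is to simulate it. The sketch below assumes the light-bulb setting used later in this post (μ0 = 10000, σ = 120, n = 30, α = .05), draws many samples for which H0 is exactly true (the true mean equals μ0), and counts how often a lower-tail z-test wrongly rejects. The rejection rate should come out close to α; the seed and the number of replications are arbitrary choices.

```r
set.seed(1)                 # arbitrary seed, for reproducibility
mu0   <- 10000              # hypothesized mean; H0 is true in every simulated sample
sigma <- 120                # known population standard deviation
n     <- 30                 # sample size
alpha <- 0.05               # significance level
reps  <- 20000              # number of simulated samples (arbitrary)

# Each row of the matrix is one sample of size n drawn with mean exactly mu0
samples <- matrix(rnorm(reps * n, mean = mu0, sd = sigma), nrow = reps)
xbar    <- rowMeans(samples)

# Lower-tail z statistic for every simulated sample
z <- (xbar - mu0) / (sigma / sqrt(n))

# Reject H0 whenever z falls at or below the critical value -z_alpha
reject <- z <= -qnorm(1 - alpha)

type1_rate <- mean(reject)  # fraction of true nulls that were (wrongly) rejected
type1_rate
```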

 

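The null hypothesis (H0) is the hypothesis set up in advance when performing a statistical test; when it holds, the test statistic follows a known distribution. It is usually the claim being put to the test, i.e. the position we look for evidence against, or one that deserves particular attention.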

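Opposed to the null hypothesis is the alternative hypothesis (H1), the other possibility, which is what we conclude in favour of when H0 is rejected.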

 

Lower Tail Test of Population Mean with Known Variance

The null hypothesis of the lower tail test of the population mean can be expressed as follows:

$$ \mu \ge \mu_0 $$

where μ0 is a hypothesized lower bound of the true population mean μ.

A hypothesis test starts from the hypothesis: here the null hypothesis is that μ0 is a lower bound of the population mean.

Let us define the test statistic z in terms of the sample mean x̄, the sample size n and the population standard deviation σ:

$$ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} $$

Then the null hypothesis of the lower tail test is to be rejected if z ≤−zα , where zα is the 100(1 − α) percentile of the standard normal distribution.

Let's work through this procedure with an example.

Problem

Suppose the manufacturer claims that the mean lifetime of a light bulb is more than 10,000 hours. In a sample of 30 light bulbs, it was found that they only last 9,900 hours on average. Assume the population standard deviation is 120 hours. At the .05 significance level, can we reject the claim by the manufacturer?

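In short: the manufacturer claims a mean bulb lifetime above 10,000 hours; our test data are 30 bulbs with an average lifetime of 9,900 hours, and the population standard deviation is 120 hours. We must decide whether the manufacturer's claim holds at significance level 0.05.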

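The significance level has to be specified here because we need to tell apart a difference caused by sampling error from a genuine difference. If, assuming the hypothesis is true, the probability of a result at least as extreme as the observed one is smaller than the significance level we chose, we no longer attribute the result to a rare chance event and instead conclude that the hypothesis does not hold. Results within the range allowed by the significance level can be attributed to sampling error or other chance effects and do not count against the hypothesis.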

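This mirrors the role of the confidence level in parameter estimation (α = 1 − confidence level): the smaller α is, the stricter the test, i.e. the stronger the evidence required to reject H0.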

Solution

The null hypothesis is that μ ≥ 10000. We begin by computing the test statistic.

> xbar = 9900            # sample mean  
> mu0 = 10000            # hypothesized value  
> sigma = 120            # population standard deviation  
> n = 30                 # sample size  
> z = (xbar-mu0)/(sigma/sqrt(n))
> z                      # test statistic computed from the sample
[1] -4.5644

We then compute the critical value at .05 significance level.

> alpha = .05  
> z.alpha = qnorm(1-alpha)
> -z.alpha               # critical value from the standard normal distribution
[1] -1.6449

Answer

The test statistic -4.5644 is less than the critical value of -1.6449. Hence, at the .05 significance level, we reject the claim that the mean lifetime of a light bulb is above 10,000 hours.
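The same decision can be reached through the p-value, i.e. the probability under H0 of a test statistic at least as small as the observed one; for a lower-tail test that is pnorm(z). The sketch below recomputes the example this way. Rejecting when the p-value is at most α is equivalent to the comparison z ≤ −zα used above.

```r
xbar  <- 9900               # sample mean
mu0   <- 10000              # hypothesized lower bound
sigma <- 120                # population standard deviation
n     <- 30                 # sample size
alpha <- 0.05               # significance level

z    <- (xbar - mu0) / (sigma / sqrt(n))   # test statistic, about -4.5644
pval <- pnorm(z)                           # lower-tail p-value: P(Z <= z)

# Both decision rules lead to the same conclusion
reject_by_critical <- z <= -qnorm(1 - alpha)
reject_by_pvalue   <- pval <= alpha
c(z = z, p.value = pval)
```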

 

Upper Tail Test of Population Mean with Known Variance

The null hypothesis of the upper tail test of the population mean can be expressed as follows:

$$ \mu \le \mu_0 $$

where μ0 is a hypothesized upper bound of the true population mean μ.

Let us define the test statistic z in terms of the sample mean x̄, the sample size n and the population standard deviation σ:

$$ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} $$

Then the null hypothesis of the upper tail test is to be rejected if z ≥ zα, where zα is the 100(1 − α) percentile of the standard normal distribution.
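The upper tail test mirrors the lower tail one, with the rejection region moved to the right. As a minimal sketch, the helper `z_test_upper()` below is hypothetical (not a base R function) and returns the statistic, critical value, p-value and decision. The usage line reuses the light-bulb numbers purely for illustration, now testing H0: μ ≤ 10000; because the sample mean 9900 lies below μ0, H0 is not rejected.

```r
# Hypothetical helper: upper-tail z-test of H0: mu <= mu0 with known sigma
z_test_upper <- function(xbar, mu0, sigma, n, alpha = 0.05) {
  z    <- (xbar - mu0) / (sigma / sqrt(n))  # test statistic
  crit <- qnorm(1 - alpha)                  # critical value z_alpha
  pval <- pnorm(z, lower.tail = FALSE)      # P(Z >= z) under H0
  list(z = z, critical = crit, p.value = pval, reject = z >= crit)
}

res <- z_test_upper(xbar = 9900, mu0 = 10000, sigma = 120, n = 30)
res$reject
```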

 

Two-Tailed Test of Population Mean with Known Variance

The null hypothesis of the two-tailed test of the population mean can be expressed as follows:

$$ \mu = \mu_0 $$

where μ0 is a hypothesized value of the true population mean μ.

Let us define the test statistic z in terms of the sample mean x̄, the sample size n and the population standard deviation σ:

$$ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} $$

Then the null hypothesis of the two-tailed test is to be rejected if z ≤ −zα/2 or z ≥ zα/2, where zα/2 is the 100(1 − α/2) percentile of the standard normal distribution.
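For the two-tailed test, the significance level is split between both tails, so the critical value is zα/2 and the p-value doubles the tail probability beyond |z|. The sketch below uses a hypothetical helper `z_test_two_sided()` and again reuses the light-bulb numbers for illustration only, now testing H0: μ = 10000.

```r
# Hypothetical helper: two-tailed z-test of H0: mu = mu0 with known sigma
z_test_two_sided <- function(xbar, mu0, sigma, n, alpha = 0.05) {
  z    <- (xbar - mu0) / (sigma / sqrt(n))  # test statistic
  crit <- qnorm(1 - alpha / 2)              # critical value z_{alpha/2}
  pval <- 2 * pnorm(-abs(z))                # P(|Z| >= |z|) under H0
  list(z = z, critical = crit, p.value = pval, reject = abs(z) >= crit)
}

res <- z_test_two_sided(xbar = 9900, mu0 = 10000, sigma = 120, n = 30)
res$critical   # z_{0.025}, about 1.96
res$reject
```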

 

Lower Tail Test of Population Mean with Unknown Variance

The null hypothesis of the lower tail test of the population mean can be expressed as follows:

$$ \mu \ge \mu_0 $$

where μ0 is a hypothesized lower bound of the true population mean μ.

Let us define the test statistic t in terms of the sample mean x̄, the sample size n and the sample standard deviation s:

$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$

Then the null hypothesis of the lower tail test is to be rejected if t ≤ −tα, where tα is the 100(1 − α) percentile of the Student t distribution with n − 1 degrees of freedom.

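The differences from the known-variance case are:

1. The sample standard deviation s replaces the population standard deviation σ.

2. The Student t distribution with n − 1 degrees of freedom replaces the standard normal distribution.

Upper tail and two-tailed versions of this test exist as well; they are not covered in these notes.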
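With unknown variance the computation follows the same pattern, with sd() and qt() taking the place of σ and qnorm(). The sketch below uses a simulated, purely hypothetical sample of 30 lifetimes and cross-checks the hand computation against base R's t.test() with alternative = "less".

```r
set.seed(42)                                   # arbitrary seed
x     <- rnorm(30, mean = 9950, sd = 120)      # hypothetical sample of lifetimes
mu0   <- 10000                                 # hypothesized lower bound
alpha <- 0.05                                  # significance level
n     <- length(x)

t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))  # sample sd s replaces sigma
t_crit <- -qt(1 - alpha, df = n - 1)           # critical value -t_alpha, df = n - 1
reject <- t_stat <= t_crit

# Cross-check with the built-in one-sample t-test
builtin <- t.test(x, mu = mu0, alternative = "less")
c(t = t_stat, critical = t_crit, p.value = unname(builtin$p.value))
```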

 

Lower Tail Test of Population Proportion

The null hypothesis of the lower tail test about population proportion can be expressed as follows:

$$ p \ge p_0 $$

where p0 is a hypothesized lower bound of the true population proportion p.

Let us define the test statistic z in terms of the sample proportion p̄ and the sample size n:

$$ z = \frac{\bar{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}} $$

Then the null hypothesis of the lower tail test is to be rejected if z ≤ −zα, where zα is the 100(1 − α) percentile of the standard normal distribution.

Upper tail and two-tailed versions of this test exist as well; they are not covered in these notes.
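A minimal sketch of the proportion test, assuming hypothetical survey numbers (85 successes out of 148, tested against p0 = 0.6). Base R's prop.test() reports a chi-squared statistic rather than z; for one sample with correct = FALSE that statistic equals z², and with alternative = "less" its p-value equals pnorm(z), which the cross-check relies on.

```r
x     <- 85                 # hypothetical number of successes
n     <- 148                # hypothetical sample size
p0    <- 0.6                # hypothesized lower bound of the proportion
alpha <- 0.05               # significance level

pbar   <- x / n                                  # sample proportion
z      <- (pbar - p0) / sqrt(p0 * (1 - p0) / n)  # test statistic
crit   <- -qnorm(1 - alpha)                      # critical value -z_alpha
reject <- z <= crit

# Cross-check: one-sample prop.test without continuity correction
builtin <- prop.test(x, n, p = p0, alternative = "less", correct = FALSE)
c(z = z, z.squared = z^2, X.squared = unname(builtin$statistic))
```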

 

Type II Error

In hypothesis testing, a type II error is the failure to reject a null hypothesis that is actually false. If β denotes the probability of a type II error, the probability of avoiding it, 1 - β, is called the power of the hypothesis test.

Type II Error in Lower Tail Test of Population Mean with Known Variance

Problem

Suppose the manufacturer claims that the mean lifetime of a light bulb is more than 10,000 hours. Assume the actual mean light bulb lifetime is 9,950 hours and the population standard deviation is 120 hours. At the .05 significance level, what is the probability of a type II error for a sample size of 30 light bulbs?

Solution

We begin by computing the standard error of the mean, sem, i.e. the standard deviation of the sample mean.

> n = 30                # sample size 
> sigma = 120           # population standard deviation 
> sem = sigma/sqrt(n); sem   # standard error 
[1] 21.909

We next compute the lower bound of sample means for which the null hypothesis μ ≥ 10000 would not be rejected.

> alpha = .05           # significance level 
> mu0 = 10000           # hypothetical lower bound 
> q = qnorm(alpha, mean=mu0, sd=sem); q 
[1] 9964

Therefore, as long as the sample mean is greater than 9964, the null hypothesis will not be rejected. Since we assume that the actual population mean is 9950, the probability of a type II error is the probability that the sample mean exceeds 9964.

> mu = 9950             # assumed actual mean 
> pnorm(q, mean=mu, sd=sem, lower.tail=FALSE) 
[1] 0.26196

Answer

If the sample size is 30 light bulbs, the actual mean lifetime is 9,950 hours and the population standard deviation is 120 hours, then the probability of a type II error when testing the null hypothesis μ ≥ 10000 at the .05 significance level is 26.2%, and the power of the hypothesis test is 73.8%.
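The three steps above can be folded into one function: the test rejects when the sample mean falls below μ0 − zα·σ/√n, so β is the probability that the sample mean lands at or above that threshold when the true mean is μ. The helper name `type2_lower()` is hypothetical, and the sample sizes in the loop are chosen arbitrarily to show how larger samples shrink β.

```r
# Hypothetical helper: type II error (beta) and power of the lower-tail z-test
# of H0: mu >= mu0 when the true mean is mu and sigma is known
type2_lower <- function(mu, mu0, sigma, n, alpha = 0.05) {
  sem  <- sigma / sqrt(n)                                   # standard error of the mean
  q    <- qnorm(alpha, mean = mu0, sd = sem)                 # H0 rejected when xbar < q
  beta <- pnorm(q, mean = mu, sd = sem, lower.tail = FALSE)  # P(xbar >= q | true mean mu)
  c(beta = beta, power = 1 - beta)
}

type2_lower(mu = 9950, mu0 = 10000, sigma = 120, n = 30)  # the example: beta about 0.262

# Beta shrinks (power grows) as the sample size increases
sizes <- c(10, 30, 50, 100)
betas <- sapply(sizes, function(k) type2_lower(9950, 10000, 120, k)["beta"])
setNames(betas, sizes)
```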

This article is excerpted from cnblogs (博客园); the original was published on 2012-02-28.

Time: 2024-09-12 23:04:20
