A Brief Introduction to Logistic Regression and Least Squares Probability Classification, with Examples

Logistic Regression & Least Square Probability Classification

1. Logistic Regression

The likelihood function, as described by Wikipedia:

https://en.wikipedia.org/wiki/Likelihood_function

plays a key role in statistical inference, especially in methods that estimate parameters from a set of statistics. We will make heavy use of it in this article.
Pattern recognition works by learning the posterior probability p(y|x) that a pattern x belongs to class y. Given a pattern x, we assign it to the class y whose posterior probability is the largest, i.e.

\hat{y} = \arg\max_{y=1,\dots,c} p(y|x)

The posterior probability can be interpreted as the confidence that pattern x belongs to class y.
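For instance, once the posterior probabilities have been estimated, the classification rule is just a maximum over the classes. The following minimal MATLAB sketch (illustrative variable names, not from the original scripts) assumes C holds the estimated posteriors, one row per test point and one column per class:

C = [0.7 0.2 0.1; 0.1 0.3 0.6];   % toy posterior estimates for two test points (illustrative values)
[~,yhat] = max(C,[],2);           % yhat(i) is the class with the largest posterior for test point i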
In the logistic regression algorithm, a log-linear model is used to express the posterior probability:

q(y|x,\theta) = \frac{\exp\left(\sum_{j=1}^{b}\theta_j^{(y)}\phi_j(x)\right)}{\sum_{y'=1}^{c}\exp\left(\sum_{j=1}^{b}\theta_j^{(y')}\phi_j(x)\right)}

Note that the denominator is a normalization term that makes the posterior probabilities sum to one over the c classes. Logistic regression is then defined by the following optimization problem:

\max_{\theta}\ \sum_{i=1}^{m}\log q(y_i|x_i,\theta)
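For concreteness, here is a minimal MATLAB sketch (illustrative, not part of the original post) that evaluates the soft-max posterior and this log-likelihood objective; it assumes Phi is an m-by-b design matrix with Phi(i,j) = phi_j(x_i), Theta is a b-by-c matrix whose columns are the theta^(y), and yv holds the m class labels:

m=5; b=4; cNum=3;                                 % toy sizes
Phi=randn(m,b); Theta=randn(b,cNum);              % random design matrix and parameters
yv=[1;2;3;1;2];                                   % toy labels
S=exp(Phi*Theta);                                 % unnormalized class scores, m-by-c
Q=S./repmat(sum(S,2),1,cNum);                     % q(y|x_i,theta) for every sample and class
L=sum(log(Q(sub2ind(size(Q),(1:m)',yv))));        % log-likelihood objective to be maximized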

We can solve it with stochastic gradient ascent:

  1. Initialize θ.
  2. Pick up a training sample (xi,yi) randomly.
  3. Update θ = (θ^{(1)T}, …, θ^{(c)T})^T along the direction of gradient ascent (a short derivation of this gradient follows the list):
    \theta^{(y)} \leftarrow \theta^{(y)} + \epsilon\,\nabla_y J_i(\theta), \quad y=1,\dots,c
    where
    \nabla_y J_i(\theta) = -\frac{\exp\left(\theta^{(y)T}\phi(x_i)\right)\phi(x_i)}{\sum_{y'=1}^{c}\exp\left(\theta^{(y')T}\phi(x_i)\right)} + \begin{cases}\phi(x_i) & (y=y_i)\\ 0 & (y\neq y_i)\end{cases}
  4. Repeat steps 2 and 3 until θ converges to the desired precision.
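For reference, here is a short derivation (under the model defined above) of the gradient used in step 3. Writing the per-sample objective as J_i(θ) = log q(y_i|x_i,θ),

J_i(\theta) = \theta^{(y_i)T}\phi(x_i) - \log\sum_{y'=1}^{c}\exp\left(\theta^{(y')T}\phi(x_i)\right)

and differentiating with respect to \theta^{(y)} gives

\nabla_y J_i(\theta) = \begin{cases}\phi(x_i) & (y=y_i)\\ 0 & (y\neq y_i)\end{cases} - \frac{\exp\left(\theta^{(y)T}\phi(x_i)\right)\phi(x_i)}{\sum_{y'=1}^{c}\exp\left(\theta^{(y')T}\phi(x_i)\right)}

which is exactly the update direction used above.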

Take the Gaussian kernel model as an example:

q(y|x,\theta) \propto \exp\left(\sum_{j=1}^{n}\theta_j K(x,x_j)\right)

If you are not familiar with the Gaussian kernel model, refer to this article:

http://blog.csdn.net/philthinker/article/details/65628280

Here is the corresponding MATLAB code:

n=90; c=3; y=ones(n/c,1)*(1:c); y=y(:);                    % n=90 samples, c=3 classes, labels 1,2,3
x=randn(n/c,c)+repmat(linspace(-3,3,c),n/c,1); x=x(:);     % 1-D samples centered at -3, 0, 3

hh=2*1^2; t0=randn(n,c);                                   % hh = 2*h^2 with bandwidth h=1; random initial theta
for o=1:n*1000
    i=ceil(rand*n); yi=y(i); ki=exp(-(x-x(i)).^2/hh);      % pick a random sample and compute its Gaussian kernel vector
    ci=exp(ki'*t0); t=t0-0.1*(ki*ci)/(1+sum(ci));          % gradient ascent: subtract the soft-max (normalization) term
    t(:,yi)=t(:,yi)+0.1*ki;                                % add phi(x_i) to the column of the true class (step size 0.1)
    if norm(t-t0)<0.000001                                 % stop once the update becomes negligible
        break;
    end
    t0=t;
end

N=100; X=linspace(-5,5,N)';                                % test points on a 1-D grid
K=exp(-(repmat(X.^2,1,n)+repmat(x.^2',N,1)-2*X*x')/hh);    % kernel matrix between test points and training samples

figure(1); clf; hold on; axis([-5,5,-0.3,1.8]);
C=exp(K*t); C=C./repmat(sum(C,2),1,c);                     % estimated posteriors q(y|x) on the test grid
plot(X,C(:,1),'b-');
plot(X,C(:,2),'r--');
plot(X,C(:,3),'g:');
plot(x(y==1),-0.1*ones(n/c,1),'bo');
plot(x(y==2),-0.2*ones(n/c,1),'rx');
plot(x(y==3),-0.1*ones(n/c,1),'gv');
legend('q(y=1|x)','q(y=2|x)','q(y=3|x)');

2. Least Square Probability Classification

In the least squares probability classifier, a linearly parameterized model is used to express the posterior probability:

q(y|x,\theta^{(y)}) = \sum_{j=1}^{b}\theta_j^{(y)}\phi_j(x) = \theta^{(y)T}\phi(x), \quad y=1,\dots,c

Each class y has its own parameter vector \theta^{(y)} = (\theta_1^{(y)},\dots,\theta_b^{(y)})^T, which is a different parameterization from the one used by the logistic classifier. Learning these models amounts to minimizing the following squared error:
J_y(\theta^{(y)}) = \frac{1}{2}\int\left(q(y|x,\theta^{(y)}) - p(y|x)\right)^2 p(x)\,dx
 = \frac{1}{2}\int q(y|x,\theta^{(y)})^2\,p(x)\,dx - \int q(y|x,\theta^{(y)})\,p(y|x)\,p(x)\,dx + \frac{1}{2}\int p(y|x)^2\,p(x)\,dx
where p(x) denotes the probability density of the training samples \{x_i\}_{i=1}^{n}.
By Bayes' formula,
p(y|x)\,p(x) = p(x,y) = p(x|y)\,p(y)

Hence J_y can be reformulated as

J_y(\theta^{(y)}) = \frac{1}{2}\int q(y|x,\theta^{(y)})^2\,p(x)\,dx - \int q(y|x,\theta^{(y)})\,p(x|y)\,p(y)\,dx + \frac{1}{2}\int p(y|x)^2\,p(x)\,dx

Note that the first and second terms above are expectations with respect to p(x) and p(x|y) respectively, which usually cannot be computed directly. The last term does not depend on θ and can therefore be dropped.
Since p(x|y) is the probability density of samples belonging to class y, the two expectations can be approximated by the following sample averages (n_y denotes the number of training samples of class y):
\frac{1}{n}\sum_{i=1}^{n} q(y|x_i,\theta^{(y)})^2, \qquad \frac{1}{n_y}\sum_{i:y_i=y} q(y|x_i,\theta^{(y)})\,p(y)

Next, replacing p(y) with its empirical estimate n_y/n and adding an ℓ2 regularization term, we obtain the following training criterion:
\hat{J}_y(\theta^{(y)}) = \frac{1}{2n}\sum_{i=1}^{n} q(y|x_i,\theta^{(y)})^2 - \frac{1}{n}\sum_{i:y_i=y} q(y|x_i,\theta^{(y)}) + \frac{\lambda}{2n}\|\theta^{(y)}\|^2

Let \pi^{(y)} = (\pi_1^{(y)},\dots,\pi_n^{(y)})^T with \pi_i^{(y)} = 1 if y_i = y and \pi_i^{(y)} = 0 otherwise, and let \Phi be the n-by-b design matrix with \Phi_{ij} = \phi_j(x_i). Then

\hat{J}_y(\theta^{(y)}) = \frac{1}{2n}\theta^{(y)T}\Phi^T\Phi\,\theta^{(y)} - \frac{1}{n}\theta^{(y)T}\Phi^T\pi^{(y)} + \frac{\lambda}{2n}\|\theta^{(y)}\|^2
This is a convex optimization problem, so we can obtain the analytic solution by setting the first-order derivative (the gradient) with respect to θ^(y) to zero:

\hat{\theta}^{(y)} = \left(\Phi^T\Phi + \lambda I\right)^{-1}\Phi^T\pi^{(y)}
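As a quick check, differentiating the matrix form of \hat{J}_y(\theta^{(y)}) above and setting the gradient to zero gives

\nabla\hat{J}_y(\theta^{(y)}) = \frac{1}{n}\Phi^T\Phi\,\theta^{(y)} - \frac{1}{n}\Phi^T\pi^{(y)} + \frac{\lambda}{n}\theta^{(y)} = 0
\;\Longrightarrow\; \left(\Phi^T\Phi + \lambda I\right)\theta^{(y)} = \Phi^T\pi^{(y)},

which is exactly the solution stated above.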
To prevent negative estimates of the posterior probability, negative outputs are clipped to zero and the results are renormalized:
\hat{p}(y|x) = \frac{\max\left(0,\ \hat{\theta}^{(y)T}\phi(x)\right)}{\sum_{y'=1}^{c}\max\left(0,\ \hat{\theta}^{(y')T}\phi(x)\right)}

Again we take the Gaussian kernel model as an example:

n=90; c=3; y=ones(n/c,1)*(1:c); y=y(:);                    % same toy data as before
x=randn(n/c,c)+repmat(linspace(-3,3,c),n/c,1); x=x(:);

hh=2*1^2; x2=x.^2; l=0.1; N=100; X=linspace(-5,5,N)';      % bandwidth, regularization parameter lambda=0.1, test grid
k=exp(-(repmat(x2,1,n)+repmat(x2',n,1)-2*x*(x'))/hh);      % kernel matrix among the training samples
K=exp(-(repmat(X.^2,1,n)+repmat(x2',N,1)-2*X*(x'))/hh);    % kernel matrix between test points and training samples
for yy=1:c
    yk=(y==yy); ky=k(:,yk);                                % class-yy indicator and the corresponding kernel columns
    ty=(ky'*ky+l*eye(sum(yk)))\(ky'*yk);                   % regularized least squares solution for class yy
    Kt(:,yy)=max(0,K(:,yk)*ty);                            % evaluate on the grid and clip negative outputs to zero
end
ph=Kt./repmat(sum(Kt,2),1,c);                              % renormalize so the posteriors sum to one

figure(1); clf; hold on; axis([-5,5,-0.3,1.8]);
plot(X,ph(:,1),'b-');                                      % plot the least squares posterior estimates computed above
plot(X,ph(:,2),'r--');
plot(X,ph(:,3),'g:');
plot(x(y==1),-0.1*ones(n/c,1),'bo');
plot(x(y==2),-0.2*ones(n/c,1),'rx');
plot(x(y==3),-0.1*ones(n/c,1),'gv');
legend('q(y=1|x)','q(y=2|x)','q(y=3|x)');

3. Summary

Logistic regression works well on small sample sets because the model is simple and is trained by straightforward stochastic updates. However, once the number of samples grows large, the least squares probability classifier is preferable, since its parameters are obtained from an analytic solution rather than an iterative optimization.

