Unsupervised Dimension Reduction
High-dimensional data is always difficult to work with. On one hand, it demands tremendous computational resources; on the other hand, it is not as intuitive to interpret as low-dimensional data. Dimension reduction is therefore one of the key techniques for handling it.
Linear Dimension Reduction
In order to reduce the dimension of the samples, i.e. to transform $\{x_i\}_{i=1}^n$ into $\{z_i\}_{i=1}^n$ with as little loss of information as possible, we can use a linear transformation:
$$z_i = T x_i$$
Before doing that, it is necessary to make sure the training set $\{x_i\}_{i=1}^n$ has zero mean, i.e. it must be centered. If that is not the case, we shift the origin:
$$x_i \leftarrow x_i - \frac{1}{n}\sum_{i'=1}^n x_{i'}$$
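In MATLAB this centering step is a single subtraction; here is a minimal sketch (the toy data x and the sample count n are assumptions for illustration only):
n = 100;                           % number of samples (assumed)
x = randn(n,2) + 3;                % toy data with a nonzero mean (assumed)
x = x - repmat(mean(x),[n,1]);     % centering: subtract the sample mean from every row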
Principal Component Analysis (PCA)
PCA, as the following shows, is the simplest linear dimension reduction method. Suppose that $z_i$ is the orthogonal projection of $x_i$, which requires $TT^\top = I_m$. As in least-squares (LS) methods, we try to lose as little information as possible, i.e. we minimize:
$$\sum_{i=1}^n \| T^\top T x_i - x_i \|^2 = -\mathrm{tr}(TCT^\top) + \mathrm{tr}(C)$$
where $C$ is the covariance matrix of the (centered) training set:
$$C = \sum_{i=1}^n x_i x_i^\top$$
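For completeness, here is a short derivation of the identity above (a sketch using $TT^\top = I_m$ and the cyclic property of the trace):
$$\begin{aligned}
\sum_{i=1}^n \| T^\top T x_i - x_i \|^2
&= \sum_{i=1}^n \left( x_i^\top T^\top T T^\top T x_i - 2\,x_i^\top T^\top T x_i + x_i^\top x_i \right) \\
&= \sum_{i=1}^n \left( -\,x_i^\top T^\top T x_i + x_i^\top x_i \right) \\
&= -\,\mathrm{tr}\Big(T \sum_{i=1}^n x_i x_i^\top\, T^\top\Big) + \mathrm{tr}\Big(\sum_{i=1}^n x_i x_i^\top\Big)
= -\,\mathrm{tr}(TCT^\top) + \mathrm{tr}(C)
\end{aligned}$$
where the second step uses $T^\top T T^\top T = T^\top (TT^\top) T = T^\top T$.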
In summary, PCA is defined as
$$\max_{T\in\mathbb{R}^{m\times d}} \mathrm{tr}(TCT^\top) \quad \text{s.t.} \quad TT^\top = I_m$$
Consider the eigenvalue problem of $C$:
$$C\xi = \lambda\xi$$
Denote the eigenvalues and corresponding eigenvectors by $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d \ge 0$ and $\xi_1,\dots,\xi_d$ respectively. Then the solution is:
$$T = (\xi_1,\dots,\xi_m)^\top$$
Here is a simple example:
n=100;                                                % number of samples
%x=[2*randn(n,1) randn(n,1)];                         % alternative data set
x=[2*randn(n,1) 2*round(rand(n,1))-1+randn(n,1)/3];   % two clusters along the second axis
x=x-repmat(mean(x),[n,1]);                            % centering
[t,v]=eigs(x'*x,1);                                   % leading eigenvector of C = X X^T
figure(1); clf; hold on; axis([-6 6 -6 6]);
plot(x(:,1),x(:,2),'rx');                             % samples
plot(9*[-t(1) t(1)], 9*[-t(2) t(2)]);                 % principal direction found by PCA
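After the script above, the one-dimensional embedding and the corresponding back-projection can be obtained with two more lines; a minimal sketch (the variables z and xr are assumptions, not part of the original script):
z  = x*t;      % embedded samples z_i = T x_i (one column since m = 1)
xr = z*t';     % back-projection T^T T x_i onto the principal direction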
Locality Preserving Projections
PCA may change the cluster structure of the original training set. Locality preserving projections (LPP), another linear dimension reduction method, is designed to preserve it.
Define the similarity between $x_i$ and $x_{i'}$ as $W_{i,i'} \ge 0$: it takes a large value when the two samples are very similar and a small value otherwise. Since similarity is symmetric, we require $W_{i,i'} = W_{i',i}$. There are several common choices of similarity, such as the Gaussian similarity:
$$W_{i,i'} = \exp\left(-\frac{\|x_i - x_{i'}\|^2}{2t^2}\right)$$
where $t>0$ is a tunable parameter.
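As a minimal sketch, the Gaussian similarity matrix can be computed in MATLAB as follows, assuming the centered data x and sample count n from the example above; the bandwidth value t = 1 is an assumption, and the LPP script below implicitly fixes $2t^2 = 1$:
t  = 1;                                            % tunable bandwidth (assumed value)
x2 = sum(x.^2,2);                                  % squared norms of the samples
d2 = repmat(x2,1,n) + repmat(x2',n,1) - 2*(x*x');  % pairwise squared distances
W  = exp(-d2/(2*t^2));                             % Gaussian similarity matrix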
To preserve the cluster structure, we require that similar $x_i$ are mapped to similar $z_i$. That is to say, we ought to minimize:
$$\frac{1}{2}\sum_{i,i'=1}^n W_{i,i'}\,\|T x_i - T x_{i'}\|^2$$
However, to avoid the solution $T=0$, we require
$$T X D X^\top T^\top = I_m$$
where $X=(x_1,\cdots,x_n)\in\mathbb{R}^{d\times n}$ and $D$ is a diagonal matrix:
$$D_{i,i'} = \begin{cases} \sum_{i''=1}^n W_{i,i''} & (i = i') \\ 0 & (i \ne i') \end{cases}$$
If we set $L = D - W$, then we can represent our optimization goal as
$$\min_{T\in\mathbb{R}^{m\times d}} \mathrm{tr}(T X L X^\top T^\top) \quad \text{s.t.} \quad T X D X^\top T^\top = I_m$$
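This trace form is equivalent to the weighted sum above; a short derivation sketch, using the symmetry of $W$ and the definition of $D$:
$$\frac{1}{2}\sum_{i,i'=1}^n W_{i,i'}\,\|T x_i - T x_{i'}\|^2
= \sum_{i=1}^n D_{i,i}\, x_i^\top T^\top T x_i - \sum_{i,i'=1}^n W_{i,i'}\, x_i^\top T^\top T x_{i'}
= \mathrm{tr}\big(T X (D - W) X^\top T^\top\big) = \mathrm{tr}(T X L X^\top T^\top)$$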
So how do we solve it? As in PCA, consider the (generalized) eigenvalue problem:
$$X L X^\top \xi = \lambda X D X^\top \xi$$
Denote the generalized eigenvalues and eigenvectors by $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d \ge 0$ and $\xi_1,\dots,\xi_d$ respectively. Therefore
$$T = (\xi_d, \xi_{d-1}, \dots, \xi_{d-m+1})^\top$$
Here is a simple example:
n=100;                                                % number of samples
%x=[2*randn(n,1) randn(n,1)];                         % alternative data set
x=[2*randn(n,1) 2*round(rand(n,1))-1+randn(n,1)/3];   % two clusters along the second axis
x=x-repmat(mean(x),[n,1]);                            % centering
x2=sum(x.^2,2);                                       % squared norms of the samples
W=exp(-(repmat(x2,1,n)+repmat(x2',n,1)-2*x*x'));      % Gaussian similarity matrix (2t^2=1)
D=diag(sum(W,2)); L=D-W;                              % degree matrix D and graph Laplacian L
z=x'*D*x;                                             % X D X^T
z=(z+z')/2;                                           % symmetrize against round-off errors
[t,v]=eigs(x'*L*x,z,1,'sm');                          % smallest generalized eigenvector of (X L X^T, X D X^T)
figure(1); clf; hold on; axis([-6 6 -6 6]);
plot(x(:,1),x(:,2),'rx');                             % samples
plot(9*[-t(1) t(1)], 9*[-t(2) t(2)]);                 % direction found by LPP
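To compare the two linear methods on the same data, the PCA direction from the previous example can be overlaid on the same figure; a minimal sketch (the variable name t_pca is an assumption):
[t_pca,v_pca]=eigs(x'*x,1);                                  % leading PCA direction on the same data
plot(9*[-t_pca(1) t_pca(1)], 9*[-t_pca(2) t_pca(2)],'k--');  % dashed line for comparison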
Kernelized PCA
Let us now turn to nonlinear dimension reduction. Due to the limited time, we will not analyze it as deeply as the linear methods.
When it comes to nonlinearity, kernel functions are sure to be highlighted. Take the Gaussian kernel for example:
$$K(x,x') = \exp\left(-\frac{\|x - x'\|^2}{2h^2}\right)$$
where $h>0$ is a tunable parameter.
Here we do not consider the eigenvalue problem of $C$ as we did in PCA, but that of the kernel matrix, $K\alpha = \lambda\alpha$, whose $(i,i')$-th element is $K(x_i, x_{i'})$; hence $K\in\mathbb{R}^{n\times n}$. Note that the size of the kernel matrix $K$ depends only on the number of samples.
However, centering is still necessary:
$$K \leftarrow HKH$$
where
$$H = I_n - \mathbf{1}_{n\times n}/n$$
and $\mathbf{1}_{n\times n}$ is the matrix with all elements equal to one. The final output of kernelized PCA is:
$$(z_1,\dots,z_n) = \left(\frac{1}{\sqrt{\lambda_1}}\alpha_1, \cdots, \frac{1}{\sqrt{\lambda_m}}\alpha_m\right)^\top HKH$$
where $\alpha_1,\dots,\alpha_m$ are the eigenvectors corresponding to the $m$ largest eigenvalues of $HKH$.
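No script accompanies this section, so here is a minimal MATLAB sketch of kernelized PCA on the same toy data as before; the bandwidth h, the target dimension m, and all variable names are assumptions:
n = 100; m = 2; h = 1;                                % sample count, target dimension, bandwidth (assumed)
x = [2*randn(n,1) 2*round(rand(n,1))-1+randn(n,1)/3]; % same toy data as the earlier examples
x2 = sum(x.^2,2);                                     % squared norms of the samples
d2 = repmat(x2,1,n) + repmat(x2',n,1) - 2*(x*x');     % pairwise squared distances
K  = exp(-d2/(2*h^2));                                % Gaussian kernel matrix
H  = eye(n) - ones(n)/n;                              % centering matrix
Kc = H*K*H; Kc = (Kc+Kc')/2;                          % centered kernel, symmetrized against round-off
[A,V] = eigs(Kc,m);                                   % m leading eigenvectors/eigenvalues of HKH
z = diag(1./sqrt(diag(V)))*A'*Kc;                     % embedded samples (z_1,...,z_n), m-by-n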