scale在计算时, 主要是分母的选择和center , scale配置有关. 默认两个都是TRUE.
scale(x, center = TRUE, scale = TRUE)
scale(x,scale=F,center=T) 计算结果等价于 x-mean(x)
scale(x,scale=T,center=F) 计算结果等价于 x/sqrt(sum(x^2)/(length(x)-1))
scale(x,scale=T,center=T) 计算结果等价于 (x-mean(x))/sd(x) sd计算标准差.
计算时, 都是基于列的. 例如
mean() 每列的有意义值的平均值.
例子 :
> x
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 14 9 10 16 10 9 8
[2,] 3 10 12 7 12 14 15
[3,] 10 8 8 13 11 5 16
[4,] 8 9 17 14 7 4 9
[5,] 7 7 10 13 9 6 8
> scale(x,scale=F,center=T)
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 5.6 0.4 -1.4 3.4 0.2 1.4 -3.2 # 每个元素的值等于它的值减去所在列的平均值.
[2,] -5.4 1.4 0.6 -5.6 2.2 6.4 3.8
[3,] 1.6 -0.6 -3.4 0.4 1.2 -2.6 4.8
[4,] -0.4 0.4 5.6 1.4 -2.8 -3.6 -2.2
[5,] -1.4 -1.6 -1.4 0.4 -0.8 -1.6 -3.2
attr(,"scaled:center")
[1] 8.4 8.6 11.4 12.6 9.8 7.6 11.2
center表示每列的平均值, 如第一列如下 :
> x[,1]
[1] 14 3 10 8 7
> mean(x[,1])
[1] 8.4
> scale(x,scale=T,center=F)
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1.3695248 0.9295160 0.7575540 1.1047627 0.8989331 0.9566892 0.6091096
[2,] 0.2934696 1.0327956 0.9090648 0.4833337 1.0787198 1.4881832 1.1420805
[3,] 0.9782320 0.8262364 0.6060432 0.8976197 0.9888265 0.5314940 1.2182192
[4,] 0.7825856 0.9295160 1.2878418 0.9666674 0.6292532 0.4251952 0.6852483
[5,] 0.6847624 0.7229569 0.7575540 0.8976197 0.8090398 0.6377928 0.6091096
attr(,"scaled:scale")
[1] 10.222524 9.682458 13.200379 14.482748 11.124298 9.407444 13.133926
scale(x,scale=T,center=F) 分母为对应列的sqrt(sum(x^2)/(length(x)-1))
例如 :
> sqrt(sum(x[,1]^2)/(length(x[,1])-1))
[1] 10.22252
> sqrt(sum(x[,2]^2)/(length(x[,2])-1))
[1] 9.682458
最后
> scale(x,scale=T,center=T)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1.38705673 0.3508232 -0.4075558 1.0114390 0.1039750 0.3467642 # 元素计算方法: (x-mean(x))/sd(x)
[2,] -1.33751899 1.2278812 0.1746668 -1.6658995 1.1437255 1.5852077 # (值-所在列数学期望)/所在列标准差
[3,] 0.39630192 -0.5262348 -0.9897783 0.1189928 0.6238503 -0.6439906
[4,] -0.09907548 0.3508232 1.6302230 0.4164749 -1.4556507 -0.8916793
[5,] -0.34676418 -1.4032928 -0.4075558 0.1189928 -0.4159002 -0.3963019
[,7]
[1,] -0.8076071
[2,] 0.9590335
[3,] 1.2114107
[4,] -0.5552299
[5,] -0.8076071
attr(,"scaled:center")
[1] 8.4 8.6 11.4 12.6 9.8 7.6 11.2
同时输出每列的数学期望.
attr(,"scaled:scale")
[1] 4.037326 1.140175 3.435113 3.361547 1.923538 4.037326 3.962323
分母为每列的标准差
[参考]
1. scale.default
> scale.default
function (x, center = TRUE, scale = TRUE)
{
x <- as.matrix(x)
nc <- ncol(x)
if (is.logical(center)) {
if (center) {
center <- colMeans(x, na.rm = TRUE)
x <- sweep(x, 2L, center, check.margin = FALSE)
}
}
else if (is.numeric(center) && (length(center) == nc))
x <- sweep(x, 2L, center, check.margin = FALSE)
else stop("length of 'center' must equal the number of columns of 'x'")
if (is.logical(scale)) {
if (scale) {
f <- function(v) {
v <- v[!is.na(v)]
sqrt(sum(v^2)/max(1, length(v) - 1L))
}
scale <- apply(x, 2L, f)
x <- sweep(x, 2L, scale, "/", check.margin = FALSE)
}
}
else if (is.numeric(scale) && length(scale) == nc)
x <- sweep(x, 2L, scale, "/", check.margin = FALSE)
else stop("length of 'scale' must equal the number of columns of 'x'")
if (is.numeric(center))
attr(x, "scaled:center") <- center
if (is.numeric(scale))
attr(x, "scaled:scale") <- scale
x
}
2. help(scale)
scale package:base R Documentation
Scaling and Centering of Matrix-like Objects
Description:
‘scale’ is generic function whose default method centers and/or
scales the columns of a numeric matrix.
Usage:
scale(x, center = TRUE, scale = TRUE)
Arguments:
x: a numeric matrix(like object).
center: either a logical value or a numeric vector of length equal to
the number of columns of ‘x’.
scale: either a logical value or a numeric vector of length equal to
the number of columns of ‘x’.
Details:
The value of ‘center’ determines how column centering is
performed. If ‘center’ is a numeric vector with length equal to
the number of columns of ‘x’, then each column of ‘x’ has the
corresponding value from ‘center’ subtracted from it. If ‘center’
is ‘TRUE’ then centering is done by subtracting the column means
(omitting ‘NA’s) of ‘x’ from their corresponding columns, and if
‘center’ is ‘FALSE’, no centering is done.
The value of ‘scale’ determines how column scaling is performed
(after centering). If ‘scale’ is a numeric vector with length
equal to the number of columns of ‘x’, then each column of ‘x’ is
divided by the corresponding value from ‘scale’. If ‘scale’ is
‘TRUE’ then scaling is done by dividing the (centered) columns of
‘x’ by their standard deviations if ‘center’ is ‘TRUE’, and the
root mean square otherwise. If ‘scale’ is ‘FALSE’, no scaling is
done.
The root-mean-square for a (possibly centered) column is defined
as sqrt(sum(x^2)/(n-1)), where x is a vector of the non-missing
values and n is the number of non-missing values. In the case
‘center = TRUE’, this is the same as the standard deviation, but
in general it is not. (To scale by the standard deviations
without centering, use ‘scale(x, center = FALSE, scale = apply(x,
2, sd, na.rm = TRUE))’.)
Value:
For ‘scale.default’, the centered, scaled matrix. The numeric
centering and scalings used (if any) are returned as attributes
‘"scaled:center"’ and ‘"scaled:scale"’
References:
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
Language_. Wadsworth & Brooks/Cole.
See Also:
‘sweep’ which allows centering (and scaling) with arbitrary
statistics.
For working with the scale of a plot, see ‘par’.
Examples:
require(stats)
x <- matrix(1:10, ncol = 2)
(centered.x <- scale(x, scale = FALSE))
cov(centered.scaled.x <- scale(x)) # all 1