rank(x, na.last = TRUE,
ties.method = c("average", "first", "random", "max", "min"))
测试 :
> x=array(rpois(35,lambda=10), dim=c(5,7))
> x
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 14 9 10 16 10 9 8
[2,] 3 10 12 7 12 14 15
[3,] 10 8 8 13 11 5 16
[4,] 8 9 17 14 7 4 9
[5,] 7 7 10 13 9 6 8
排好顺序更容易看出rank是如何计算的
> sort(x)
[1] 3 4 5 6 7 7 7 7 8 8 8 8 8 9 9 9 9 9 10 10 10 10 10 11 12
[26] 12 13 13 14 14 14 15 16 16 17
> rank(sort(x))
[1] 1.0 2.0 3.0 4.0 6.5 6.5 6.5 6.5 11.0 11.0 11.0 11.0 11.0 16.0 16.0
[16] 16.0 16.0 16.0 21.0 21.0 21.0 21.0 21.0 24.0 25.5 25.5 27.5 27.5 30.0 30.0
[31] 30.0 32.0 33.5 33.5 35.0
默认是平均值方法.
> rank(sort(x), ties.method = "average")
[1] 1.0 2.0 3.0 4.0 6.5 6.5 6.5 6.5 11.0 11.0 11.0 11.0 11.0 16.0 16.0
[16] 16.0 16.0 16.0 21.0 21.0 21.0 21.0 21.0 24.0 25.5 25.5 27.5 27.5 30.0 30.0
[31] 30.0 32.0 33.5 33.5 35.0
总共有35个值, 所以rank输出1到35的值, 但是为什么 7 7 7 7对应的是6.5呢?
我们看4个7实际上位置对应的是5,6,7,8 , 取平均值就是6.5.
那么接下来的5个8为什么得到的rank是11呢, 因为5个8的位置是9,10,11,12,13, 平均值是11.
这就是ties.method=average的算法.
接下来我们看看其他的.
min对应的是位置的最小值, 例如7777的位置是5,6,7,8, 取最小值5
88888的位置是9,10,11,12,13, 取最小值9
> rank(sort(x), ties.method = "min")
[1] 1 2 3 4 5 5 5 5 9 9 9 9 9 14 14 14 14 14 19 19 19 19 19 24 25
[26] 25 27 27 29 29 29 32 33 33 35
max对应的是位置的最大值, 例如7777的位置是5,6,7,8, 取最大值8
88888的位置是9,10,11,12,13, 取最大值13
> rank(sort(x), ties.method = "max")
[1] 1 2 3 4 8 8 8 8 13 13 13 13 13 18 18 18 18 18 23 23 23 23 23 24 26
[26] 26 28 28 31 31 31 32 34 34 35
first取的是就位置值, 所以输出的其实是连续的值.
> rank(sort(x), ties.method = "first")
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35
random取的是组内的随机值. 例如 7 8 5 6, 6 7 8 5
> rank(sort(x), ties.method = "random")
[1] 1 2 3 4 7 8 5 6 12 11 9 10 13 18 16 15 14 17 22 23 21 19 20 24 25
[26] 26 28 27 29 31 30 32 34 33 35
> rank(sort(x), ties.method = "random")
[1] 1 2 3 4 6 7 8 5 12 10 13 9 11 14 15 17 16 18 21 19 23 22 20 24 25
[26] 26 27 28 31 29 30 32 34 33 35
[参考]
1.
help(rank)
rank package:base R Documentation
Sample Ranks
Description:
Returns the sample ranks of the values in a vector. Ties (i.e.,
equal values) and missing values can be handled in several ways.
Usage:
rank(x, na.last = TRUE,
ties.method = c("average", "first", "random", "max", "min"))
Arguments:
x: a numeric, complex, character or logical vector.
na.last: for controlling the treatment of ‘NA’s. If ‘TRUE’, missing
values in the data are put last; if ‘FALSE’, they are put
first; if ‘NA’, they are removed; if ‘"keep"’ they are kept
with rank ‘NA’.
ties.method: a character string specifying how ties are treated, see
‘Details’; can be abbreviated.
Details:
If all components are different (and no ‘NA’s), the ranks are well
defined, with values in ‘seq_len(x)’. With some values equal
(called ‘ties’), the argument ‘ties.method’ determines the result
at the corresponding indices. The ‘"first"’ method results in a
permutation with increasing values at each index set of ties. The
‘"random"’ method puts these in random order whereas the default,
‘"average"’, replaces them by their mean, and ‘"max"’ and ‘"min"’
replaces them by their maximum and minimum respectively, the
latter being the typical sports ranking.
‘NA’ values are never considered to be equal: for ‘na.last = TRUE’
and ‘na.last = FALSE’ they are given distinct ranks in the order
in which they occur in ‘x’.
*NB*: ‘rank’ is not itself generic but ‘xtfrm’ is, and
‘rank(xtfrm(x), ....)’ will have the desired result if there is a
‘xtfrm’ method. Otherwise, ‘rank’ will make use of ‘==’, ‘>’,
‘is.na’ and extraction methods for classed objects, possibly
rather slowly.
Value:
A numeric vector of the same length as ‘x’ with names copied from
‘x’ (unless ‘na.last = NA’, when missing values are removed). The
vector is of integer type unless ‘x’ is a long vector or
‘ties.method = "average"’ when it is of double type (whether or
not there are any ties).
References:
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
Language_. Wadsworth & Brooks/Cole.
See Also:
‘order’ and ‘sort’.
Examples:
(r1 <- rank(x1 <- c(3, 1, 4, 15, 92)))
x2 <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5)
names(x2) <- letters[1:11]
(r2 <- rank(x2)) # ties are averaged
## rank() is "idempotent": rank(rank(x)) == rank(x) :
stopifnot(rank(r1) == r1, rank(r2) == r2)
## ranks without averaging
rank(x2, ties.method= "first") # first occurrence wins
rank(x2, ties.method= "random") # ties broken at random
rank(x2, ties.method= "random") # and again
## keep ties ties, no average
(rma <- rank(x2, ties.method= "max")) # as used classically
(rmi <- rank(x2, ties.method= "min")) # as in Sports
stopifnot(rma + rmi == round(r2 + r2))
时间: 2024-09-30 04:23:40