Today let's look at how HotSpot VM implements strings. Consider this question: in the example below, how many copies of the string Hello,World. does HotSpot VM keep?
public static void main(String[] args) throws Throwable {
    String s1 = "Hello,World.";
    String s2 = "Hello,"+"World.";
    StringBuilder sb = new StringBuilder();
    sb.append("Hello,").append("World.");
    String s3 = sb.toString();
    String s4 = sb.toString().intern();
    System.out.println("s1 == s2 #" + (s1==s2));
    System.out.println("s1 == s3 #" + (s1==s3));
    System.out.println("s1 == s4 #" + (s1==s4));
    System.in.read();
}
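The conclusion first: 4 copies (which seems a little wasteful :). Where each of those 4 copies lives is what we'll look at next. Before that, here is the example's output,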
s1 == s2 #true
s1 == s3 #false
s1 == s4 #true
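== actually compares the addresses the oops point to, and s1, s2, and s4 all point to the same address. s2 has already been optimized by the compiler, as the compiled bytecode shows,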
Code:
stack=4, locals=6, args_size=1
0: ldc #2 // String Hello,World.
2: astore_1
3: ldc #2 // String Hello,World.
5: astore_2
...
Constant pool:
#1 = Methodref #19.#47 // java/lang/Object."<init>":()V
#2 = String #48 // Hello,World.
#3 = Class #49 // java/lang/StringBuilder
#4 = Methodref #3.#47 // java/lang/StringBuilder."<init>":()V
#5 = String #50 // Hello,
#6 = Methodref #3.#51 // java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
#7 = String #52 // World.
#8 = Methodref #3.#53 // java/lang/StringBuilder.toString:()Ljava/lang/String;
#9 = Methodref #54.#55 // java/lang/String.intern:()Ljava/lang/String;
...
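The compiler has already evaluated the + for us: "Hello,"+"World." is a compile-time constant expression, so it is folded into the single constant #2 that both ldc instructions load.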
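To make the folding rule concrete, here is a minimal sketch (the class and method names are made up for illustration) that contrasts concatenations javac can fold with one it cannot. It relies only on the language rules for constant expressions: a literal, or a final variable initialized with a literal, counts as a constant, while a plain local variable does not.

```java
public class ConstantFoldingDemo {
    static final String LITERAL = "Hello,World.";

    // Both operands are literals, so javac folds them into one constant.
    static boolean literalPlusLiteral() {
        String folded = "Hello," + "World.";
        return LITERAL == folded;
    }

    // A final local initialized with a literal is a constant variable,
    // so this concatenation is folded at compile time as well.
    static boolean finalLocalPlusLiteral() {
        final String head = "Hello,";
        String folded = head + "World.";
        return LITERAL == folded;
    }

    // A non-final local is not a constant, so the concatenation happens
    // at run time and produces a new String object.
    static boolean plainLocalPlusLiteral() {
        String head = "Hello,";
        String joined = head + "World.";
        return LITERAL == joined;
    }

    // Interning the run-time result yields the canonical instance again.
    static boolean internedRuntimeResult() {
        String head = "Hello,";
        return LITERAL == (head + "World.").intern();
    }

    public static void main(String[] args) {
        System.out.println("literal + literal     #" + literalPlusLiteral());
        System.out.println("final local + literal #" + finalLocalPlusLiteral());
        System.out.println("plain local + literal #" + plainLocalPlusLiteral());
        System.out.println("interned result       #" + internedRuntimeResult());
    }
}
```

Only the third case should print false, which mirrors the difference between s2 and s3 in the example above.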
Stack Memory
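We can use HSDB to look at the addresses these variables point to. The stack slots from 0x0000000002aced58 to 0x0000000002aced78 (byte-addressed) hold s4, s3, sb, s2, and s1, in that order. We can see that s1, s2, and s4 all point to the address 0x00000007d619bc90, so comparing them with == returns true. Let's see what is stored at that address,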
hsdb> inspect 0x00000007d619bc90
instance of "Hello,World." @ 0x00000007d619bc90 @ 0x00000007d619bc90 (size = 24)
_mark: 1
value: [C @ 0x00000007d619bca8 Oop for [C @0x00000007d619bca8
hash: 0
hash32: 0
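If this output is unfamiliar, see the earlier article analyzing HotSpot's object mechanism first. This version of HSDB seems to have a bug and does not print the _metadata section; ignore that for now.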
At that address is a String instance. value, hash, and hash32 are instance fields of String, and the output above is this instance's data. Let's look at the value field,
hsdb> inspect 0x00000007d619bca8
instance of [C @ 0x00000007d619bca8 @ 0x00000007d619bca8 (size = 40)
_mark: 1
0: 'H'
1: 'e'
2: 'l'
3: 'l'
4: 'o'
5: ','
6: 'W'
7: 'o'
8: 'r'
9: 'l'
10: 'd'
11: '.'
OK, the first copy of Hello,World. has appeared. The address where this data lives, 0x00000007d619bca8, is in the eden space,
hsdb> universe
Heap Parameters:
ParallelScavengeHeap [
PSYoungGen [
eden = [0x00000007d6000000,0x00000007d628f8f8,0x00000007d8000000] ,
from = [0x00000007d8500000,0x00000007d8500000,0x00000007d8a00000] ,
to = [0x00000007d8000000,0x00000007d8000000,0x00000007d8500000]
]
PSOldGen [ [0x0000000782000000,0x0000000782000000,0x0000000787400000] ]
PSPermGen [ [0x000000077ce00000,0x000000077d103078,0x000000077e300000] ]
]
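Next, let's inspect the addresses that the other two variables, sb and s3, point to: sb points to 0x00000007d619bf60 and s3 points to 0x00000007d619c018.
Sure enough, inspecting them turns up two more copies of the Hello,World. data. Before tracking down the fourth copy, let's look at how s2, s3, and s4 differ. s2 was covered above: the compiler optimized it. How HotSpot VM ends up handing out the same oop at run time is something we'll set aside for later study. As for s3, look at the StringBuilder#toString method,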
public String toString() {
    // Create a copy, don't share the array
    return new String(value, 0, count);
}
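It simply creates a new String, so the result cannot be the same oop, and the Hello,World. data has to be copied once more. The s4 obtained through String#intern, on the other hand, is the same oop as s1, so next let's look at how that method is implemented.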
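The copying behavior can be observed from plain Java. The following is a small sketch (the class and method names are invented for illustration): it checks that two toString() calls on the same builder return different objects, and that a String built with new String(char[], int, int) is unaffected by later changes to the source array.

```java
public class ToStringCopies {
    // Each toString() call on a non-empty builder allocates a new String.
    static boolean toStringReturnsSameObject() {
        StringBuilder sb = new StringBuilder();
        sb.append("Hello,").append("World.");
        return sb.toString() == sb.toString();
    }

    // new String(char[], int, int) copies the characters, so mutating
    // the source array afterwards does not change the String.
    static String stringAfterArrayMutation() {
        char[] value = "Hello,World.".toCharArray();
        String s = new String(value, 0, value.length);
        value[0] = 'J';
        return s;
    }

    public static void main(String[] args) {
        System.out.println("same object   #" + toStringReturnsSameObject());
        System.out.println("after mutation #" + stringAfterArrayMutation());
    }
}
```

The first line prints false and the second still shows Hello,World., which is why sb and s3 each carry their own copy of the characters.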
String Table
String#intern is a native method,
/**
* Returns a canonical representation for the string object.
* <p>
* A pool of strings, initially empty, is maintained privately by the
* class <code>String</code>.
* <p>
* When the intern method is invoked, if the pool already contains a
* string equal to this <code>String</code> object as determined by
* the {@link #equals(Object)} method, then the string from the pool is
* returned. Otherwise, this <code>String</code> object is added to the
* pool and a reference to this <code>String</code> object is returned.
* <p>
* It follows that for any two strings <code>s</code> and <code>t</code>,
* <code>s.intern() == t.intern()</code> is <code>true</code>
* if and only if <code>s.equals(t)</code> is <code>true</code>.
* <p>
* All literal strings and string-valued constant expressions are
* interned. String literals are defined in section 3.10.5 of the
* <cite>The Java Language Specification</cite>.
*
* @return a string that has the same contents as this string, but is
* guaranteed to be from a pool of unique strings.
*/
public native String intern();
Its implementation is in String.c, which simply delegates to JVM_InternString inside the VM,
JNIEXPORT jobject JNICALL
Java_java_lang_String_intern(JNIEnv *env, jobject this)
{
    return JVM_InternString(env, this);
}

JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
  JVMWrapper("JVM_InternString");
  JvmtiVMObjectAllocEventCollector oam;
  if (str == NULL) return NULL;
  oop string = JNIHandles::resolve_non_null(str);
  oop result = StringTable::intern(string, CHECK_NULL);
  return (jstring) JNIHandles::make_local(env, result);
JVM_END
This uses a StringTable. The StringTable code was split out of SymbolTable; here is the description in symbolTable.hpp,
// The symbol table holds all Symbol*s and corresponding interned strings.
// Symbol*s and literal strings should be canonicalized.
//
// The interned strings are created lazily.
//
// It is implemented as an open hash table with a fixed number of buckets.
//
// %note:
// - symbolTableEntrys are allocated in blocks to reduce the space overhead.
StringTable is a Hashtable,
class StringTable : public Hashtable<oop, mtSymbol>

oop StringTable::intern(oop string, TRAPS)
{
  if (string == NULL) return NULL;
  ResourceMark rm(THREAD);
  int length;
  Handle h_string (THREAD, string);
  jchar* chars = java_lang_String::as_unicode_string(string, length);
  oop result = intern(h_string, chars, length, CHECK_NULL);
  return result;
}

oop StringTable::intern(Handle string_or_null, jchar* name,
                        int len, TRAPS) {
  unsigned int hashValue = hash_string(name, len);
  int index = the_table()->hash_to_index(hashValue);
  oop found_string = the_table()->lookup(index, name, len, hashValue);
  // Found
  if (found_string != NULL) return found_string;
  debug_only(StableMemoryChecker smc(name, len * sizeof(name[0])));
  assert(!Universe::heap()->is_in_reserved(name) || GC_locker::is_active(),
         "proposed name of symbol must be stable");
  Handle string;
  // try to reuse the string if possible
  if (!string_or_null.is_null() && (!JavaObjectsInPerm || string_or_null()->is_perm())) {
    string = string_or_null;
  } else {
    string = java_lang_String::create_tenured_from_unicode(name, len, CHECK_NULL);
  }
  // Grab the StringTable_lock before getting the_table() because it could
  // change at safepoint.
  MutexLocker ml(StringTable_lock, THREAD);
  // Otherwise, add to symbol to table
  return the_table()->basic_add(index, string, name, len,
                                hashValue, CHECK_NULL);
}
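So when we call String#intern, the StringTable is searched first. From the example we can guess that when a string constant is loaded with ldc, that is, when s1 is assigned, HotSpot automatically interns it for us. So when s4 is assigned and the StringTable is searched, an oop for that string is already there; it is returned directly and assigned to s4, which is why s4 and s1 are the same oop.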
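The lookup-then-add flow of StringTable::intern can be imitated in a few lines of Java. This is only a simplified sketch and not HotSpot's implementation: TinyInternPool is a hypothetical class, it uses a HashMap in place of the VM's fixed-bucket hash table, and a synchronized method stands in for StringTable_lock.

```java
import java.util.HashMap;
import java.util.Map;

public class TinyInternPool {
    // Maps string content to the canonical instance for that content.
    private final Map<String, String> table = new HashMap<>();

    public synchronized String intern(String s) {
        if (s == null) return null;
        // Look up by content first, like the_table()->lookup(...).
        String found = table.get(s);
        if (found != null) return found;
        // Not found: the given instance becomes the canonical one.
        table.put(s, s);
        return s;
    }

    public static void main(String[] args) {
        TinyInternPool pool = new TinyInternPool();
        // Plays the role of the literal loaded by ldc.
        String s1 = pool.intern(new String("Hello,World."));
        String s3 = new StringBuilder("Hello,").append("World.").toString();
        String s4 = pool.intern(s3);
        System.out.println("s1 == s3 #" + (s1 == s3));
        System.out.println("s1 == s4 #" + (s1 == s4));
    }
}
```

The first registration wins, so the later intern call on an equal string returns the earlier instance, just as s4 ends up being the same oop as s1.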
With the help of SA (the Serviceability Agent) we can write a small tool that dumps every oop in the StringTable (SA really is a great thing :),
import sun.jvm.hotspot.memory.StringTable;
import sun.jvm.hotspot.oops.Instance;
import sun.jvm.hotspot.tools.Tool;

public class StringTableDumper extends Tool {
    public static void main(String[] args) {
        StringTableDumper printer = new StringTableDumper();
        printer.start(args);
        printer.stop();
    }

    @Override
    public void run() {
        StringTable stringTable = StringTable.getTheTable();
        stringTable.stringsDo(new StringTable.StringVisitor() {
            @Override
            public void visit(Instance instance) {
                instance.print();
            }
        });
    }
}
Run it,
> java me.kisimple.just4fun.StringTableDumper 6092 > stringTable.txt
Attaching to process ID 6092, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.51-b03
The output is fairly long, but sure enough we can find the address that s1, s2, and s4 all point to, 0x00000007d619bc90,
"Hello,World." @ 0x00000007d619bc90 (object size = 24)
- _mark: {0} :1
- _metadata._compressed_klass: {8} :InstanceKlass for java/lang/String @ 0x000000077ce0afe8
- value: {12} :[C @ 0x00000007d619bca8
- hash: {16} :0
- hash32: {20} :0
And _metadata finally gets printed too :)
Symbol Table
No more suspense: the fourth copy of Hello,World. is in the SymbolTable mentioned above. In the same way, we can use SA to write a small tool that prints the data in the SymbolTable,
import sun.jvm.hotspot.memory.SymbolTable;
import sun.jvm.hotspot.oops.Symbol;
import sun.jvm.hotspot.tools.Tool;

public class SymbolTableDumper extends Tool {
    public static void main(String[] args) {
        SymbolTableDumper printer = new SymbolTableDumper();
        printer.start(args);
        printer.stop();
    }

    @Override
    public void run() {
        SymbolTable symbolTable = SymbolTable.getTheTable();
        symbolTable.symbolsDo(new SymbolTable.SymbolVisitor() {
            @Override
            public void visit(Symbol symbol) {
                System.out.println(symbol.asString() + "@" + symbol.getAddress());
            }
        });
    }
}
In the printed result we can find this line,
Hello,World.@0x000000000ca89740
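This is the fourth copy. Comparing its address, 0x000000000ca89740, against the universe output shows that it is not on the GC heap. The other 3 copies of Hello,World. are all in the YoungGen of the GC heap and are therefore managed by the GC, whereas the fourth is managed by reference counting; see the source code for details.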
So what is this copy for? It corresponds to this line in the Constant pool of the class file,
#2 = String #48 // Hello,World.
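When ldc #2 is executed, this symbol in the SymbolTable is what gets used: it holds the text of the Utf8 entry #48 that the String entry refers to.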
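To see the String-to-Utf8 indirection without javap, here is a rough sketch that parses the constant pool of its own class file and prints each String entry together with the Utf8 entry it refers to. ConstantPoolStrings and its methods are hypothetical names, the tag sizes follow the class file format as I understand it, and the sketch assumes the compiled .class file can be loaded as a resource next to the class.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ConstantPoolStrings {
    // Forces a CONSTANT_String entry for this text into our own class file.
    static final String SAMPLE = "Hello,World.";

    // Returns lines such as "#2 = String #48 // Hello,World."
    static List<String> stringEntries(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        in.readUnsignedShort(); // minor version
        in.readUnsignedShort(); // major version
        int count = in.readUnsignedShort();
        String[] utf8 = new String[count];
        int[] stringRef = new int[count];
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = in.readUTF(); break;                // Utf8
                case 8: stringRef[i] = in.readUnsignedShort(); break; // String
                case 7: case 16: case 19: case 20:
                    in.readFully(new byte[2]); break;
                case 15:
                    in.readFully(new byte[3]); break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    in.readFully(new byte[4]); break;
                case 5: case 6: // Long and Double occupy two slots
                    in.readFully(new byte[8]); i++; break;
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
        }
        List<String> lines = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            if (stringRef[i] != 0) {
                lines.add("#" + i + " = String #" + stringRef[i] + " // " + utf8[stringRef[i]]);
            }
        }
        return lines;
    }

    // Convenience wrapper that reads this class's own class file.
    static List<String> ownStringEntries() {
        try (InputStream in =
                 ConstantPoolStrings.class.getResourceAsStream("ConstantPoolStrings.class")) {
            return stringEntries(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : ownStringEntries()) {
            System.out.println(line);
        }
    }
}
```

Each String entry stores only an index; the characters themselves live in the Utf8 entry, and that text is what the VM keeps as a Symbol.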