Problem description
The XML file looks roughly like the one below. I parse it with DOM, but when an element contains Chinese text, the whole string is not returned; I only get the first Chinese character, e.g. "蒙".

XML file 1:

    <posts type="array">
      <post>
        <content>Source Milk Title</content>
        <created-at type="datetime">2011-05-30T12:47:58Z</created-at>
        <id type="integer">1</id>
        <name>Milk</name>
        <title>Milk Title</title>
        <updated-at type="datetime">2011-08-14T12:23:16Z</updated-at>
      </post>
      <post>
        <content>蒙牛的好喝酸奶 </content>
        <created-at type="datetime">2011-06-06T12:52:21Z</created-at>
        <id type="integer">2</id>
        <name>蒙牛酸奶</name>
        <title>蒙牛的好喝酸奶 </title>
        <updated-at type="datetime">2011-06-06T12:52:21Z</updated-at>
      </post>
    </posts>

The relevant code is as follows.

Code 1:

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    InputSource is = new InputSource();
    is.setCharacterStream(new StringReader(xmlString));
    Document doc = db.parse(is);
    NodeList nodes = doc.getElementsByTagName("post");
    eventsArrayList = new ArrayList<myEvents>(); //Gertig

    //Iterate the events
    for (int i = 0; i < nodes.getLength(); i++) {
        Element element = (Element) nodes.item(i);
        eventsArrayList.add(new myEvents());

        NodeList eventIDNum = element.getElementsByTagName("id");
        Element line = (Element) eventIDNum.item(0);
        eventsArrayList.get(i).eventID = Integer.parseInt(getCharacterDataFromElement(line));

        NodeList eventName = element.getElementsByTagName("name");
        line = (Element) eventName.item(0);
        eventsArrayList.get(i).name = getCharacterDataFromElement(line).trim();
        // String reName = getCharacterDataFromElement(line);
        // String reTrimName = getCharacterDataFromElement(line).trim();

        // NodeList eventBudget = element.getElementsByTagName("content");
        // line = (Element) eventBudget.item(0);
        // eventsArrayList.get(i).budget = Double.parseDouble(getCharacterDataFromElement(line));

        NodeList eventContent = element.getElementsByTagName("content");
        line = (Element) eventContent.item(0);
        eventsArrayList.get(i).content = getCharacterDataFromElement(line).trim();
    }

Code 2:

    public static String getCharacterDataFromElement(Element e) {
        Node child = e.getFirstChild();
        Node lchild = e.getLastChild();
        if (child instanceof CharacterData) {
            CharacterData cd = (CharacterData) child;
            CharacterData lcd = (CharacterData) lchild;
            // cdStr and lcdStr exist only so the values can be inspected in the debugger
            String cdStr = cd.getNodeValue();
            String lcdStr = lcd.getNodeValue();
            // Only the first child node's text is returned
            return cd.getData();
        }
        return "?"; //ListActivity will display a ? if a null value is passed to the Rails server
    }

Stepping through Code 2, I found that the Chinese string is not treated as one complete node here but as several nodes. For example, when parsing <content>蒙牛的好喝酸奶 </content>, getFirstChild() returns the first character "蒙" and getLastChild() returns the last character "奶". Where does the problem come from? Any help would be appreciated.

Also, while debugging I noticed that what gets passed in is not the original XML file 1 but something like the following, in which the Chinese characters appear in their encoded form. This may be related, but I do not understand why.

XML file 2:

    <?xml version="1.0" encoding="UTF-8"?>
    <posts type="array">
      <post>
        <content>Source Milk Title</content>
        <created-at type="datetime">2011-05-30T12:47:58Z</created-at>
        <id type="integer">1</id>
        <name>Milk</name>
        <title>Milk Title</title>
        <updated-at type="datetime">2011-08-14T12:23:16Z</updated-at>
      </post>
      <post>
        <content>蒙牛的好喝酸奶 </content>
        <created-at type="datetime">2011-06-06T12:52:21Z</created-at>
        <id type="integer">2</id>
        <name>蒙牛酸奶</name>
        <title>蒙牛的好喝酸奶 </title>
        <updated-at type="datetime">2011-06-06T12:52:21Z</updated-at>
      </post>
    </posts>

Addendum to the question: myali88 wrote
Solution
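Quote:

    InputSource is = new InputSource();
    is.setCharacterStream(new StringReader(xmlString));
    Document doc = db.parse(is);

I do not know how your "xmlString" is obtained. When I assigned the XML directly to a variable in its declaration and parsed that, I did not see the problem you describe. I also tried two other approaches:

    is.setByteStream(Dom4ChineseParser.class.getResourceAsStream("test.xml"));

and

    is.setCharacterStream(new InputStreamReader(Dom4ChineseParser.class.getResourceAsStream("test.xml"), "UTF-8"));

Neither of them showed the problem either.

Quote: Also, while debugging I noticed that what gets passed in is not the original XML file 1 but something like the following, in which the Chinese characters appear in their encoded form. This may be related, but I do not understand why.

The XML you see there is most likely the result of encoding the content as declared by <?xml version="1.0" encoding="UTF-8"?>.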
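
Since the problem could not be reproduced above, the following is only a sketch of one possible workaround, not a confirmed diagnosis. It assumes that the parser in the asker's environment delivers the Chinese text as several adjacent Text nodes (for instance, if the "encoded" XML contains numeric character references such as &#33945;, which is a guess). In that situation a getter that reads only getFirstChild() sees a single character, while getTextContent() or normalize() from the standard org.w3c.dom API recover the whole string. The class and helper names (SplitTextDemo, firstChildText, fullText, splitTextElement, parse) are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SplitTextDemo {

    // Same idea as the question's getter: looks at the first child node only.
    public static String firstChildText(Element e) {
        Node child = e.getFirstChild();
        return (child instanceof CharacterData) ? ((CharacterData) child).getData() : "?";
    }

    // Concatenates the text of all descendant nodes (DOM Level 3).
    public static String fullText(Element e) {
        return e.getTextContent();
    }

    // Builds an element whose text is deliberately split into one Text node
    // per char, to imitate the behaviour described in the question.
    public static Element splitTextElement(String tag, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element e = doc.createElement(tag);
            for (int i = 0; i < text.length(); i++) {
                e.appendChild(doc.createTextNode(String.valueOf(text.charAt(i))));
            }
            doc.appendChild(e);
            return e;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Parses an XML string through a character stream, as in the question.
    public static Document parse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // "\u8499\u725b\u9178\u5976" is the <name> value from the question.
        Element name = splitTextElement("name", "\u8499\u725b\u9178\u5976");
        System.out.println(firstChildText(name)); // first character only
        System.out.println(fullText(name));       // whole string

        name.normalize();                         // merges adjacent Text nodes
        System.out.println(firstChildText(name)); // whole string after merging

        // Assumed form of the "encoded" input: numeric character references.
        Element parsed = parse("<name>&#33945;&#29275;</name>").getDocumentElement();
        System.out.println(fullText(parsed));
    }
}
```

If this assumption matches your environment, replacing the body of getCharacterDataFromElement with e.getTextContent(), or calling doc.normalize() right after db.parse(is), should be enough. Whether getTextContent() is available depends on the DOM implementation of the platform you run on.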