XML document processing in Java using XPath and XSLT

Discover how XPath and XSLT can significantly reduce the complexity of your Java code when handling XML documents

By  and André Tost

JavaWorld | Sep 8, 2000 1:00 AM PT

The Extensible Markup Language (XML) is certainly one of the hottest technologies at the moment. While the concept of markup languages is not new, XML seems especially attractive to Java and Internet programmers. The Java API for XML Parsing (JAXP; see Resources), having recently been defined through the Java Community Process, promises to provide a common interface for accessing XML documents. The W3C has defined the so-called Document Object Model (DOM), which provides a standard interface for working with an XML document in a tree hierarchy, whereas the Simple API for XML (SAX) lets a program parse an XML document sequentially, based on an event handling model. Both of these standards (SAX being a de facto standard) complement the JAXP. Together, these three APIs provide sufficient support for dealing with XML documents in Java, and numerous books on the market describe their use.

This article introduces a way to handle XML documents that goes beyond the standard Java APIs for manipulating XML. We'll see that in many cases XPath and XSLT provide simpler, more elegant ways of solving application problems. In some simple samples, we will compare a pure Java/XML solution with one that utilizes XPath and/or XSLT.

Both XSLT and XPath are part of the Extensible Stylesheet Language (XSL) specification (see Resources). XSL consists of three parts: the XSL language specification itself, XSL Transformations (XSLT), and XML Path Language (XPath). XSL is a language for transforming XML documents; it includes a definition -- Formatting Objects -- of how XML documents can be formatted for presentation. XSLT specifies a vocabulary for transforming one XML document into another. You can consider XSLT to be XSL minus Formatting Objects. The XPath language addresses specific parts of XML documents and is intended to be used from within an XSLT stylesheet.

For the purposes of this article, it is assumed that you are familiar with the basics of XML and XSLT, as well as the DOM APIs. (For information and tutorials on these topics, see Resources.)

Note: This article's code samples were compiled and tested with the Apache Xerces XML parser and the Apache Xalan XSL processor (see Resources).

The problem

Many articles and papers that deal with XML state that it is the perfect vehicle to accomplish a good design practice in Web programming: the Model-View-Controller pattern (MVC), or, in simpler terms, the separation of application data from presentation data. If the application data is formatted in XML, it can easily be bound -- typically in a servlet or JavaServer Page -- to, say, HTML templates by using an XSL stylesheet.

But XML can do much more than merely help with model-view separation for an application's frontend. We currently observe more and more widespread use of components (for example, components developed using the EJB standard) that can be used to assemble applications, thus enhancing developer productivity. Component reusability can be improved by formatting the data that components deal with in a standard way. Indeed, we can expect to see more and more published components that use XML to describe their interfaces.

Because XML-formatted data is language-neutral, it becomes usable in cases where the client of a given application service is not known, or when it must not have any dependencies on the server. For example, in B2B environments, it may not be acceptable for two parties to have dependencies on concrete Java object interfaces for their data exchange. New technologies like the Simple Object Access Protocol (SOAP) (see Resources) address these requirements.

All of these cases have one thing in common: data is stored in XML documents and needs to be manipulated by an application. For example, an application that uses various components from different vendors will most likely have to change the structure of the (XML) data to make it fit the need of the application or adhere to a given standard.

Code written using the Java APIs mentioned above would certainly do this. Moreover, there are more and more tools available with which you can turn an XML document into a JavaBean and vice versa, which makes it easier to handle the data from within a Java program. However, in many cases, the application, or at least a part of it, merely processes one or more XML documents as input and converts them into a different XML format as output. Using stylesheets in those cases is a viable alternative, as we will see later in this article.

Use XPath to locate nodes in an XML document

As stated above, the XPath language is used to locate certain parts of an XML document. As such, it's meant to be used by an XSLT stylesheet, but nothing keeps us from using it in our Java program in order to avoid lengthy iteration over a DOM element hierarchy. Indeed, we can let the XSLT/XPath processor do the work for us. Let's take a look at how this works.

Let us assume that we have an application scenario in which a source XML document is presented to the user (possibly after being processed by a stylesheet). The user makes updates to the data and, to save network bandwidth, sends only the updated records back to the application. The application looks for the XML fragment in the source document that needs to be updated and replaces it with the new data.

We will create a little sample that will help you understand the various options. For this example, we assume that the application deals with address records in an addressbook. A sample addressbook document looks like this:

<addressbook>
   <address>
      <addressee>John Smith</addressee>
      <streetaddress>250 18th Ave SE</streetaddress>
      <city>Rochester</city>
      <state>MN</state>
      <postalCode>55902</postalCode>
   </address>
   <address>
      <addressee>Bill Morris</addressee>
      <streetaddress>1234 Center Lane NW</streetaddress>
      <city>St. Paul</city>
      <state>MN</state>
      <postalCode>55123</postalCode>
   </address>
</addressbook>

The application (possibly, though not necessarily, a servlet) keeps an instance of the addressbook in memory as a DOM Document object. When the user changes an address, the application's frontend sends it only the updated <address> element.
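
To make the starting point concrete, here is a minimal sketch of how such an in-memory Document might be built with the JAXP DOM parser. It is not part of the original sample: the class name AddressBookLoader and the shortened sample data are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper (not from the article): parses XML text into a DOM tree.
class AddressBookLoader {

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Wrap the string in a reader so the parser treats it as document content.
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of the addressbook sample shown above.
        String xml =
              "<addressbook>"
            + "<address><addressee>John Smith</addressee><city>Rochester</city></address>"
            + "<address><addressee>Bill Morris</addressee><city>St. Paul</city></address>"
            + "</addressbook>";
        Document book = parse(xml);
        // Print the root element name and the number of <address> elements.
        System.out.println(book.getDocumentElement().getTagName());
        System.out.println(book.getElementsByTagName("address").getLength());
    }
}
```

A servlet would typically parse the document once and keep the resulting Document around between requests instead of reparsing it each time.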

The <addressee> element is used to uniquely identify an address; it serves as the primary key. This would not make a lot of sense for a real application, but we do it here to keep things simple.
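
Once the outdated element has been located, swapping in the update takes only a couple of DOM calls. The following is a minimal sketch rather than code from the original sample: the class AddressReplacer and its method names are made up for illustration, and it assumes the update arrives as a separately parsed document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper (not from the article): swaps one <address> for an updated copy.
class AddressReplacer {

    /** Replaces 'outdated' (a node inside 'book') with a deep copy of 'update'. */
    static Node replaceAddress(Document book, Node outdated, Node update) {
        // A DOM node belongs to the document that created it,
        // so a deep copy has to be imported into 'book' first.
        Node imported = book.importNode(update, true);
        outdated.getParentNode().replaceChild(imported, outdated);
        return imported;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document book = parse("<addressbook><address><addressee>John Smith</addressee>"
                + "<city>Rochester</city></address></addressbook>");
        Document change = parse("<address><addressee>John Smith</addressee>"
                + "<city>Minneapolis</city></address>");
        // For brevity the outdated node is simply the first <address> here.
        Node outdated = book.getElementsByTagName("address").item(0);
        replaceAddress(book, outdated, change.getDocumentElement());
        System.out.println(book.getElementsByTagName("city").item(0).getTextContent());
    }
}
```

The import step matters: passing a node from a different document directly to replaceChild() fails with a DOMException (WRONG_DOCUMENT_ERR).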

We now need to write some Java code that will help us identify the <address> element in the source tree that needs to be replaced with the updated element. The findAddress() method below shows how that can be accomplished. Please note that, to keep the sample short, we've left out the appropriate error handling.

public Node findAddress(String name, Document source) {
   Element root = source.getDocumentElement();
   NodeList nl = root.getChildNodes();
   // iterate over all address nodes and find the one that has the correct addressee
   for (int i=0;i<nl.getLength(); i++) {
      Node n = nl.item(i);
      if ((n.getNodeType() == Node.ELEMENT_NODE) &&
          (((Element)n).getTagName().equals("address"))) {
         // we have an address node, now we need to find the
         // 'addressee' child
         Node addressee = ((Element)n).getElementsByTagName("addressee").item(0);
         if (addressee == null) {
            continue; // no addressee child; skip this address
         }
         // there is the addressee, now find its text node and compare
         for (Node child = addressee.getFirstChild(); child != null;
              child = child.getNextSibling()) {
            if ((child.getNodeType()==Node.TEXT_NODE) &&
                (((Text)child).getData().equals(name))) {
               return n;
            }
         }
      }
   }
   return null;
}

The code above could most likely be optimized, but it is obvious that iterating over the DOM tree can be tedious and error prone. Now let's look at how the target node can be located by using a simple XPath statement. The statement could look like this:

//address[child::addressee[text() = 'John Smith']]

We can now rewrite our previous method. This time, we use the XPath statement to find the desired node:

public Node findAddress(String name, Document source) throws Exception {
   // need to recreate a few helper objects
   XMLParserLiaison xpathSupport = new XMLParserLiaisonDefault();
   XPathProcessor xpathParser = new XPathProcessorImpl(xpathSupport);
   PrefixResolver prefixResolver = new PrefixResolverDefault(source.getDocumentElement());
   // create the XPath and initialize it
   XPath xp = new XPath();
   String xpString = "//address[child::addressee[text() = '"+name+"']]";
   xpathParser.initXPath(xp, xpString, prefixResolver);
   // now execute the XPath select statement
   XObject list = xp.execute(xpathSupport, source.getDocumentElement(), prefixResolver);
   // return the resulting node
   return list.nodeset().item(0);
}

The above code may not look a lot better than the previous try, but most of this method's contents could be encapsulated in a helper class. The only part that changes over and over is the actual XPath expression and the target node.

This lets us create an XPathHelper class, which looks like this:

import org.w3c.dom.*;
import org.xml.sax.*;
import org.apache.xalan.xpath.*;
import org.apache.xalan.xpath.xml.*;
public class XPathHelper {
   XMLParserLiaison xpathSupport = null;
   XPathProcessor xpathParser = null;
   PrefixResolver prefixResolver = null;
   XPathHelper() {
      xpathSupport = new XMLParserLiaisonDefault();
      xpathParser = new XPathProcessorImpl(xpathSupport);
   }
   public NodeList processXPath(String xpath, Node target) throws SAXException {
      prefixResolver = new PrefixResolverDefault(target);
      // create the XPath and initialize it
      XPath xp = new XPath();
      xpathParser.initXPath(xp, xpath, prefixResolver);
      // now execute the XPath select statement
      XObject list = xp.execute(xpathSupport, target, prefixResolver);
      // return the resulting node list
      return list.nodeset();
   }
}

After creating the helper class, we can rewrite our finder method again, which is now very short:

public Node findAddress(String name, Document source) throws Exception {
   XPathHelper xpathHelper = new XPathHelper();
   NodeList nl = xpathHelper.processXPath(
        "//address[child::addressee[text() = '"+name+"']]",
        source.getDocumentElement());
   return nl.item(0);
}

The helper class can now be used whenever a node or a set of nodes needs to be located in a given XML document. The actual XPath statement could even be loaded from an external source, so that changes could be made on the fly if the source document structure changes. In this case, no recompile is necessary.
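
On a current JDK, the same lookup can also be written against the standard javax.xml.xpath API. That API was added to Java after this article was written, so it is a substitute for the Xalan classes used above, not the API the samples were tested with. The sketch below uses a hypothetical class JaxpAddressFinder and passes the name in through an XPath variable ($name) instead of string concatenation, so that a name containing an apostrophe does not break the expression. The expression is held in a constant here; it could just as well be read from a properties file.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical variant of findAddress() based on javax.xml.xpath (not from the article).
class JaxpAddressFinder {

    // $name is an XPath variable reference, resolved at evaluation time.
    static final String FIND_ADDRESS = "//address[child::addressee[text() = $name]]";

    static Node findAddress(String name, Document source) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Supply the value for the single variable used in the expression.
        xpath.setXPathVariableResolver((QName variable) ->
                "name".equals(variable.getLocalPart()) ? name : null);
        // XPathConstants.NODE yields the first matching node, or null if none matches.
        return (Node) xpath.evaluate(FIND_ADDRESS, source, XPathConstants.NODE);
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document book = parse("<addressbook>"
                + "<address><addressee>John Smith</addressee><city>Rochester</city></address>"
                + "<address><addressee>Pat O'Neil</addressee><city>Duluth</city></address>"
                + "</addressbook>");
        Node match = findAddress("Pat O'Neil", book);
        System.out.println(match == null ? "not found" : match.getTextContent());
    }
}
```

With the concatenated expression used earlier, the name Pat O'Neil would close the string literal early and produce an invalid XPath expression; the variable-based form avoids that without any escaping logic.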

Process XML documents with XSL stylesheets

In some cases, it makes sense to outsource the entire handling of an XML document to an external XSL stylesheet, a process in some respects similar to the use of XPath as described in the previous section. With XSL stylesheets, you can create an output document by selecting nodes from the input document and merging their content with stylesheet content, based on pattern rules.

If an application changes the structure and content of an XML document and produces a new document, it may be better and easier to use a stylesheet to handle the work rather than writing a Java program that does the same job. The stylesheet is most likely stored in an external file, allowing you to change it on the fly, without the need to recompile.

For example, we could accomplish the processing for the addressbook sample by creating a stylesheet that merges the cached version of the addressbook with the updated one, thus creating a new document with the updates in it.

Here is a sample of such a stylesheet:

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <xsl:output method="xml"/>
<xsl:variable name="doc-file">http://mymachine.com/changed.xml</xsl:variable>
<!-- copy everything that has no other pattern defined -->
<xsl:template match="* | @*">
   <xsl:copy><xsl:copy-of select="@*"/><xsl:apply-templates/></xsl:copy>
</xsl:template>
<!-- check for every <address> element if an updated one exists -->
<xsl:template match="//address">
   <xsl:param name="addresseeName">
      <xsl:value-of select="addressee"/>
   </xsl:param>
   <xsl:choose>
      <xsl:when test="document($doc-file)//addressee[text()=$addresseeName]">
         <xsl:copy-of select="document($doc-file)//address[child::addressee[text()=$addresseeName]]"/>
      </xsl:when>
      <xsl:otherwise>
         <!-- no update found: copy the original address element unchanged -->
         <xsl:copy-of select="."/>
      </xsl:otherwise>
   </xsl:choose>
</xsl:template>
</xsl:stylesheet>

Note that the above stylesheet takes the updated data out of a file called changed.xml. A real application would obviously not want to store the changed data in a file before processing it. One solution is to add a special attribute to the <address> element, indicating whether or not it has been updated. Then the application could simply append the updated data to the source document and define a different stylesheet that detects updated records and replaces the outdated ones.

All the application has to do now is create an XSLTProcessor object and let it do the work:

import org.apache.xalan.xslt.*;
   ...
   XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
   processor.process(new XSLTInputSource(sourceDoc.getDocumentElement()),
                     new XSLTInputSource("http://mymachine.com/updateAddress.xsl"),
                     new XSLTResultTarget(newDoc.getDocumentElement()));
   sourceDoc = newDoc;
   ...
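
For readers on a current JDK, the same step can be sketched with the standard javax.xml.transform API instead of the Xalan-specific XSLTProcessor. The example below is an illustration under stated assumptions, not the article's tested code: the class AddressMergeSketch is hypothetical, and it uses the attribute-based variant mentioned earlier, with an attribute named updated (a name chosen here) marking the appended records, so that it runs without a network-hosted changed.xml.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch (not from the article): merges appended, marked-up updates.
class AddressMergeSketch {

    static final String STYLESHEET =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // copy everything that has no more specific template
        + "<xsl:template match='@* | node()'>"
        + "<xsl:copy><xsl:apply-templates select='@* | node()'/></xsl:copy>"
        + "</xsl:template>"
        // drop an unmarked address if a marked one exists for the same addressee
        + "<xsl:template match='address[not(@updated)]"
        + "[addressee = ../address[@updated]/addressee]'/>"
        // strip the marker attribute from the output
        + "<xsl:template match='@updated'/>"
        + "</xsl:stylesheet>";

    static String merge(String bookWithUpdates) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(bookWithUpdates)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String input = "<addressbook>"
            + "<address><addressee>John Smith</addressee><city>Rochester</city></address>"
            + "<address><addressee>Bill Morris</addressee><city>St. Paul</city></address>"
            + "<address updated='true'><addressee>John Smith</addressee>"
            + "<city>Minneapolis</city></address>"
            + "</addressbook>";
        System.out.println(merge(input));
    }
}
```

Note that in this variant an updated record stays where it was appended, at the end of the addressbook, rather than taking the position of the record it replaces. In a long-running application, the compiled stylesheet could be cached as a javax.xml.transform.Templates object instead of being recompiled for every request.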

Conclusion

To many of us Java programmers, XML is a relatively new technology that we need to master. This article shows that the manual parsing and processing of an XML document is only one option, and that we may be able to use XPath expressions and XSL stylesheets to avoid a lot of parsing and iterating, thus reducing the amount of code that we need to write. Moreover, with this approach the information about how the data is processed is stored externally and can be changed without recompiling the application. The mechanisms described here can be used to create presentation data for a Web application, but they can also be applied wherever XML data needs to be processed.

Learn more about this topic

  • Recent XML articles in JavaWorld

"Mapping XML to Java, Part 1," Robert Hustead (JavaWorld, August 4, 2000) explains how to use SAX to map XML documents to Java objects
http://www.javaworld.com/javaworld/jw-08-2000/jw-0804-sax.html

"XSL Gives Your XML Some Style," Michael Ball (JavaWorld, June 30, 2000) explains how XSL stylesheets can help with your programming
http://www.javaworld.com/javaworld/jw-06-2000/jw-0630-xsl.html

"Easy Java/XML Integration, Part 1," Jason Hunter and Brett McLaughlin (JavaWorld, May 18, 2000) introduces the new open source JDOM API for manipulating XML from within Java
http://www.javaworld.com/javaworld/jw-05-2000/jw-0518-jdom.html

"Programming XML in Java, Part 1," Mark Johnson (JavaWorld, March 2000) looks at how to use SAX to process XML documents in Java
http://www.javaworld.com/javaworld/jw-03-2000/jw-03-xmlsax.html

  • XML help
  • Other valuable XML-related resources