Wednesday, July 29, 2009

XML document processing: a comparison between Java and Python

I had a project that required XML document processing. At first I was thinking of doing it in Java, but since the project was not very big, I decided to use Python. I have to say it was a good decision. In my opinion, XML processing feels more natural in Python than in Java.

Let's go straight to the code comparison.
In Java, there are many steps involved before you actually get the XML elements that match a given XPath expression.

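import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Parse books.xml into a DOM tree
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse("books.xml");

// Compile the XPath expression (a separate factory, so it needs its own variable name)
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
XPathExpression expr = xpath.compile("//book[author='Neal Stephenson']/title/text()");

// Evaluate the expression against the document and cast the result to a NodeList
Object result = expr.evaluate(doc, XPathConstants.NODESET);

NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}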


I took the Java example from an article on the IBM website. You should check it out if you are interested in using XPath in Java.

Now the same thing in Python, using the lxml library:

from lxml import etree

doc = etree.parse("books.xml")
nodes = doc.xpath("//book[author='Neal Stephenson']/title/text()")
print(nodes)  # a list of the matching title strings
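If you want to try the query without a books.xml file, or without installing lxml, here is a minimal sketch using Python's standard-library xml.etree.ElementTree instead. This is a swap from lxml, not the code above. The sample XML and the titles_by_author helper are hypothetical. ElementTree only understands a limited subset of XPath and has no text() step, so the sketch reads each title's text itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for books.xml
BOOKS_XML = """\
<inventory>
  <book>
    <title>Snow Crash</title>
    <author>Neal Stephenson</author>
  </book>
  <book>
    <title>Burning Chrome</title>
    <author>William Gibson</author>
  </book>
  <book>
    <title>Zodiac</title>
    <author>Neal Stephenson</author>
  </book>
</inventory>
"""


def titles_by_author(xml_text, author):
    """Hypothetical helper: return the title text of every <book> whose <author> equals `author`."""
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset supports [tag='text'] predicates: this selects
    # <book> elements anywhere below the root that have an <author> child with
    # exactly that text. Assumes `author` contains no single quote.
    path = f".//book[author='{author}']/title"
    # No text() step in ElementTree, so pull the text from each <title> element.
    return [title.text for title in root.findall(path)]


print(titles_by_author(BOOKS_XML, "Neal Stephenson"))  # prints ['Snow Crash', 'Zodiac']
```

For full XPath 1.0 (text(), axes, functions), lxml is still the better choice. ElementTree is simply handy when you cannot add a dependency.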


I mostly use Java for my projects and am pretty happy with it, even though I sometimes wish Java had some of the syntactic sugar that C# has.
So, if you get to choose the programming language for your XML projects, give Python a try. I am sure you will like it.
