Thursday, June 11, 2009

Processing xml document in python using libxml

Processing xml document in python using libxml

I had to spent sometimes to gather information about processing xml document in python. There are several different libraries to use, such as minidom, libxml, etree, etc. During my work, I had been working with xpath, so I need xpath support in the built-in library that I work with, so I chose libxml. It is difficult for me to gather information on how to use that API. There are minimum documentations, guides, examples, and tutorials out there.
After spending sometimes, I became familiar on using it. I am writing this, so other people could simply learn on how to use libxml in more realistic situation. Many examples I found in the internet is way too basic, and does not suit my need.


example 1:
--------------------------------------

import sys
import libxml2

def main():
doc = libxml2.parseFile("data.xml")
ctxt = doc.xpathNewContext()
# get student elements in xml doc
res = ctxt.xpathEval("//student")
# output all properties for "student" element using xpath
sys.stdout.write(res[0].get_properties().content + "\n")

for chld in res[0].children:
if chld.type == "element":
# output all information within the child element
sys.stdout.write(chld.content + "\n")

doc.freeDoc()
ctxt.xpathFreeContext()

if __name__ == '__main__':
main()


--------------------------------------


example 2:
--------------------------------------

import sys
import libxml2

def main():
doc = libxml2.parseFile("layout.xml")
ctxt = doc.xpathNewContext()
# get all "property" element that has attribute "Location",
# which parent is "object" element that has attribute "pickList*"
res = ctxt.xpathEval("//Object[starts-with(@name,'pickList')]/Property[@name='Location']")
for r in res:
sys.stdout.write(r.content + "\n")

sys.stdout.write(str(len(res)))

doc.freeDoc()
ctxt.xpathFreeContext()

if __name__ == '__main__':
main()


--------------------------------------


It is also useful to see the source code of libxml.py, just to see what's going on under the hood.

Not long after writing this post, I found even better xml API, which is lxml. I will make separate post about this wonderful xml API for python.

No comments:

Post a Comment