XML Porting under Python

Comments

Wow. There’s been some buzz about XML in the Python community lately. I don’t have enough time to read python-dev, so I just look at the pretty blogs, but there’s been impressive improvements and flame-fests both. I even got quoted in one of them.

First off, I got some email about how my benchmarks were performed. One person actually called my set up unrealistic. Um. My setup was essentially real. I parsed an actual XML file my application has to parse and I factored in the load time of Python and the import time of the module. My application is a CGI (for the moment), so both of these are important issues.

IMHO, more people should benchmark under their intended use, rather than solely go by labratory condition tests. (Though the latter are, of course, very important).

Secondly, I ported my application to Ft.Xml. Having read another Uche article, I saw that I could receive a performance boost using FourThought’s latest DOM-like toolkit. Adapting my previous benchmarks produced a 5x increase in speed over 4DOM.

It’s not 90x, but it’s going in the right direction.

I sat down to adapt the application, and ran immediately into two snags.

Because the parser is terribly name-space aware, you cannot use getAttribute. All occurrances must be replaced with getAttributeNS. This was a bit of a search-and-replace job, but annoying.
The parser doesn’t support getElementsByName, you must use an XPath expression, or build your own DOM-traversal code. This is a little annoying and largely over-kill. What makes it bullet-worthy is that Uche has a page outlining not one but several alternatives. Why not just implement one in his code? I used the wasteful XPath version.

Now, the usual caveats apply- I didn’t pay for the development of the code, so I have only limited right to bitch and moan. It is an improvement, and it did allow me to replace my existing code in a couple of hours. A five-fold increase is impressive.

However, it is neither Pythonic like Amara or cElementTree, and also not even close to being a DOM. In Java, I can rip out a DOM and replace it with another one, just by toggling my classpath and re-running my unit-tests.

It’s both good and bad that Python doesn’t have an XML parser standard. Hooray for being individual and creative. Boo and sucks for replacing a library with an equivalent meaning an application overhaul.