So this is pretty crazy. I'm messing around with ElementTree
(which has been nothing less than perfect) and trying to get it to act
like an xml.dom.pulldom/XmlTextReader-style pull parser. But I'd like
to be able to assemble a chain of generator-producing/consuming
functions (or other callables) so that the file can be read, parsed,
filtered/mutated, encoded, and written all incrementally.
Check it out:
import sys
import pulltree # that's what I'm working on :)

def upper_filter(source):
    for (ev, item) in source:
        if ev == pulltree.CHARACTERS:
            item = item.upper()
        yield (ev, item)

reader = pulltree.reader(sys.stdin)
filter = upper_filter(reader)
writer = pulltree.writer(filter, sys.stdout)
for (ev, item) in writer:
    pass
C-z
$ echo "<hello>world</hello>" | python test_filter.py
<hello>WORLD</hello>
That felt good. More functional than a chain of SAX XMLFilters,
almost as efficient, and muuuuch perdier.
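Since pulltree isn't available to play with, here is a rough sketch of the same reader/filter/writer chain built on the standard library's xml.dom.pulldom instead. It targets current Python 3, and the function names (reader, upper_filter, writer, run) are made up for illustration; this is not pulltree's API, and the writer only handles elements, attributes, and text.

```python
import io
from xml.dom import pulldom
from xml.sax.saxutils import escape, quoteattr


def reader(xml_text):
    """Yield (event, node) pairs from an XML string using the stdlib pull parser."""
    for event, node in pulldom.parseString(xml_text):
        yield event, node


def upper_filter(source):
    """Upper-case character data; pass every other event through untouched."""
    for event, node in source:
        if event == pulldom.CHARACTERS:
            node.data = node.data.upper()
        yield event, node


def writer(source, out):
    """Serialize each event to `out` as it arrives, then re-yield it downstream."""
    for event, node in source:
        if event == pulldom.START_ELEMENT:
            attrs = "".join(
                " %s=%s" % (name, quoteattr(value))
                for name, value in node.attributes.items()
            )
            out.write("<%s%s>" % (node.tagName, attrs))
        elif event == pulldom.END_ELEMENT:
            out.write("</%s>" % node.tagName)
        elif event == pulldom.CHARACTERS:
            out.write(escape(node.data))
        yield event, node


def run(xml_text):
    """Drive the whole chain and return the serialized result (hypothetical helper)."""
    out = io.StringIO()
    for _ in writer(upper_filter(reader(xml_text)), out):
        pass  # pulling on the last generator is what moves data through the chain
    return out.getvalue()
```

Calling run("<hello>world</hello>") should give back <hello>WORLD</hello>. Nothing happens until the final loop pulls on the writer, so each event flows through every stage before the next one is parsed.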
Something like this might work someday soon:
import urllib2
import pulltree

XINCLUDE = '{http://www.w3.org/2001/XInclude}include'

def xinclude_filter(source):
    events = iter(source)
    for (event, item) in events:
        if event == pulltree.START_ELEMENT and item.tag == XINCLUDE:
            href = item.attrib['href']
            # splice in the events of the included document
            for woot in pulltree.reader(urllib2.urlopen(href)):
                yield woot
            pulltree.eat(item, events) # eat events to the end of the element
        else:
            yield (event, item)
Granted, that's about as basic as an XInclude processor could be and still be useful, but you get the point.
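The "eat events to the end of the element" step is the only slightly tricky part, so here is one way it might look. This is a sketch against the standard library's xml.dom.pulldom event stream in current Python 3, not pulltree itself; eat and drop_elements are hypothetical names, and this eat takes only the shared iterator rather than the element as well.

```python
from xml.dom import pulldom


def eat(events):
    """Consume events up to and including the END_ELEMENT that closes
    an element whose START_ELEMENT has already been read."""
    depth = 1  # we are already inside the element being skipped
    for event, node in events:
        if event == pulldom.START_ELEMENT:
            depth += 1
        elif event == pulldom.END_ELEMENT:
            depth -= 1
            if depth == 0:
                return


def drop_elements(source, tag):
    """Filter that removes every element named `tag`, including its children."""
    events = iter(source)  # one shared iterator, so eat() advances the loop too
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            eat(events)
            continue
        yield event, node


def start_tags(xml_text, tag):
    """Hypothetical helper: names of the start tags that survive the filter."""
    stream = pulldom.parseString(xml_text)
    return [
        node.tagName
        for event, node in drop_elements(stream, tag)
        if event == pulldom.START_ELEMENT
    ]
```

The trick is that the for loop and eat() pull from the same iterator, so whatever eat() swallows never reaches the loop body. Counting depth keeps nested elements with the same name from ending the skip early.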
To
...
on Sun 12/05/04
at 12:08 PM