Read XML with a listener Model (SAX)

edit

Read XML with a listener Model (SAX)

While XMLParse creates an object tree (DOM), the SAX event-driven model is more memory-efficient for large XML documents - it processes events as they occur without loading the entire document.

Example XML document:

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
    <cd>
        <title>Empire Burlesque</title>
        <artist>Bob Dylan</artist>
        <country>USA</country>
        <company>Columbia</company>
        <price>10.90</price>
        <year>1985</year>
    </cd>
    <cd>
        <title>Hide your heart</title>
        <artist>Bonnie Tyler</artist>
        <country>UK</country>
        <company>CBS Records</company>
        <price>9.90</price>
        <year>1988</year>
    </cd>
</catalog>

Define a component with listener functions for parser events (startDocument, startElement, body, endElement, etc.):

component {
    this.cds = [];
    this.cd = {};
    this.insideCD = false;
    this.currentName = "";
    this.filter = {};
    this.removeCD = false;

/** * constructor of the component that takes the path to the XML file and a simple custom-made filter * @param xmlFile XML File to parse * @param filter filter to limit content on certain records */ function init(string xmlFile, struct filter = {}) { var xmlEventParser = createObject("java", "lucee.runtime.helpers.XMLEventParser"); this.filter = filter; // registering the event handlers xmlEventParser.init( getPageContext(), this.startDocument, this.startElement, this.body, this.endElement, this.endDocument, this.error ); xmlEventParser.start(xmlFile); return this.cds; }
/** * this function will be called on the start of parsing of an XML Element (Tag) */ function startElement(string uri, string localName, string qName, struct attributes) { if (localName EQ "cd") { this.cd = {}; this.insideCD = true; this.removeCD = false; } else if (this.insideCD) { this.currentName = localName; } }
/** * call with body of the tag */ function body(string content) { if (len(this.currentName)) { this.cd[this.currentName] = content; if (structKeyExists(this.filter, this.currentName) and content NEQ this.filter[this.currentName]) this.removeCD = true; } }
/** * this function will be called at the end of parsing an XML Element (Tag) */ function endElement(string uri, string localName, string qName, struct attributes) { if (localName EQ "cd") { if (!this.removeCD) this.cds[arrayLen(this.cds) + 1] = this.cd; this.insideCD = false; } this.currentName = ""; }
/** * this function will be called when the document starts to be parsed */ function startDocument(string uri, string localName, string qName, struct attributes) {}
/** * this function will be called when the document finishes being parsed */ function endDocument(string uri, string localName, string qName, struct attributes) {}
/** * this function will be called when an error occurs */ function error(struct cfcatch) { dump(cfcatch); } }

Invoke the component to parse and get results:

<!---
    Calls XML Catalog Event Parser and converts the data to an array of structs
    with a filter that limits the country to "USA"
--->
<cfset xmlFile = GetDirectoryFromPath(GetCurrentTemplatePath()) & 'catalog.xml'>
<cfset cds = new XMLCatalog(xmlFile, {country: 'USA'})>
<cfdump var="#cds#">

You can download the complete example here.

See also