Read XML with a listener Model (SAX)

Lucee not only allows you to convert an XML file to an object tree (DOM) but also supports an event-driven model (SAX).

The XMLParse function is handy when you need an object representation of a complete XML document. For large XML documents, however, building the whole tree can cause memory issues, and it is unnecessary overhead if you only need to read some data from an XML file and convert it to something else. In those cases, the event-driven SAX model is a lightweight alternative. Here is an example.

Let's say we want to read in the following XML document:

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
    <cd>
        <title>Empire Burlesque</title>
        <artist>Bob Dylan</artist>
        <country>USA</country>
        <company>Columbia</company>
        <price>10.90</price>
        <year>1985</year>
    </cd>
    <cd>
        <title>Hide your heart</title>
        <artist>Bonnie Tyler</artist>
        <country>UK</country>
        <company>CBS Records</company>
        <price>9.90</price>
        <year>1988</year>
    </cd>
</catalog>

To read this, we define a component with functions that listen to the events of the XML parser (startDocument, startElement, body, endElement, endDocument, error). How the data is stored for later use is entirely up to your code. The invocation further down expects the component to be saved as XMLCatalog.cfc.

component {
    this.cds = [];
    this.cd = {};
    this.insideCD = false;
    this.currentName = "";
    this.filter = {};
    this.removeCD = false;

    /**
     * constructor of the component that takes the path to the XML file and a simple custom-made filter
     * @param xmlFile XML File to parse
     * @param filter filter to limit content on certain records
     */
    function init(string xmlFile, struct filter = {}) {
        var xmlEventParser = createObject("java", "lucee.runtime.helpers.XMLEventParser");
        this.filter = filter;
        // registering the event handlers
        xmlEventParser.init(
            getPageContext(),
            this.startDocument,
            this.startElement,
            this.body,
            this.endElement,
            this.endDocument,
            this.error
        );
        xmlEventParser.start(xmlFile);
        return this.cds;
    }

    /**
     * this function will be called on the start of parsing of an XML Element (Tag)
     */
    function startElement(string uri, string localName, string qName, struct attributes) {
        if (localName EQ "cd") {
            this.cd = {};
            this.insideCD = true;
            this.removeCD = false;
        } else if (this.insideCD) {
            this.currentName = localName;
        }
    }

    /**
     * call with body of the tag
     */
    function body(string content) {
        if (len(this.currentName)) {
            this.cd[this.currentName] = content;
            if (structKeyExists(this.filter, this.currentName) and content NEQ this.filter[this.currentName])
                this.removeCD = true;
        }
    }

    /**
     * this function will be called at the end of parsing an XML Element (Tag)
     */
    function endElement(string uri, string localName, string qName, struct attributes) {
        if (localName EQ "cd") {
            if (!this.removeCD)
                this.cds[arrayLen(this.cds) + 1] = this.cd;
            this.insideCD = false;
        }
        this.currentName = "";
    }

    /**
     * this function will be called when the document starts to be parsed
     */
    function startDocument(string uri, string localName, string qName, struct attributes) {}

    /**
     * this function will be called when the document finishes being parsed
     */
    function endDocument(string uri, string localName, string qName, struct attributes) {}

    /**
     * this function will be called when an error occurs
     */
    function error(struct cfcatch) {
        dump(cfcatch);
    }

}

Now we can simply invoke that component to parse the XML file and get the result as an array of structs:

<!---
    Calls XML Catalog Event Parser and converts the data to an array of structs
    with a filter that limits the country to "USA"
--->
<cfset xmlFile = GetDirectoryFromPath(GetCurrentTemplatePath()) & 'catalog.xml'>
<cfset cds = new XMLCatalog(xmlFile, {country: 'USA'})>
<cfdump var="#cds#">
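
To see which events fire and in what order, a stripped-down listener that only records the events can be useful. The following is a minimal sketch, assuming the same XMLEventParser calls (init and start) used in the XMLCatalog component; the component name XMLEventTrace and its events array are made up for illustration and are not part of Lucee.

```cfml
// XMLEventTrace.cfc (hypothetical name): records every parser event in order
component {
    this.events = [];

    function init(required string xmlFile) {
        var parser = createObject("java", "lucee.runtime.helpers.XMLEventParser");
        // same handler registration order as in the XMLCatalog component
        parser.init(
            getPageContext(),
            this.startDocument,
            this.startElement,
            this.body,
            this.endElement,
            this.endDocument,
            this.error
        );
        parser.start(xmlFile);
        return this.events;
    }

    function startDocument() {
        arrayAppend(this.events, "startDocument");
    }

    function startElement(string uri, string localName, string qName, struct attributes) {
        arrayAppend(this.events, "startElement: " & localName);
    }

    function body(string content) {
        // whitespace between tags may also be reported as body content, so skip blank chunks
        if (len(trim(content))) {
            arrayAppend(this.events, "body: " & content);
        }
    }

    function endElement(string uri, string localName, string qName) {
        arrayAppend(this.events, "endElement: " & localName);
    }

    function endDocument() {
        arrayAppend(this.events, "endDocument");
    }

    function error(struct cfcatch) {
        arrayAppend(this.events, "error: " & cfcatch.message);
    }
}
```

Calling `dump(new XMLEventTrace(xmlFile))` with the same xmlFile as above should then list the recorded events in the order the parser reported them.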

You can download the complete example here.
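
For comparison, the same conversion can be sketched with the DOM approach mentioned at the beginning. This is only an illustration, assuming catalog.xml sits next to the template and is small enough to load completely; the variable names are arbitrary. Unlike the listener component, it builds the full object tree in memory before any record is read.

```cfml
<cfscript>
// DOM-based comparison: xmlParse() builds the complete object tree first
xmlFile = getDirectoryFromPath(getCurrentTemplatePath()) & "catalog.xml";
doc = xmlParse(fileRead(xmlFile));

cds = [];
// xmlRoot is the <catalog> element; its xmlChildren are the <cd> elements
for (cdNode in doc.xmlRoot.xmlChildren) {
    cd = {};
    for (field in cdNode.xmlChildren) {
        cd[field.xmlName] = field.xmlText;
    }
    // same filter as in the SAX example: keep only CDs from the USA
    if (structKeyExists(cd, "country") and cd.country EQ "USA") {
        arrayAppend(cds, cd);
    }
}
dump(cds);
</cfscript>
```

For a file of this size either approach works; the listener model pays off once the document is too large to hold comfortably as a tree.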
