Essays/XMLwithAmpersand

From J Wiki

The SAX XML parser exhibits annoying, though acceptable, behavior when working on a text node containing an embedded ampersand, written in XML as the entity &amp;. The parser may deliver such a node in several pieces, invoking the "characters" callback more than once for a single text node. This behavior is documented on this page, which, ironically, is rendered very poorly.
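A minimal sketch for observing the split, assuming the xml/sax addon used later on this page is installed. The class name pchunks and the nouns CHUNKS and pieces are made up for this illustration. Each piece passed to the "characters" callback is boxed so that the pieces can be counted. How many pieces arrive is up to the parser, so no particular split is claimed here; joining the pieces should give back the full text.

```j
require 'xml/sax'

saxclass 'pchunks'                        NB. hypothetical class name, illustration only

startDocument=: 3 : 'CHUNKS=: 0$a:'       NB. start with an empty list of boxes
endDocument=: 3 : 'CHUNKS'                NB. result of process: every piece seen
characters=: 3 : 'CHUNKS=: CHUNKS,<y'     NB. box each piece as it arrives

cocurrent 'base'

pieces=: process_pchunks_ '<t>Headlines &amp; Deadlines</t>'
#pieces      NB. number of "characters" calls for the single text node
;pieces      NB. the pieces joined back together
```

If the count is greater than 1, a handler that treats every "characters" call as a complete text node will see only a fragment of the title.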

Here's a method for dealing with this, adapted from some example code written by Oleg.

NB.* xmlEGboxed.ijs: use elements or attributes to fill boxed table
NB. http://www.jsoftware.com/pipermail/programming/2008-December/013300.html

require 'xml/sax format'

saxclass 'pboxed'

startDocument=: 3 : 0
   LASTL=: L=: 0 [ S=: ''     NB. Level counter L, leading paths S.
   HREF=: ''                  NB. Stores attributes to get HREFs.
   Z=: i.0 2                  NB. Will contain final result.
)
endDocument=: 3 : 'Z'

startElement=: 4 : 0
   L=: >:L [ S=: S,<y
   if. y-:'bookmark' do.
       HREF=: x getAttribute 'href' end.
)

endElement=: 3 : 0
   L=: <:L [ S=: }:S
)

characters=: 3 : 0
   s2=. _2{.S
   if. s2 -: ;:'bookmark title' do.
       if. L~:LASTL do. Z=: Z,y;HREF              NB. Either initialize or
       else. Z=: (<y,~>(<_1 0){Z) (<_1 0)}Z end.  NB.  accumulate more.
   end.
   LASTL=: L
)

NB. =========================================================
cocurrent 'base'

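This code collects each bookmark's title together with its URL. When a title arrives in several "characters" calls, the first call starts a new row of the result and the later calls append to that row's title.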
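The accumulate branch of "characters" can be tried on its own, without the parser. The following is a sketch in which the verb name appendLast and the three hand-written pieces are assumptions made for illustration. It applies the same amend expression as the code above to the title box in the last row of a boxed table.

```j
NB. appendLast: x is a boxed table, y is text to append to the
NB. first box of its last row (hypothetical helper name)
appendLast=: 4 : '(< (> (<_1 0){x) , y) (<_1 0)} x'

Z=: i.0 2                                              NB. empty two-column table
Z=: Z , 'Headlines ' ; 'http://fxfeeds.mozilla.com/'   NB. first piece starts a row
Z=: Z appendLast '&'                                   NB. later pieces extend the title
Z=: Z appendLast ' Deadlines'
Z
```

After the three steps Z has a single row whose title box holds Headlines & Deadlines, which is what the parser-driven code builds up across successive callbacks.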

Here's some sample XML with embedded ampersands.

egSmall=: 0 : 0
<?xml version="1.0"?>
<!DOCTYPE xbel PUBLIC "+//IDN python.org//DTD XML Bookmark Exchange Language 1.1//EN//XML" "http://pyxml.sourceforge.net/topics/dtds/xbel-1.1.dtd">
<xbel>
  <title>Bookmarks</title>
  <desc>Bookmarks</desc>
  <folder id="rdf:#$FvPhC3" folded="no">
    <title>Bookmarks Toolbar Folder</title>
    <desc>Add bookmarks to this folder &amp; see them displayed on the Bookmarks Toolbar
    </desc>
    <bookmark href="http://www.bogus.org/HeyHo/LetsGo.html">
      <title>Getting Started &amp; Then Some</title>
    </bookmark>
    <bookmark href="http://fxfeeds.mozilla.com/" modified="1209052290">
      <title>Headlines &amp; Deadlines</title>
    </bookmark>
  </folder>
    <bookmark href="http://www.jsoftware.com/" added="1146880810" visited="1209017433">
      <title>J Home &amp; Homeboys</title>
    </bookmark>
</xbel>
)

Here's the result of using the code on this example:

   load 'xmlEGboxed.ijs'
   process_pboxed_ egSmall
+---------------------------+--------------------------------------+
|Getting Started & Then Some|http://www.bogus.org/HeyHo/LetsGo.html|
+---------------------------+--------------------------------------+
|Headlines & Deadlines      |http://fxfeeds.mozilla.com/           |
+---------------------------+--------------------------------------+
|J Home & Homeboys          |http://www.jsoftware.com/             |
+---------------------------+--------------------------------------+