From J Wiki

xml/loose - Loose XML parser based on regex

SAX (Simple API for XML) parser addon.
The loose XML parser is suitable for parsing HTML.
It features an object-oriented, SAX-like interface.

See also: the examples in the test folder in SVN; the change history.


Installation: use JAL/Package Manager.


SAX (Simple API for XML) was originally a Java framework by David Megginson, derived from the expat processing model. A SAX parser streams events instead of building the whole document tree in memory. As a result it has a tiny memory footprint and is generally faster than DOM processing.

SAX parsing follows the push model: the parser calls your code, not the other way round. You provide the callback functions by overriding the base class (see the saxclass definition). The parser calls these functions for each XML node event.

A higher-level visitor design pattern can be obtained by defining verbs named after the elements of interest, with a common prefix, and calling them from startElement and endElement. This is similar to the way wd calls event verbs.
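A minimal sketch of this visitor dispatch follows. It assumes the callback signatures used by the psax2 example on this page: startElement is dyadic, with the attributes in x and the element name in y, and endElement is monadic, with the name in y. The class name pvisit and the start_/end_ prefixes are hypothetical choices, not part of the addon.

```j
NB. Sketch: visitor-style dispatch of SAX events to per-element verbs.
NB. Hypothetical names: class 'pvisit', prefixes 'start_' and 'end_'.
require 'xml/sax/sax files'

saxclass 'pvisit'

NB. call start_<name> with the attribute table, if such a verb exists
NB. (nc returns 3 for a verb; names that are not valid J names give _2)
startElement=: 4 : 0
  nm=. 'start_' , y
  if. 3 = nc <nm do. (nm~) attributes x end.
)

NB. call end_<name>, if such a verb exists
endElement=: 3 : 0
  nm=. 'end_' , y
  if. 3 = nc <nm do. (nm~) y end.
)

NB. handlers for the elements of interest (hypothetical)
start_test=: 3 : 0
  smoutput 'start test' ; <y
)

end_test=: 3 : 0
  smoutput 'end test'
)
```

Elements that have no handler verb are simply skipped. Adding support for a new element therefore only requires defining a new start_ or end_ verb, without touching the dispatch code.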

In your class you maintain the state and selectively process the events. The event for text between tags is called characters; it is demonstrated in the table and rss examples.

In the rss example, a simple stack of nested elements is maintained in the list S. The characters verb then processes the text according to the current context.
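The stack technique can be sketched as follows. This is not the rss example itself. It assumes that characters is monadic and receives the text in y, and the class name pstack and the element name title are hypothetical.

```j
NB. Sketch: track nesting in a stack S and process text by context.
NB. Assumes characters is monadic with the text in y (hypothetical class 'pstack').
require 'xml/sax/sax files'

saxclass 'pstack'

startDocument=: 3 : 0
  S=: 0$a:          NB. empty stack of boxed element names
)

startElement=: 4 : 0
  S=: S , <y        NB. push the name of the element being opened
)

endElement=: 3 : 0
  S=: }: S          NB. pop the innermost element
)

characters=: 3 : 0
  NB. report only text whose innermost open element is 'title'
  if. (<'title') -: {: S do. smoutput y end.
)
```

Under these assumptions, process_pstack_ '<rss><channel><title>News</title></channel></rss>' prints the text found inside title elements. Keeping S as a list of boxed names lets characters test the innermost element with {: S, or the whole path when deeper context matters.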

To return a result from process, make it the result of endDocument, which is the last event called.
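As an illustration, a class that counts elements can hand the count back through endDocument. This sketch assumes that endDocument is monadic and that process returns its result, as described above. The class name pcount and the global N are hypothetical.

```j
NB. Sketch: return a value from process via endDocument.
NB. Hypothetical names: class 'pcount', element counter N.
require 'xml/sax/sax files'

saxclass 'pcount'

startDocument=: 3 : 0
  N=: 0             NB. number of elements seen so far
)

startElement=: 4 : 0
  N=: N + 1         NB. count every start tag, including empty elements
)

endDocument=: 3 : 0
  N                 NB. last event: this value becomes the result of process
)
```

Under these assumptions, process_pcount_ '<root><test a="11"/><test b="12"/></root>' returns 3: one root element and two test elements.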


These are listings and results of some examples found in the test folder.

Here the J Dictionary HTML is parsed and formatted as text, as defined in dic2.ijs. Download script: dic2 example

   dicdef_pdic1_ '#.'
Base Two                           #.  1 1 1                           Base
#.y is the base-2 value of y , that      x#.y is a weighted sum of the items
is, 2#.y . For example: #. 1 0 1         of y ; that is, +/w*y , where w is
0                                        the product
10                                       scan */\.}.x,1 . An
                                         atomic argument is reshaped
   #. 2 3$ 0 0 1,1 0 1                   to the
1 5                                      shape of the other argument.
]a=: i. 3 4
0 1  2  3
4 5  6  7
8 9 10 11

   10 #.a
123 4567 9011

A typical SAX parser class. Download script: sax_test2.ijs

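NB. object oriented sax parser specialization
NB. extended to use attributes and levels

require 'xml/sax/sax files'

NB. define class psax2, derived from the SAX base class
saxclass 'psax2'

NB. format an attribute table as ' name=value name=value', or '' if empty
showattrs=: (''"_)`(' ' , ;:^:_1@:(([ , '='"_ , ])&.>/"1))@.(*@#)

NB. reset the nesting level at the start of each document
startDocument=: 3 : 0
  L=: 0
)

NB. print the opening tag indented by level, then go one level deeper
startElement=: 4 : 0
  smoutput (L#'  '),'[',y,(showattrs attributes x),']'
  L=: L+1
)

NB. go back one level, then print the closing tag
endElement=: 3 : 0
  L=: L-1
  smoutput (L#'  '),'[/',y,']'
)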

Download script: result

TEST1=: 0 : 0
<root><test a="11"/><test b="12"/></root>
)

   process_psax2_ TEST1
  [test a=11]
  [test b=12]

   process_psax2_ fread jpath '~addons/xml/sax/test/chess.xml'
      [position column=g row=1]
      [position column=d row=6]
      [position column=b row=6]

See Also