WikiPedia:Parsing is the process of analyzing a stream of characters according to a formal lexical, syntactic and/or semantic grammar, producing an output structure or an evaluation.
Produces a stream of tokens from a stream of input characters; the stream can be a list. Lexing can be done using a sequential machine, regular expressions, or ad hoc splitting. AKA lexing, scanning, tokenizing.
Sequential machine, AKA finite state machine or finite automaton. Driven by a state transition table.
- dyad ;: Sequential Machine
J implementation, with an example J lexer for Alphabet and Words
- Essays/Word Formation on Lines
Sequential machine for J words, with space and line tokens and extensive examples
stripping unnecessary content (comments, etc.) from files to reduce their size
HTTP header lexer using the ;: dyad, with elements of ad hoc parsing
visualizing sequential machines using transition diagrams
JSON-style backslash escape evaluator
JSON tokenizer, with details of producing the sequential machine transition table
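As a rough illustration of the table-driven idea, the sketch below (in Python rather than J, and not a model of the exact action codes used by the ;: dyad) classifies each character, then looks up a (next state, action) pair in a transition table. The character classes, states, and action names are hypothetical choices for this example.

```python
# Character classes (the columns of the transition table).
SPACE, ALPHA, DIGIT, OTHER = range(4)

# States (the rows of the transition table).
BETWEEN, IN_NAME, IN_NUMBER, IN_PUNCT = range(4)

# Actions performed when taking a transition (hypothetical, not J's codes).
NOP, START, EMIT_START, EMIT = range(4)
#   NOP        - continue the current token (or keep skipping spaces)
#   START      - a new token begins at this character
#   EMIT_START - end the current token here and begin a new one
#   EMIT       - end the current token here; no new token begins

# TABLE[state][char_class] -> (next_state, action)
TABLE = [
    #  SPACE             ALPHA                  DIGIT                    OTHER
    [(BETWEEN, NOP),  (IN_NAME, START),      (IN_NUMBER, START),      (IN_PUNCT, START)],       # BETWEEN
    [(BETWEEN, EMIT), (IN_NAME, NOP),        (IN_NAME, NOP),          (IN_PUNCT, EMIT_START)],  # IN_NAME
    [(BETWEEN, EMIT), (IN_NAME, EMIT_START), (IN_NUMBER, NOP),        (IN_PUNCT, EMIT_START)],  # IN_NUMBER
    [(BETWEEN, EMIT), (IN_NAME, EMIT_START), (IN_NUMBER, EMIT_START), (IN_PUNCT, EMIT_START)],  # IN_PUNCT
]


def char_class(c):
    """Map a character to its column in TABLE."""
    if c.isspace():
        return SPACE
    if c.isalpha() or c == "_":
        return ALPHA
    if c.isdigit():
        return DIGIT
    return OTHER


def fsm_lex(text):
    """Split text into names, numbers and single punctuation characters."""
    tokens, state, start = [], BETWEEN, 0
    for i, c in enumerate(text):
        state, action = TABLE[state][char_class(c)]
        if action == START:
            start = i
        elif action in (EMIT_START, EMIT):
            tokens.append(text[start:i])
            start = i  # harmless after EMIT: the next START resets it
    if state != BETWEEN:  # flush a token still open at end of input
        tokens.append(text[start:])
    return tokens
```

For example, `fsm_lex("sum = x1+42")` yields `['sum', '=', 'x1', '+', '42']`: `x1` stays one name because the IN_NAME row maps DIGIT back to IN_NAME. Supporting a new token kind means editing the table rather than the control flow, which is the main appeal of a sequential machine.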
Regular expressions may use a sequential machine internally, but have an intuitive, standard syntax.
- Regular Expressions Lab
Guide to the regex library
- Essays/Regex Lexer
a lexer based on standard regular expressions and simple token declarations
- Scripts/Regular Expressions Substitution
Regular expressions extended for Perl/awk/sed-like substitution
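A minimal sketch of a lexer built from simple token declarations, written in Python with the standard `re` module. The token names and patterns are illustrative assumptions, not those of Essays/Regex Lexer: each declaration becomes a named group, and the groups are joined into one alternation that is matched repeatedly at the current position.

```python
import re

# Token declarations: (name, regular expression). Order matters: at each
# position the alternatives are tried from first to last.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("OP",     r"[-+*/=()]"),
    ("SKIP",   r"\s+"),  # whitespace: matched but not emitted
]
# One master pattern with a named group per declaration.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def regex_lex(text):
    """Return a list of (kind, text) tokens; raise SyntaxError on unknown input."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)  # anchored at pos, not a search
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":    # lastgroup: name of the group that matched
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Because the alternation is ordered rather than longest-match, more specific declarations (keywords, multi-character operators) should be listed before general ones. None of the patterns can match the empty string, so the loop always advances.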
Ad hoc lexing looks for simple substrings at which to (iteratively) split the input.
example of ad hoc splitting of a list of first/initial/last names
has a Lisp S-expression string tokenizer
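Two small Python sketches of ad hoc splitting. The name format ("First [Initial] Last" entries separated by semicolons) is an assumption for illustration, not the format used in the linked example. The S-expression tokenizer uses the common trick of padding parentheses with spaces and splitting on whitespace, which does not handle string literals or comments.

```python
def split_names(text):
    """Split 'First [Initial] Last' entries separated by ';' into
    (first, initial, last) triples. The input format is an assumption."""
    people = []
    for entry in text.split(";"):   # level 1: split into entries
        parts = entry.split()       # level 2: split each entry into words
        if len(parts) == 2:
            people.append((parts[0], "", parts[1]))
        elif len(parts) == 3:
            people.append((parts[0], parts[1], parts[2]))
        elif parts:
            raise ValueError(f"unexpected name format: {entry.strip()!r}")
    return people


def tokenize_sexpr(text):
    """Tokenize a Lisp S-expression by padding parentheses with spaces and
    splitting on whitespace (no support for strings or comments)."""
    return text.replace("(", " ( ").replace(")", " ) ").split()
```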
Produces a structure from a stream of tokens, or evaluates it. The structure is typically a tree of grammar elements. AKA parsing.
Bottom-up, AKA shift-reduce, e.g. LR parsers.
- Parsing and Execution from J Dictionary, Roger Hui, Kenneth Iverson
- Parsing and Execution from J for C Programmers, Henry Rich
- trace script (JSvnBase:packages/misc/trace.ijs)
provides a model of the J parser whose internal workings can be examined and experimented with
JSON shift-reduce parser
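The sketch below, in Python, shows the shift-reduce idea on ordinary infix arithmetic: tokens are shifted onto a stack, and the top of the stack is reduced whenever it matches a rule (E op E, or ( E )) and the lookahead operator does not bind more tightly. It assumes integer literals, space-separated tokens and four left-associative binary operators, and it evaluates while reducing instead of building a tree. It is not a model of the J parser, which moves words from the right end of the sentence onto its stack and matches the stack against its own table of patterns.

```python
import operator

# Binary operators and their precedence (higher binds tighter); all left-associative.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}


def is_expr(item):
    """A reduced expression E is represented on the stack by its numeric value."""
    return isinstance(item, (int, float))


def can_reduce(stack, min_prec):
    """True if the stack top is E op E and op binds at least as tightly as min_prec."""
    return (len(stack) >= 3 and is_expr(stack[-1]) and is_expr(stack[-3])
            and stack[-2] in PREC and PREC[stack[-2]] >= min_prec)


def reduce_binary(stack):
    """Reduce E op E on top of the stack to a single E by evaluating it."""
    right, op, left = stack.pop(), stack.pop(), stack.pop()
    stack.append(OPS[op](left, right))


def shift_reduce_eval(tokens):
    stack = []
    for tok in tokens:
        if tok in PREC:
            # Reduce while the operator already on the stack binds at least as
            # tightly as the lookahead; this makes the operators left-associative.
            while can_reduce(stack, PREC[tok]):
                reduce_binary(stack)
            stack.append(tok)              # shift the operator
        elif tok == "(":
            stack.append(tok)              # shift
        elif tok == ")":
            while can_reduce(stack, 0):
                reduce_binary(stack)
            if len(stack) < 2 or not is_expr(stack[-1]) or stack[-2] != "(":
                raise SyntaxError("unbalanced or empty parentheses")
            value = stack.pop()
            stack.pop()                    # reduce ( E ) to E
            stack.append(value)
        else:
            stack.append(int(tok))         # shift a number; it is already an E
    while can_reduce(stack, 0):            # end of input: reduce what is left
        reduce_binary(stack)
    if len(stack) != 1 or not is_expr(stack[0]):
        raise SyntaxError("malformed expression")
    return stack[0]
```

For instance, `shift_reduce_eval("2 + 3 * ( 4 - 1 )".split())` returns 11: the `+` stays on the stack until the higher-precedence `*` subexpression has been reduced.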
Top-down, AKA recursive descent, e.g. LL parsers.
- Essays/Recursive Descent Parser
framework for easily building hand-coded LL parsers on top of the Regex Lexer
has a tacit recursive-descent parser
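For comparison, here is a minimal recursive-descent parser in Python for the same kind of arithmetic, with one method per grammar rule. It builds a tree of nested tuples `(op, left, right)`. The grammar, the tuple shape, and the class and method names are assumptions made for this sketch, not taken from Essays/Recursive Descent Parser.

```python
class Parser:
    """Recursive-descent parser; one method per grammar rule:
         expr   := term (('+' | '-') term)*
         term   := factor (('*' | '/') factor)*
         factor := NUMBER | '(' expr ')'
    Produces nested tuples (op, left, right) with int leaves."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Current token, or None at end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        """Consume and return the current token, optionally checking its value."""
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):  # a loop instead of left recursion
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        tok = self.take()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(tok)
```

Precedence is encoded by which rule calls which (expr calls term, term calls factor), and the `while` loops keep the operators left-associative without the infinite recursion a left-recursive rule would cause in a top-down parser.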
Ad hoc parsing alternates splitting and combining substrings on multiple, typically non-recursive, levels.
- csv script (JSvnBase:packages/files/csv.ijs)
reads a CSV file into a boxed array
- pp script
J pretty-print script formatter
- User:Chris Burke/Export Script utility (JSvnBase:packages/export)
converts a script into various formats
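A Python sketch of ad hoc CSV parsing in this split-and-combine style: split the text into lines, split each line at commas, then re-join consecutive pieces that belong to one quoted field. It assumes no line breaks inside quoted fields, skips blank lines, and treats a doubled quote as an escaped quote; it is an illustration of the technique, not the algorithm of the csv script.

```python
def parse_csv(text):
    """Return a list of rows, each a list of field strings."""
    rows = []
    for line in text.splitlines():          # level 1: split into lines
        if not line:
            continue
        fields, pending = [], None
        for piece in line.split(","):       # level 2: split at every comma
            if pending is not None:         # inside a quoted field: combine
                pending += "," + piece
            elif piece.startswith('"'):     # a quoted field begins
                pending = piece
            else:                           # plain unquoted field
                fields.append(piece)
                continue
            if pending.count('"') % 2 == 0:  # quotes balanced: field complete
                fields.append(pending[1:-1].replace('""', '"'))
                pending = None
        if pending is not None:
            raise ValueError(f"unterminated quote in line {line!r}")
        rows.append(fields)
    return rows
```

The comma split is deliberately naive; the combine step repairs it for quoted fields, so neither level needs recursion.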
Since much parsing is based on abstract syntax trees (ASTs), an introduction to efficient tree handling in J would help. You might look at:
- the lab Huffman Coding
- Roger's Essays/Huffman Coding
- Guides/Strings string and text manipulation resources
- JForum:programming/2007-November/008869 some initial links
- User:Dan Bron/Temp/ParseLexExecute implementing J in J
- Guides/Language FAQ/J BNF Is there a BNF description of J?
- JForum:chat/2007-November/000678 J syntax easy to parse? I don't think so
- using JHP for general templating
- WikiPedia:Parsing, Wikipedia