Guides/Parsing

From J Wiki

WikiPedia:Parsing is the process of analyzing a character stream according to a formal lexical, syntactic, and/or semantic grammar, producing an output structure or an evaluation.


Lexical Analysis

Produces a stream of tokens from a stream of input characters. The stream can be a list. Lexing can be done using a sequential machine, regular expressions, or ad hoc splitting. AKA lexing, scanning, tokenizing.

Sequential Machine, AKA finite state machine, finite automaton. Uses a state transition table.
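
As an illustration of the table-driven idea, here is a minimal sketch in Python (chosen only for illustration; it is not J's own sequential machine facility). The character classes, state names, and action strings are all assumptions made up for this example: each table entry maps a (state, character class) pair to a next state and an action.

```python
# Hypothetical character classes and states for a tiny tokenizer.
SPACE, DIGIT, LETTER, OTHER = range(4)
IDLE, NUMBER, NAME = range(3)

def char_class(ch):
    """Map a character to one of the four classes used by the table."""
    if ch.isspace():
        return SPACE
    if ch.isdigit():
        return DIGIT
    if ch.isalpha():
        return LETTER
    return OTHER

# TABLE[state][class] -> (next state, action)
# Actions: "skip" ignores the char, "start" begins a token here,
# "keep" extends the current token, "emit" ends the current token,
# "single" emits the char as a one-character token.
TABLE = {
    IDLE:   {SPACE: (IDLE, "skip"), DIGIT: (NUMBER, "start"),
             LETTER: (NAME, "start"), OTHER: (IDLE, "single")},
    NUMBER: {SPACE: (IDLE, "emit"), DIGIT: (NUMBER, "keep"),
             LETTER: (NAME, "emit+start"), OTHER: (IDLE, "emit+single")},
    NAME:   {SPACE: (IDLE, "emit"), DIGIT: (NAME, "keep"),
             LETTER: (NAME, "keep"), OTHER: (IDLE, "emit+single")},
}

def tokenize(text):
    """Run the machine over text and collect the emitted tokens."""
    tokens, state, start = [], IDLE, 0
    for i, ch in enumerate(text):
        state, action = TABLE[state][char_class(ch)]
        if action.startswith("emit"):
            tokens.append(text[start:i])   # close the token in progress
        if action.endswith("start"):
            start = i                      # remember where the new token begins
        if action.endswith("single"):
            tokens.append(ch)              # one-character token
    if state != IDLE:
        tokens.append(text[start:])        # flush the last token
    return tokens

print(tokenize("x1 = 42+y"))
```

All of the lexical rules live in the table, so changing the token definitions means editing data rather than control flow.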

Regular Expressions may use a sequential machine internally, but offer a standard, compact syntax.
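
For comparison, a hedged sketch of regex-based lexing using Python's standard `re` module (Python is used only for illustration). The token pattern is an assumption for this example: a run of digits, a name starting with a letter, or any single non-space character.

```python
import re

# Alternatives are tried left to right at each position:
# a number, then a name, then any other non-space character.
TOKEN = re.compile(r"\d+|[A-Za-z]\w*|\S")

def tokenize(text):
    """Return all tokens in text; whitespace between tokens is skipped."""
    return TOKEN.findall(text)

print(tokenize("x1 = 42+y"))
```

The whole lexical grammar fits in one pattern, at the cost of less control over what happens at each character.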

Ad Hoc splitting looks for simple substrings and splits on them, possibly iteratively.

  • JForum:programming/2007-January/004756
    example of ad hoc splitting for a list of first/initial/last names
  • Scripts/Scheme
    has a Lisp S-expression string tokenizer
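
A minimal sketch of ad hoc splitting, in Python for illustration only. It is not the tokenizer from Scripts/Scheme; the function name is hypothetical, and the approach assumes the input has no string literals or comments containing parentheses or spaces.

```python
def tokenize_sexpr(text):
    """Pad every parenthesis with spaces, then split on whitespace."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

print(tokenize_sexpr("(+ 1 (* 2 3))"))
```

There is no state machine and no pattern language here, only substring replacement and splitting, which is often enough when the token set is this simple.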

Syntactic Analysis

Produces a structure from, or directly evaluates, a stream of tokens. The structure is typically a tree of grammar elements. AKA parsing.

Bottom-up, AKA Shift-reduce. E.g., LR parsers.
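
To make shift and reduce concrete, here is a small sketch in Python (for illustration only) of an operator-precedence parser. This is a simpler relative of a table-driven LR parser, not an LR parser itself. The precedence table, the tuple-based tree shape, and the restriction to binary operators without parentheses are assumptions of this example.

```python
# Hypothetical precedence table: higher binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse(tokens):
    """Build a tree of (operator, left, right) tuples from a token list."""
    operands, operators = [], []   # the two parse stacks

    def reduce_top():
        # Reduce: replace "left op right" on the stacks by one tree node.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((op, left, right))

    for tok in tokens:
        if tok in PRECEDENCE:
            # Reduce while the stacked operator binds at least as tightly
            # (">=" makes operators left-associative).
            while operators and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]:
                reduce_top()
            operators.append(tok)  # shift the operator
        else:
            operands.append(tok)   # shift the operand
    while operators:
        reduce_top()
    return operands[0]

print(parse(["1", "+", "2", "*", "3"]))
```

The parser never looks ahead to decide what construct it is in; it shifts tokens and reduces whenever the top of the stack forms a complete subexpression.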

Top-down, commonly implemented as Recursive descent. E.g., LL parsers.
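
A hedged sketch of recursive descent, in Python for illustration only. The grammar is an assumption chosen for this example: expr := term (("+" | "-") term)*, term := factor (("*" | "/") factor)*, factor := operand | "(" expr ")". Each grammar rule becomes one function, and the tree is built from (operator, left, right) tuples.

```python
def parse(tokens):
    """Parse a token list top-down into nested (op, left, right) tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():                       # expr := term (("+" | "-") term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():                       # term := factor (("*" | "/") factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():                     # factor := operand | "(" expr ")"
        if peek() == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if peek() is None:
            raise SyntaxError("unexpected end of input")
        return advance()

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse(["(", "1", "+", "2", ")", "*", "3"]))
```

The call stack mirrors the grammar: the parser starts at the top rule and descends into sub-rules, which is what distinguishes it from the bottom-up approach.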

Ad Hoc parsing alternates splitting and combining substring portions on multiple, typically non-recursive, levels.

  • csv script (JSvnBase:packages/files/csv.ijs)
    reads csv file into a boxed array
  • pp script
    J pretty-print script formatter
  • User:Chris Burke/Export Script utility (JSvnBase:packages/export)
    converts a script into various formats
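
The alternation of splitting and combining can be sketched as follows, in Python for illustration only. This is not the algorithm used by the csv script; the function name is hypothetical, and the sketch assumes no escaped (doubled) quotes and no line breaks inside quoted fields.

```python
def parse_csv(text):
    """Split text into rows of fields, rejoining pieces inside quotes."""
    rows = []
    for line in text.splitlines():            # level 1: split into lines
        fields, pending = [], None
        for piece in line.split(","):         # level 2: split at commas
            if pending is not None:
                pending += "," + piece        # combine: comma was inside quotes
                if piece.count('"') % 2 == 1: # closing quote reached
                    fields.append(pending)
                    pending = None
            elif piece.count('"') % 2 == 1:
                pending = piece               # opening quote, not yet closed
            else:
                fields.append(piece)
        if pending is not None:
            fields.append(pending)            # unterminated quote: keep as is
        # Trim surrounding blanks and quote characters from each field.
        rows.append([f.strip().strip('"') for f in fields])
    return rows

print(parse_csv('name,note\n"Smith, John",ok'))
```

There are two fixed levels (lines, then fields) and no recursion; the only repair step is recombining pieces that were split inside a quoted field.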

Handling Structures

Since a lot of parsing is based on ASTs, an introduction to efficient tree handling in J would help. You might look at

See Also

J-related information

General information