User:Dan Bron/Temp/ParseLexExecute

From J Wiki

Parse, Lex, Execute: Or, in the parlance of J: Rhematics, Syntax (Grammar?), Semantics. Or "Exejesis".

Please don't edit this page for now; I want to lay out the idea more clearly. -- Dan Bron, 2005-10-24T17:52:13Z

TODO:

Full set of words like lexing, parsing, rhematics, semantics, etc. Easier to search that way. Combine all the related pages (J internal types, primitive primitives, name formats, etc.) in a subbranch.

Pitfalls: Separate page for common pitfalls (shape not displayed, type not necessarily displayed, mapped names...)

Another Implementation of J

This page is the root for implementing J in J. The title was derived from Roger's document "An Implementation of J", which describes the implementation of the Dictionary (hereafter called the DoJ) in C.

Its purpose is primarily pedagogical: it will give many entry points to learning J, all of which lead to bootstrapping paths to learning the entire system. It will also document certain features that are neither obvious nor readily derivable from the DoJ.

Rhematics

See #lexing

Parsing

Sublanguages:

    • Names
    • Constants

Based constants: Assuming a single, well-formed based constant, the evaluation is:

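   NB. Split at the first 'b': box the digits (after 'b'), then the base (before 'b')
   split  =. (  (}.~ >:) ; {.~  )  i.&'b'
   NB. Map each character to its digit value (0-9, then a-z as 10-35)
   digits =. '0123456789abcdefghijklmnopqrstuvwxyz'&i.&.>  @:  split
   NB. Read the base in decimal, then read the digits in that base
   baseN  =. #.~&:>/  @:  ,&(<10)  @:  digits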

   x =. '16bffff'
   require'regex'
   assert '[0-9]+b[0-9a-z]+' rxeq x  NB.  Single, well formed, based constant
   baseN x
65535
    • Explicit code (controls)
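
The based-constant evaluation above can be traced one stage at a time. This is only a minimal sketch: the three definitions are repeated (using =: so the names persist if the block is loaded as a script), and the name bc is introduced here purely for illustration. The last line compares the result with the interpreter's own reading of the same constant.

```j
NB. Definitions repeated from above so the block stands alone.
split  =: ((}.~ >:) ; {.~) i.&'b'
digits =: '0123456789abcdefghijklmnopqrstuvwxyz'&i.&.> @: split
baseN  =: #.~&:>/ @: ,&(<10) @: digits

bc =: '16bffff'            NB. hypothetical sample input

split bc                   NB. two boxes: 'ffff' (digits) and '16' (base)
digits bc                  NB. two boxes: 15 15 15 15 and 1 6
10 #. 1 6                  NB. the base, read in decimal: 16
16 #. 15 15 15 15          NB. the digits, read in that base: 65535
(baseN bc) -: 16bffff      NB. 1: agrees with J's own reading of the constant
```

Note that the base itself is always read in decimal, which is why a boxed 10 is appended before the insert.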

Lexing

Semantics

Interpretation

Stack

Execution

Internal: PrimitivePrimitives

External: Foreigns

Other

J data: InternalTypes

Lexing

J is, at the top level:

*  Lines, mostly. Lines are:
   *  Immediate execution. This includes immediately executable code, empty lines, and the introduction of line oriented explicit context (LOEC).
   *  When LOEC has been introduced, lines can be:
      *  Explicit code (latent execution).
      *  Control words
      *  Change of explicit context (a  :  on a line by itself, neither preceded nor followed by any other characters, including whitespace)
      *  End of explicit context (a  )  on a line by itself, neither preceded nor followed by any other characters, including whitespace)
*  Immediately executable code lines are:
*  Code
   *  Names
      *  Fixed
      *  Mutable (mapped/nonmapped)
      *  Have classes
         * Operators
            *  Adverbs, which have only a left argument (which is either a function or data).
            *  Conjunctions, which have both a left and a right argument (each argument is either a function or data).
            *  There are no ambivalent operators.
         *  Functions
            *  Verbs, which can be either:
               *  Monads, having only a right argument, which is data
               *  Dyads, having both a left and a right argument, which are both data
               *  Ambivalent, which can act as either a monad or a dyad.
         *  Data
            *  Rank, rectangular
            *  Homogeneous
            *  Boxed or open
                *  Make heterogeneous into homogeneous
                *  Trees
                *  Pointers
                *  Non rectangular data
            *  Sparse or dense
            *  Numeric
               *  Boolean
               *  Integer
               *  Extended integer (arbitrarily big integers)
               *  Rational (exact fractions)
               *  Floating point
               *  Complex
            *  Character
               *  Literal
               *  Unicode
            *  Symbol
            *  Promotion
            *  No demotion
               *  Wrong datatype will bite you.  (Display again)
   *  Parens
   *  Copulae
*  Followed optionally by a comment.  A comment starts at the leftmost instance of NB. with an even number (including 0) of quotes to its left.
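
Several points in the outline above can be checked directly in a session. The following is a rough sketch rather than a complete survey: the names myNoun, myAdverb, myConj, myVerb and line are made up for illustration, 4!:0 is the foreign that reports a name's class (0 noun, 1 adverb, 2 conjunction, 3 verb), and 3!:0 is the foreign that reports a noun's internal type code.

```j
NB. Hypothetical names, one per class in the outline above.
myNoun   =: 1 2 3
myAdverb =: /
myConj   =: @
myVerb   =: +
4!:0 'myNoun';'myAdverb';'myConj';'myVerb'    NB. 0 1 2 3

NB. Type codes of some constants:
NB. boolean 1, integer 4, extended 64, rational 128, floating 8, complex 16, literal 2
3!:0&> 1 ; 2 ; 2x ; 1r2 ; 1.5 ; 1j2 ; 'a'

NB. Promotion: mixing integer and floating point yields floating point.
3!:0 ] 1 2 , 2.5            NB. 8

NB. No demotion: the result displays as 2 but is still floating point.
2.5 - 0.5
3!:0 ] 2.5 - 0.5            NB. 8

NB. Comments: the first NB. sits inside quotes, so only the second starts a comment.
line =: 'a =. ''NB. no'' NB. yes'
;: line                     NB. 4 words; the last one is the comment
```

The 2.5 - 0.5 case is the "display again" pitfall: nothing in the displayed 2 reveals that the value is floating point.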

J has fixed names for an infinite number of both literal (ASCII) data and numeric data, but only one fixed name for boxed data (a:) and no fixed names for unicode or symbolic data. Thus, most boxed, unicode, and symbolic data must be calculated/generated.
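
As an illustration of "calculated/generated", here is one possible way to produce boxed, unicode, and symbol data from constants that do have fixed names. The names boxed, uni and sym are hypothetical, and the type codes in the comments are those reported by the 3!:0 foreign.

```j
NB. a: is the one fixed name for boxed data: a box holding an empty array.
a: -: <''                   NB. 1

NB. Everything else is computed from data that do have fixed names.
boxed =: 1 2 3 ; 'abc'      NB. link (;) builds a list of two boxes
uni   =: u: 955             NB. u: maps a code point to a unicode character
sym   =: s: <'abc'          NB. s: maps a boxed string to a symbol

3!:0&> boxed ; uni ; sym    NB. type codes 32 131072 65536
```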

J has fixed names for numeric and literal rank 1 arrays of length > 1 (within limits: names have a maximum length of 1024 characters), and fixed names for numeric and literal (ASCII) scalars (1, 2, 3, 'a', 'b', 'c', etc.), but no fixed names for arrays of greater rank or, in particular, for rank 1 arrays of length one. That is, if you enter a single number, or a single character between quotes, it is a SCALAR. Example:

    $ 1 2 3 4
4
    $ 1 2 3
3
    $ 1 2
2
    $ 1

    #@:$&> (1 2 3 4);(1 2 3);(1 2);(1)
1 1 1 0

    NB.  Note that the scalar  1  and the vector   ,1  are
    NB.  DISPLAYED the same, but they are DIFFERENT data.

    NB.  This WILL bite you.

   <;._2 'a ab abc abcd '
+-+--+---+----+
|a|ab|abc|abcd|
+-+--+---+----+

   ('a';'ab';'abc';'abcd')
+-+--+---+----+
|a|ab|abc|abcd|
+-+--+---+----+

   ('a';'ab';'abc';'abcd') -: <;._2 'a ab abc abcd '
0
   NB. What?  They look the same!

   $&.> <;._2 'a ab abc abcd '
+-+-+-+-+
|1|2|3|4|
+-+-+-+-+
   $&.>  'a';'ab';'abc';'abcd'
++-+-+-+
||2|3|4|
++-+-+-+
   NB. Ah!  It's a shape issue.


   NB.  However, J is a very consistent language, so you'll likely never get
   NB.  scalars mixed with vectors when applying a verb to data.

   NB.  No matter how simple ...
   (;: 'a ab abc abcd ') -: <;._2 'a ab abc abcd '
1
   (<\ 'abcd')  -: <;._2 'a ab abc abcd '
1
   ((>:@:i.@:# {.&.> <) 'abcd') -: <;._2 'a ab abc abcd '
1
   NB.  ... or how complex your verb.
   ((<@:}:;.1~ (e.~ i.@:{:)@:(+/\)@:(* * >:)@:i.@:>:@:4:) 'a ab abc abcd ') -:   <;._2 'a ab abc abcd '
1
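
When hand-written constants do have to be compared with computed data, one way around the mismatch (a sketch, not the only approach) is to ravel the contents of every box with ,&.> so that each box holds a vector. The names cutBoxes and linkBoxes are introduced here only for illustration.

```j
cutBoxes  =: <;._2 'a ab abc abcd '     NB. every box holds a vector
linkBoxes =: 'a';'ab';'abc';'abcd'      NB. the first box holds a scalar

cutBoxes -: linkBoxes                   NB. 0, as shown above
cutBoxes -: ,&.> linkBoxes              NB. 1, once every box holds a vector
(,'a') -: 'a'                           NB. 0: vector versus scalar
$ ,'a'                                  NB. 1
```

Ravel leaves the boxes that already hold vectors unchanged, so it is safe to apply to every box.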