From J Wiki
Jump to: navigation, search


An Implementation of J

Memory Management
Global Variables


The fundamental data structure is the array, that is, an object of the C data type A defined in file jtype.h:

typedef long I; typedef struct {I k,flag,m,t,c,n,r,s[1];}* A;

All objects, whether numeric, literal, or boxed, whether noun, verb, adverb, or conjunction, are represented by arrays. For example, the string 'Cogito, ergo sum.', the atom 1.61803, and the table 11+i.3 4 are represented thus:

     k   flag  m    t    c    n    r   s[0]
   │  32│   0│  20│CHAR│   2│  17│   1│  17│Cogi│to, │ergo│ sum│.   │

     k   flag  m    t    c    n    r
   │  28│   0│   8│  FL│   2│   1│   0│  1.61803│

     k   flag  m    t    c    n    r   s[0] s[1]
   │  36│   0│  48│ INT│   2│  12│   2│   3│   4│  11│  12│  13│
                       │  14│  15│  16│  17│  18│  19│  20│  21│  22│
The parts of an array, and macros for manipulating them, are as follows:

    	Part          	Macro       	      Description
k	AK	offset of ravel with respect to byte 0 of the array
flag	AFLAG	flag
m	AM	maximum # of bytes in ravel
t	AT	type
c	AC	reference count
n	AN	# of atoms in ravel
r	AR	rank
s	AS	pointer to shape
        AV	"ravel" or "value", pointer to atoms in ravelled order

An array has a "header" and a "value". The header are the parts k, flag, m, t, and so forth, including the shape s, which consists of r integers whose product equals n. The value, the atoms of the array in ravelled (row major) order, usually follow immediately after s, but can be separate from the header, according to the value in the k part. Setting the parts of an array incorrectly, or exceeding the bounds of the array specified by these parts, almost always lead to erratic behaviour and catastrophic failure.

The macros AK, AFLAG, AM, AT, AC, AN, and AR denote "fullword" integers and may occur on the left or right of an assignment (i.e. they are "lvalues".) AS is an integer pointer. AV is also an integer pointer, and must be cast to a C data type appropriate to the type of array. (See Types.)

All arrays are created using the macro GA in file j.h. The statement


creates an array named xyz of type t and rank r, having n atoms and shape s. If the rank is 0, s is ignored; if the rank is 1, again s is ignored, and the shape is set to n. Otherwise, if s is nonzero, GA initializes the shape from the r integers pointed to by s, and if s is 0, the shape is not initialized and must be initialized subsequently. GA returns zero if the array can not be created.

For example, the arrays diagrammed above can be created as follows, under the names ces, phi, and m:

   typedef char   C;
   typedef double D;

   A ces,m,phi; I j,*s,*v;

   memcpy((C*)AV(ces),"Cogito, ergo sum.",(size_t)17);


   s=AS(m); *s=3; *(1+s)=4;
   v=AV(m); for(j=11;23>j;++j)*v++=j;

The following utilities (file u.c) and array constants (file i.c) are convenient for working with simple arrays. The frequency of use gives a sense of their utility.

   Facility	      Freq.     Description

   sc(I k)	       217 	An integer atom with value k (equivalent to sc4(INT,k))
   sc4(I t,I k)                 5An atom of type t with 4-byte value k
   scf(D x)	        26	A floating point atom with value x
   scc(C x)             17	A literal atom with value c
   apv(I n,I b,I m)     96	The arithmetic progression vector b+m*i.n
   str(I n,C*s)	        52	A string (literal list) of length n with value the characters pointed to by s
   cstr(C*s)	        93	A string with value the characters in the 0-terminated string s
   v2(I a,I b)	        72	The integer vector a,b
   v1(I k)	        15	The integer vector ,k
   vec(I t,I n,void*v) 	81	A vector of length n of type t, with values pointed to by v

   zero                129	0
   one	                98	1
   two	                24	2
   neg1 	        22	_1
   pie  	         5	1p1 (pi conflicts with C usage)
   a0j1 	         9	0j1
   ainf 	        14	_
   iv0  	        17	,2-2
   iv1  	        18	,2-1
   mtv  	        73	i.0
   mtm  	        44	i.0 0

For example, the arrays diagrammed above can be created by str(17L,"Cogito, ergo sum.") or cstr("Cogito, ergo sum."), scf(1.61803), and reshape(v2(3L,4L),apv(12L,11L,1L)).


If x is an array, its type AT(x) specifies how the atoms starting at AV(x) are to be interpreted. In C programming terms, AV(x) must be cast to a pointer of the appropriate C type:

           C Data
   AT(x)    Type                    Description

   B01       B          Boolean (BOOL has a name conflict)
   LIT       C          literal (character; CHAR has a name conflict)
   INT       I          integer
   FL        D          double (IEEE floating point)
   CMPX      Z          complex
   BOX       A          boxed
   XNUM      X          extended precision integer
   RAT       Q          rational number

   SB01      P          sparse boolean
   SLIT      P          sparse literal (character)
   SINT      P          sparse integer
   SFL       P          sparse floating point
   SCMPX     P          sparse complex
   SBOX      P          sparse boxed

   VERB      V          verb
   ADV       V          adverb
   CONJ      V          conjunction

   ASGN      I          assignment
   MARK      I          marker
   SYMB      L          locale (symbol table)
   CONW      CW         control word
   NAME      NM         name
   LPAR      I          left  parenthesis
   RPAR      I          right parenthesis

For example, if x is literal and s=(C*)AV(x), then s[i] is character i of x. The C data types in the table are all typedef's found in file jtype.h; the data type V is explained in the Verbs section.

Types are fullword integers, and are powers of 2 to permit convenient tests for "composite" types. For example, if:


Then the phrase NUMERIC&AT(x) tests for numeric arrays, and the phrase NOUN&AT(x) tests for nouns. Such comparisons play a key role in the parser.

A numeric array is accepted as argument by a primitive, regardless of its type, if it is mathematically within the domain of the primitive. For example, a primitive with integral domain would accept integers in an array of type FL, CMPX, and B01, and of course INT. (This analytic property does not extend to functions internal to the implementation.) Functions in the file k.c convert between numeric types. A converted result is an array of the target type equal to the argument within fuzz. The following functions are available:

   	cvt(t,x)    	Convert x to type t; signal error if not possible
       pcvt(t,x)	Convert x to type t; return x if not possible
       icvt(t,x)	Convert floating x to INT if the values are in range; otherwise just return x
       bcvt(t,x)	Convert x to the "lowest" type

The utility bp in file u.c applies to a type, and returns the number of bytes per atom of that type. Thus bp(INT) is 4; bp(AT(x)) is the number of bytes per atom of x; and 28+(4*AR(x))+AN(x)*bp(AT(x)) is the number of bytes required by x — 4 bytes each for k,flag,m,t,c,n,r; 4 bytes each for the AR(x) elements of the shape; and bp(AT(x)) bytes each for AN(x) atoms.

The atoms of a boxed array are pointers to other arrays, and are accessible through (A*)AV(x), as the following example illustrates. aib applies to a boxed array x, and returns the number of atoms in each box of x:

   #define R  return

   A aib(A x){A*u,z;I j,*v;
    GA(z,INT,AN(x),AR(x),AS(x));       /* 1 */
    u=(A*)AV(x); v=AV(z);              /* 2 */
    for(j=0;AN(x)>j;++j)*v++=AN(*u++); /* 3 */
    R z;

Line 1 creates an integer array z having the same rank and shape as x. Line 2 initializes pointer values u and v for traversing x and z . Line 3 runs through the atoms of x, through u, and records the number of atoms in each. Since the data type of u is A*, the data type of *u is A and are subject to AN, AT, AV, etc.

Memory Management

When an array is created, malloc is called to obtain the requisite storage; when this storage is no longer needed, free is called to return it to the underlying system. No "garbage collection" is done. The performance of this strategy is adequate on modern virtual memory systems. To facilitate the implementation of alternative strategies, the use of malloc and free is limited to a single instance of each, in the file m.c.

The reference count of an array is incremented when it is assigned a name, directly or indirectly, and is decremented when the name is reassigned or erased; when the reference count of an array reaches 0, its storage is freed.

When an array is created, a pointer to it is entered in a "temp stack" (tstack in file m.c.) A temp is an array on this stack with a reference count of one. The temp stack plays an important role in the main execution loop. In an iteration of the loop,

• The top of the temp stack is recorded;
• A line of user-input is executed; and
• Temps from the current top-of-stack to the old top-of-stack recorded above, are freed.

This device permits functions to be written without explicit memory management code. For example, the monad ~. is written:

F1(jtnub){R repeat(nubsieve(w),w);}

And nub need not be concerned with temps used in repeat or nubsieve, because they are accounted for in the main loop.

On the other hand, a function may account for temps: On entry into the function, the current top-of-stack is recorded; on exit, temps are freed down to the recorded point. (These actions are mediated by the macros PROLOG and EPILOG.) Whether a function accounts for temps does not affect the logic of functions that it calls, nor functions that call it.

Global Variables

The only global variables used in the system are constants which are assigned exactly once. (For example, the array constant zero and the internal complex number zeroZ.) All other variables non-local to functions are accessed through the parameter jt.

jt has defined type J, a pointer to a struct defined in file jt.h. Nearly all functions in the system has jt as its first function argument, and all such functions have the letters jt as a prefix in their names. The file ja.h defines aliases for these names, so that a call to a function jtxyz(jt,a,w,h) is actually written as xyz(a,w,h). For example, the conjunction &, described in Adverbs and Conjunctions, is implemented by a function defined and declared as jtamp, having prototype A jtamp(J jt,A a,A w), but calls to this function are written as amp(a,w), and discussions on this function refer to it as amp.

jt makes it possible to execute multiple instances of J in the same process.

Next • Previous • Index • Table of Contents