User:Ian Clark/credo

From J Wiki
Jump to navigation Jump to search

Artificial Catholicism: a case study in translating from BASIC to J

In the following thread on the J programming forum:

http://www.jsoftware.com/pipermail/programming/2012-November/030164.html

Bo Jacoby challenges us to make sense of a heritage BASIC program which generates doctrinally sound prayers from a database of the Nicene Creed in Latin:

http://en.wikipedia.org/wiki/Nicene_Creed#Latin_liturgical_version

The database stores each word of the Creed as the node of a tree. "Doctrinally sound prayers" correspond to traversals of this tree, or a subtree.

This article is not concerned with the devotional questions surrounding automatic prayer generation, but with offering a case-study of analysing a given BASIC program in order to implement it intelligently in J.

The BASIC program in question is interesting for its technique of defining a tree on a set of records by attaching a single integer to each record. A given traversal of the tree or a subtree has an associated integer, and conversely any positive integer defines a traversal (albeit often a trivial one). The program loops indefinitely, accepting integers typed-in at the keyboard, and outputting the corresponding prayer on the same line.

A typical session looks like this

1:CREDO AMEN
12:CREDO IN JESUM AMEN
10:CREDO IN DEUM ET IN JESUM ET IN SPIRITUM ET ECCLESIAM AMEN
2:CONFITEOR AMEN
20:CONFITEOR BAPTISMA AMEN
200:CONFITEOR UNUM BAPTISMA IN REMISSIONEM AMEN
201:CONFITEOR UNUM BAPTISMA AMEN
210:CONFITEOR UNUM BAPTISMA IN REMISSIONEM AMEN
211:CONFITEOR UNUM BAPTISMA AMEN

the substring before the colon (:) being the integer typed-in.

The data file "CREDO"

We reproduce Bo's database of fundamental Catholic doctrine below (you can copy/paste it into an empty IJS window):

1 CREDO
11 IN
111 UNUM
11 DEUM
112 PATREM
1121 OMNIPOTENTEM
113 FACTOREM
1131 CÆLI
1139 ET
1132 TERRÆ
11331 VISIBILIUM
1133 OMNIUM
11339 ET
11332 INVISIBILIUM
19 ET
12 IN
1211 UNUM
1211 DOMINUM
12 JESUM
1211 CHRISTUM
1212 FILIUM
1212 DEI
12121 UNIGENITUM
1219 ET
1213 EX
1213 PATRE
1213 NATUM
12131 ANTE
121311 OMNIA
12131 SÆCULA
1221 DEUM
12211 DE
12211 DEO
1222 LUMEN
12221 DE
12221 LUMINE
1223 DEUM
12231 VERUM
12232 DE
12232 DEO
122321 VERO
1231 GENITUM
12311 NON
12311 FACTUM
1232 CONSUBSTANTIALEM
1232 PATRI
12321 PER
12321 QUEM
12321 OMNIA
12321 FACTA
12321 SUNT
124 QUI
124101 PROPTER
124101 NOS
12410101 HOMINES
124109 ET
124102 PROPTER
12410201 NOSTRAM
124102 SALUTEM
12411 DESCENDIT
1241101 DE
1241101 CÆLIS
12419 ET
12412 INCARNATUS
12412 EST
1241201 DE
1241201 SPIRITU
124120101 SANCTO
1241202 EX
1241202 MARIA
124120201 VIRGINE
12419 ET
1241301 HOMO
12413 FACTUS
12413 EST
124211 CRUCIFIXUS
1242101 ETIAM
1242101 PRO
1242101 NOBIS
1242102 SUB
1242102 PONTIO
1242102 PILATO
124212 PASSUS
124219 ET
124213 SEPULTUS
12421 EST
12429 ET
12422 RESURREXIT
124221 TERTIA
124221 DIE
124222 SECUNDUM
124222 SCRIPTURAS
12429 ET
12423 ASCENDIT
124231 IN
124231 CÆLUM
12424 SEDET
124241 AD
124241 DEXTERAM
124241 PATRIS
12429 ET
124251 ITERUM
12425 VENTURUS
12425 EST
124252 CUM
124252 GLORIA
124253 JUDICARE
1242531 VIVOS
1242539 ET
1242532 MORTUOS
125 CUJUS
125 REGNI
125 NON
125 ERIT
125 FINIS
19 ET
13 IN
13 SPIRITUM
131 SANCTUM
132 DOMINUM
139 ET
133 VIVIFICANTEM
134 QUI
134 EX
1341 PATRE
1342 FILIO
1349 QUE
134 PROCEDIT
135 QUI
135 CUM
13501 PATRE
13509 ET
13502 FILIO
13509 SIMUL
1351 ADORATUR
1359 ET
1352 GLORIFICATUR
136 QUI
136 LOCUTUS EST
1361 PER
1361 PROPHETAS
19 ET
141 UNAM
142 SANCTAM
143 CATHOLICAM
149 ET
144 APOSTOLICAM
14 ECCLESIAM
2 CONFITEOR
211 UNUM
21 BAPTISMA
212 IN
212 REMISSIONEM
2121 PECCATORUM
9 ET
3 EXPECTO
31 RESURRECTIONEM
311 MORTUORUM
39 ET
32 VITAM
3211 VENTURI
321 SÆCULI
AMEN

To facilitate loading the data, let's make the above into a J script by prepending a single header line, like so

CREDO=: (<;._2) 0 : 0
1 CREDO
11 IN
111 UNUM
11 DEUM
...
39 ET
32 VITAM
3211 VENTURI
321 SÆCULI
AMEN
)

Note that the final line consisting of a bare right-parenthesis is redundant if it's the last line of the script.

We can either save the data as a separate J script, or embed it into the application script which is going to use it.

When loaded, the above script will create a noun CREDO - a boxed list of records like this:

   CREDO
┌───────┬─────┬────────┬───────┬──────────┬─────────────────┬────────────┬...
│1 CREDO│11 IN│111 UNUM│11 DEUM│112 PATREM│1121 OMNIPOTENTEM│113 FACTOREM│...
└───────┴─────┴────────┴───────┴──────────┴─────────────────┴────────────┴...
   #CREDO
163

The program to generate doctrinally sound prayers

Bo offers a heritage BASIC program which reads the data in its original bare form (saved as a text file named "CREDO") and composes a "prayer" (my term for it) when the user types in a number. The program loops, accepting numbers from the keyboard until the user enters a null input.

1 INPUT;C$: IF C$="" THEN END
2 OPEN "CREDO" FOR INPUT AS 1: PRINT":";
3 IF EOF(1) THEN CLOSE: PRINT: GOTO 1
4 LINE INPUT#1,A$: B$=C$
5 IF A$=""THEN A%=-1 ELSE A%=ASC(A$)-48: A$=MID$(A$,2)
6 IF B$=""THEN B%=-1 ELSE B%=ASC(B$)-48: B$=MID$(B$,2)
7 IF A%<0 THEN PRINT" ";A$;: GOTO 3
8 IF A%=0 OR B%=0 OR A%=B% THEN 5 ELSE 3

The program makes no attempt to separate its code into input, processing and output. Even in the 1960s when BASIC was invented (Kemeny & Kurtz, 1964) this was recognised to be a bad thing. It makes a puzzle out of something which was originally intended to be easy to read and understand, viz. BASIC itself.

Now there's little point in translating a puzzle written in BASIC into the same puzzle written in J. Far better to purge the algorithm of its BASIC idiosyncracies, and express it in a way which makes it convenient to program "intelligently" in J. By which we mean in a way which shows some grasp of what the original program did, and how it did it, making it possible to choose "the J way" of doing things as opposed to "the BASIC way".

We won't strive for the most efficient J code, preferring code which clarifies the algorithm, whilst leaving the way open to tighten it up at our leisure (though maybe at the cost of clarity). But when we've finished, there will be little or no sign of the BASIC origins of the program.

Addressed in this way, Bo's sample program makes an appealing case-study for re-implementing an application in J.

The traditional input/processing/output cycle

BASIC programs were originally run on a shared mainframe and accessed by the user via a teletype. If the flow-of control drops off the end, the program stops and so does your session. You'd probably have to phone the computer operator to mount the tape again and schedule a new job for you, charging your department accordingly. You don't want that to happen. So if the program is meant to take a number as input and generate a Latin sentence as output, you write it to cycle indefinitely, accepting input until you call a halt, conventionally done by entering an empty line.

This is the purpose of line 1:

1 INPUT;C$: IF C$="" THEN END

The computer puts the characters you enter into a variable called C$. (The final $ denotes a string variable). If C$ is empty, the computer terminates the session by executing END.

If C$ is not empty, the computer executes line 2, which, we may guess, starts reading the data file "CREDO" from the beginning. It also prints a colon in preparation for eventual output. To save paper this is going to be on the same line as your input. This is how teleprinters were used before computers came along.

And it's not hard to guess the next line (3) specifies what's to happen when the end-of-file (EOF) of "CREDO" is reached. The file is closed, PRINT is executed to start a new line, and flow-of-control goes back to the start of the program (line 1) to ask you for your next number. But a lot happens before that takes place. Typically the flow-of-control drops through to line 4 without doing anything, and line 4 starts the real work by reading the next record from the data file.

Let's represent this behaviour with a J verb like this:

go=: 3 : 0
while. 1 do.
  if. 0=#C=: 1!:1 (1) do. '...exits' return. end.
  smoutput pray C
end.
)

The heart of the app is the verb pray, which accepts a number and returns a string (the Latin sentence). Nowadays you have exclusive use of your computer and your session is not going to stop when execution stops. So, writing an app for your own use you wouldn't bother to write a verb like go. You'd run pray directly like this:

   pray 21
CONFITEOR BAPTISMA AMEN
   pray 22
CONFITEOR AMEN
   pray 31
EXPECTO RESURRECTIONEM AMEN

The "processing" part of the program

What does the BASIC program actually do?

It reads the file "CREDO" in sequence, record by record. Each record consists of an ID (a number) and a Latin word. It matches the ID with the number you typed-in (let's call it the key), and if they match it prints the word. Otherwise it goes on to the next word. That's all it does.

How would you naturally do it in J?

Suppose you had a dyad verb called match. The x-arg is a given ID, the y-arg is the key, and it returns 1 if they match, 0 if they don't. If you gave this verb Rank 1, it would automatically generalise to accepting a list of IDs, and the result would be a list of 0s and 1s, one for each corresponding word. You could then use Copy (#) with this Boolean vector to extract those words you want to include in the sentence from the list of all words in the data file. In this way you'd separate processing from output.

Firstly, let's split the data in CREDO into a list of IDs and a corresponding list of words:

CS=: > CREDO -.each <' 0123456789'			NB. words only
CN=: > CREDO -.each <'ABCDEFGHIJKLMNOPQRSTUVWXYZÆ'	NB. keys only
   CN ; CS
┌──────────┬────────────────┐
│1         │CREDO           │
│11        │IN              │
│111       │UNUM            │
│11        │DEUM            │
...        ...            ...
│39        │ET              │
│32        │VITAM           │
│3211      │VENTURI         │
│321       │SÆCULI         │
│          │AMEN            │
└──────────┴────────────────┘

Now we can define our verb: pray like this:

   sel=: 3 : 'CN match"1 ({:$CN){.":y'
   sel 11
1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...

   pray=: 3 : 'deb , CS #~ sel y'

Verb: sel is basically just CN match y -- but this assumes match has rank 1, and y is in character form and is the same width as CN. The full definition, as shown above, ensures all of these things.

Verb: pray is basically just (sel y)#CS -- but that's a matrix having the same number of columns as CS. The full definition, as shown above, turns it into a string of words and removes duplicated spaces.

Intuiting and verifying the ID "match" algorithm

All that remains is to define our verb: match.

This is not so easy. We can guess that lines 4 to 8 of the BASIC program implements the matching algorithm, and it does a digit-by-digit comparison. But it may work by side-effects, so here's a case where we'll have to attempt a faithful transliteration of the BASIC code into J. Then, using this to verify what the algorithm actually does, we can write an alternative definition of match in efficient J code.

We'll call these two versions of match: matchX and matchY, and plug them in turn into: sel using one or other of the sentences:

   match=: matchX
   match=: matchY

Here's the attempted transliteration...

matchX=: 4 : 0
	NB. test database line x against searchkey y
	NB. a transliteration of original algorithm
NB. 1 INPUT;C$: IF C$="" THEN END
NB. 2 OPEN "CREDO" FOR INPUT AS 1: PRINT":";
NB. 3 IF EOF(1) THEN CLOSE: PRINT: GOTO 1
	NB. ...implemented by word: go (below)

NB. 4 LINE INPUT#1,A$: B$=C$
As=. dtb x	NB. A$ - word read-in from data base
Bs=. dtb y	NB. B$ - test key

while. 1 do.

NB. 5 IF A$=""THEN A%=-1 ELSE A%=ASC(A$)-48: A$=MID$(A$,2)
if. As-:'' do.
  An=._1	NB. An is original A%
else.
  An=. ".{.As	NB. ASC(A$) only translates the first char of A$
  As=. }.As	NB. MID$() without 3rd arg is like RIGHT$()
end.

NB. 6 IF B$=""THEN B%=-1 ELSE B%=ASC(B$)-48: B$=MID$(B$,2)
if. Bs-:'' do.
  Bn=._1	NB. Bn is original B%
else.
  Bn=. ".{.Bs
  Bs=. }.Bs
end.

NB. 7 IF A%<0 THEN PRINT" ";A$;: GOTO 3
if. An<0 do. 1 return. end.	NB. i.e. accept the word

NB. 8 IF A%=0 OR B%=0 OR A%=B% THEN 5 ELSE 3
if. (An=0) or (Bn=0) or (An=Bn) do.
  NB. (no-op) loop to test next letter
else.
  0 return.	NB. i.e. reject the word
end.
end.
)

and here's an equivalent version in somewhat more efficient J:

matchY=: [: *./ = +. ('0' = [) +. ('0' = ]) +. ' ' = [

An "explication" of matchY (an equivalent explicit definition) is as follows:

ZE=: '0'
SP=: ' '
or=: +.
all=: *./
matchY=: 4 : 'all ((x=y) or (x=ZE) or (y=ZE) or (x=SP))'

The tacit form of matchY can probably be made more efficient still with a better use of forks. We leave it as it is here, because it corresponds to what you get if you use: 13 : in place of: 4 : in the above explication.

Conclusion

Here's a script combining all the things discussed:

File:Credo.ijs


(Contributed by Ian Clark.)