NYCJUG/2024-03-12

From J Wiki
Jump to navigation Jump to search

Beginner's regatta

We first look at the relation between large numbers expressed in scientific notation and how to avoid this if we want precise integer values, then we review type specifiers used with J's direct definition syntax.

Scientific Notation Vs Exact Representation

We received this message on the J Forums the other day:

From: Pawel Jakubas <jakubas.pawel@gmail.com>
Date: Fri, Jan 19, 2024 at 6:30 AM
Subject: [Jforum] convert from/to scientific notation
To: forum <forum@jsoftware.com>

Dear J enthusiasts,

Let's say I want to calculate
   5 * 5 * 7 * 6 * 6 * 8 * 9  * 6 * 6 *4 * 8 * 9 * 5
2.35146e10 
How to get exact result? Also how to convert any number to scientific conversion?

Best regards,
Pawel Jakubas

So, Pawel wants to be able to switch between scientific and exact representation. In J, we use extended integers (with x: or nx).

We had these replies (combined) from Pablo Landherr:

   5 * 5 * 7 * 6 * 6 * 8 * 9 * 6 * 6 *4 * 8 * 9 * 5x
23514624000

   x: */5 5 7 6 6 8 9 6 6 4 8 9 5
23514624000
--

Missed the other question:
   10j_4 ": */5 5 7 6 6 8 9 6 6 4 8 9 5
2.3515e10

Notice that this example of converting to scientific notation also changes the type of the result to a character representation of the number.

More General Use of Extended Precision

Ewart Shaw joined the discussion to point out that the initial extended precision result above does not generalize and to offer a more general formulation.

Unfortunately,   x: */5 5 7 6 6 8 9 6 6 4 8 9 5   won't work more generally:
    p: i. 16
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53

    */ x: p: i. 16   NB. correct
32589158477190044730

    x: */ p: i. 16
32589158477190045696

    */ p: i. 16
3.25892e19

Subsequently "LdBeth" joined the conversation with this suggestion:

Depends on how you want J to work it,
x: */ p: i. 16 converts the result float to nearest exact number,
*&x:/ p: i. 16 also computes the exact 32589158477190044730.

It's worth noting that the first of these two loses precision because it converts a floating point number, with its attendant loss of precision, to extended representation. However, the latter method, using the hook *&x: to multiply and extend precision at the same time, has a similar limitation:

   #'31415926535897932384626438'
26
   */   2 33 555 7777 11111111 31415926535897932384626438x
99438914898169876677592245956855980113180
   *&x:/2 33 555 7777 11111111 31415926535897932384626438
99438914898169871168290518248903808122880

Note that the latter product above differs after the 16th digit because the lower digits of the 26-digit number as entered are lost when the long number is converted to floating point in spite of the apparent precision with which it was entered.

Finally, David Alis offered this variant to achieve an extended precision result which works but relies on this product being on a generated sequence, not any arbitrary one.

  */ p: i. 16x
32589158477190044730

Direct Definition Specifiers

Recently we noticed that J's direct definition syntax - enclosing J phrases in {{ and }} - has an optional type specifier. For instance, using )n at the start of a definition, as seen here, gives us another way to define a multi-line text string.

   aa=. {{)n This 'string' contains numbers (1,2) and "quoted" text, as
well as an embedded line-feed.
}}
   $aa
91
   LF e. aa
1
   aa}.~aa i. LF         NB. Drop everything up to the first line-feed

well as an embedded line-feed.

The initial )n specifies that the text following it is to be treated as noun.

The older way to do this uses 0 : 0 like this:

   bb=. 0 : 0
 This 'string' contains numbers (1,2) and "quoted" text, as
well as an embedded line-feed.
)
   $bb
91
   LF (]}.~] i. [) bb

well as an embedded line-feed.

However, there are several other type specifiers for a direct definition. Each is designated by a single letter following an initial closing parenthesis as the first character in the definition. The currently supported designations are these.

Character Part of Speech Generated
m monadic verb
d dyadic verb
v verb with valence depending on the names used
a adverb
c conjunction
n noun. See below for details.
* (default) depends on the names used

Show-and-tell

Here we take a brief look at the next operation to be replicated from the Python Keras library for building a neural net in J. We made little progress implementing this but expect to have something soon. We also consider the efficiencies of different ways of reshaping a vector to be a two-row table.

Replicating keras.layers.Dense

Looking at this layer of a neural net written in Python

tf.keras.layers.Dense(256, activation=tf.nn.relu)

we learn that this operation Dense works like this:

Transforms data:

  • layers.Dense takes input data of a certain dimensionality and transforms it to a new dimensionality. You specify the desired output dimension using the units argument.
  • Applies activation function (optional): It performs a linear transformation on the data and then optionally applies an activation function. This activation function introduces non-linearity, allowing the network to learn complex patterns. Common activation functions include relu, sigmoid, etc. (specified by the activation argument).
  • Learns weights and biases: Internally, the layer creates a weight matrix and a bias vector. The weights determine the strength of the connections between neurons, and the bias adds a constant value to each neuron's output. These weights and biases are learned during the training process.

The process works like this:

  1. Input: The layer receives input data, typically a 2D tensor with a shape of (batch_size, input_features). Here, batch_size is the number of data samples processed together. input_features is the number of features in each data sample.
  2. Weight matrix and bias vector: The layer has a weight matrix of shape (input_features, units) and a bias vector of shape (units,). These are randomly initialized at the beginning and adjusted during training.
  3. Linear transformation: The layer performs a matrix multiplication between the input data and the weight matrix. This essentially combines the input features with their corresponding weights.
  4. Bias addition (optional): If use_bias is set to True (default), the bias vector is added to the result of the matrix multiplication.
  5. Activation function (optional): If an activation function is specified using the activation argument, it's applied element-wise to the output from step 4. This introduces non-linearity and helps the network learn complex relationships.
  6. Output: The final output of the layer is a tensor with a shape of (batch_size, units). This represents the transformed data after the linear transformation and (optional) activation.

Making a Table from a Vector

The J forum received this query recently:

From: David Pinchbeck <dpinchbe@gmail.com>
Date: Sun, Feb 25, 2024 at 11:25 AM
Subject: [Jforum] Reshape list of length 2n to a 2 by n table
To: forum <forum@jsoftware.com>

I want to split a list with even length into a 2 by n table:

    f=:{{(2,2%~#y) $ y}}

    f 'abcdef'
abc
def

Can I do it without computing the row length ahead of time? I feel like I stumbled upon a quick way to do this some months ago, but can''t remember.

Thanks,
David

There were a few responses but this one from Bob Therriault best summarizes the results and their relative efficiencies.

From: 'robert therriault' via forum <forum@jsoftware.com>
Date: Sun, Feb 25, 2024 at 12:19 PM
Subject: Re: [Jforum] Reshape list of length 2n to a 2 by n table
To: <forum@jsoftware.com>

Along the lines of your first solution Henry, you could use (]\~ -@-:@#)

   s0=:{{(2,2%~#y) $ y}}
   s1=: $~ (2 , -:@#)
   s2=: ]\~ -@-:@#
   a=:'abcdef'
   s0 a
abc
def
   s1 a
abc
def
   s2 a
abc
def
   10000 timespacex 's0 a'
7.261e_7 1600
   10000 timespacex 's1 a'
5.429e_7 1408
   10000 timespacex 's2 a'
4.961e_7 1280

Cheers, bob

Replicating this on my own machine gives the following:

   timespacex
6!:2 , 7!:2@]
   10000 timespacex 's0 a'
9.3776e_7 2112
   10000 timespacex 's1 a'
7.4346e_7 1664
   10000 timespacex 's2 a'
6.9972e_7 1536

The relative time and space figures align with the ones Bob showed but it looks like J on his machine runs noticeably faster than on my machine.

However, the relative ordering by CPU time of these solutions changes for a larger argument:

   a=. 1e4$'abcdef'
   10000 timespacex 's0 a'
1.05808e_6 18464
   10000 timespacex 's1 a'
5.3174e_7 1664
   10000 timespacex 's2 a'
5.0598e_7 1536

In this case, s0 is much faster than the other two though it does require more memory.

Derangements

This problem leads us to an interesting kind of permutation known as a derangement.

Interesting Probability Question

My colleague posed an interesting question at work today regarding probabilities, and I was wondering if you had seen something similar in the past and could assist.

You have a box with 5 differently shaped holes, and 5 objects that each correspond to a specific hole. If an object is put into a different shaped hole, the hole becomes blocked.

If you randomly chose an object, and then placed it into a random hole, what is the probability of having all 5 holes blocked?

My initial thought was a simple 4/5 * 3/4 * 2/3 * 1/2 as the amount of possibilities decreased, but my colleague says the answer is 11/30. My thought would be that it has something to do with looking back to check if a hole was already blocked or not.

Any thoughts?

A Hint for a Solution

The user "Surzh" responded with this:

Interesting question!

The reason why your initial thought doesn't work, is because the combinations of objects/holes matters. Suppose you had blocks and holes labeled 1 2 3 4 5. They fit if they're equal.

If in your first two moves you happen to put objects 1 and 2 in holes 4 and 5 (the order doesn't matter here, because it wont fit either way), you're left with objects 3 4 and 5, and holes 1 2 and 3.

On the other hand, if your first two moves happen to put object 1 into hole 2 and object 2 into hole 1, you're left with objects 3 4 and 5 and holes 3 4 and 5.

Do you see the difference between these situations?

Edit and spoiler: you may want to look at the wiki page for Derangements

"Derangment" on Wikipedia

Looking at the suggested Wikipedia page, we see this.

In combinatorial mathematics, a derangement is a permutation of the elements of a set in which no element appears in its original position. In other words, a derangement is a permutation that has no fixed points.

The number of derangements of a set of size n is known as the subfactorial of n or the n-th derangement number or n-th de Montmort number (after Pierre Remond de Montmort). Notations for subfactorials in common use include !n, Dn, dn, or n¡.

Stochastic Solution in J

A simple, brute force way to solve this in J is with a simulation, like this.

   holes=. objs=. i.5
   ([:?~[:#]) objs
4 0 3 2 1
   ([:?~[:#]) objs
1 2 4 0 3
   (]{~[:?~[:#]) 'ABCDE'
CABED
   (]{~[:?~[:#]) 'ABCDE'
ADECB
   mix=: (]{~[:?~[:#])
   holes~:mix objs
1 1 0 1 1

The 'mix' function randomly permutes its argument.

For an example where an object does not align to its hole:

   *./holes~:mix objs
1

An example where all the objects align.

   holes ([: *./ [ ~: [: mix ]) objs
0

Now that we can generate random combinations of "holes" and "objs" to fit or not fit into them, we can see how many blockages we get for a million samples.

   6!:2 'blocked=. (<holes) ([: *./ [ ~: [: mix ]) &>1e6$<objs'
0.398447
   +/blocked
366136

So this looks like the answer would be about 0.366 or 36.6% of the time we will be unable to fill all the holes with the objects. Trying again with ten million samples confirms this approximation.

   6!:2 'blocked=. (<holes) ([: *./ [ ~: [: mix ]) &>1e7$<objs'
4.11566
   +/blocked
3666669

So, 36.7% is our answer to 3 significant digits.

A Comprehensive Solution

Since this problem specifies only 5 objects, we could consider all 120 (=!5) arrangements to get a more exact answer to this problem.

   $(i. 120) A. i.5
120 5
   $~.(i. 120) A. i.5
120 5

We see above how we generate all !5 possible permutation of 5 objects using A. with the proper arguments.

Now we can count all possible blockages.

   0+/ . = +/"1 (i.5)="1 (i. 120) A. i.5
44
   44%120
0.366667

This confirms our solution to 3 digits as shown above.

A Deterministic Solution in J

We can also count the number of derangements exactly with this recursive function.

derangements=: 3 : 0"0
   assert. (y>_1)*.y=<.y
   if. y<4 do. y{1 0 1 2
   else. (y*derangements y-1)+_1^y end.
)

Here we see how many derangements are possible for 3 to 5 objects.

   derangements 3
2
   derangements 4
9
   derangements 5
44

We see that the answer for 5 objects is 44 as we determined above by exhaustive evaluation.

Looking at the number of derangements for 0 to 14 objects, we find this.

   derangements i. 15
1 0 1 2 9 44 265 1854 14833 133496 1.33496e6 1.46846e7 1.76215e8 2.29079e9 3.20711e10

To see the exact values

   derangements i. 15x
1 0 1 2 9 44 265 1854 14833 133496 1334961 14684570 176214841 2290792932 32071101049

A Brief Convergence

Somewhere we saw a question about the result of raising i (square root of negative one) to the i power i times. Of course i times does not really make sense but what do we get for successive iterations of raising i to the i?

Here's how we might do this.

   0j1^0j1
0.20788
   (0j1^])^:3]0j1
0.0500922j0.602117
   (0j1^])^:4]0j1
0.387166j0.0305271

We see that the value keeps changing. Does it converge?

   (0j1^]) ^: _ ] 0j1
0.438283j0.360592

It does. What do the successive values look like as it approaches convergence? We can plot them like this.

   (0j1^]) ^: (i.100) ] 0j1
0j1 0.20788 0.947159j0.320764 0.0500922j0.602117 0.387166j0.0305271 0.782276j0.544607 0.142562j0.400467 0.519786j0.118384 0.568589j0.605078 0.242365j0.301151 0.578489j0.23153 0.42734j0.548231 0.330967j0.262892 0.574271j0.328716 0.369948j0.468173 0.400633j0...
   load 'plot'
   plot (0j1^])^:(i.100)]0j1

IToThei 100Times.JPG

Advanced topics

Comparing Indexing in q Versus in J

We have recently been reading parts of an unreleased book on the q language and were struck by how similar indexing is in q to how it is in J. Here are some examples.

Q Session

q)z:(0 1 2 3;4 5 6 7;8 9 10 11)
q)z
0 1 2  3 
4 5 6  7 
8 9 10 11
q)z[1 2;0 1]
4 5
8 9

When z is indexed at depth with two two-item simple lists, the result is a two-item list that simulates the sub-matrix specified by the Cartesian product of the two simple lists

q)z . 1 2
6
q)z . (0 1;2 3) 
2 3
6 7
q)z .(1 2;::) 
4 5 6  7 
8 9 10 11
q)z .(::;1 2)
1 2 
5 6 
9 10

J Session

Here are some equivalents to the above in J:

  ]z=. >0 1 2 3;4 5 6 7;8 9 10 11
0 1  2  3
4 5  6  7
8 9 10 11
   z{~<1 2;0 1
4 5
8 9
   z{~<1 2
6
   z{~<0 1;2 3
2 3
6 7
   z{~1 2
4 5  6  7
8 9 10 11
   z{"1~1 2
1  2
5  6
9 10

Subtle Differences

Since the basic data structure of q is a list whereas it's an array in J, there are some subtle differences in results.

For instance, in q

q)z . enlist 0
0 1 2 3

whereas in J we have two variants of this selection with different shapes:

   z{~0
0 1 2 3
   z{~,0
0 1 2 3
   $z{~,0
1 4
   $z{~0
4

Q also has a number of way to accomplish the same thing here:

q)z 0
0 1 2 3
q)z @ 0
0 1 2 3

Learning and Teaching J

A recent thread on Reddit was started by a question about the rationale behind the widely accepted order of operations known as PEMDAS - for parenthesis, exponents, multiplication, division, addition, subtraction as the order in which a mathematical expression should be evaluated. There appeared to be a fair amount of confabulation about why this is the natural order of doing things but some responders noted its arbitrariness.

Order of Operations

The "unusual" order of operations J uses, as does APL and q, has a number of good justifications which were laid out by Ken Iverson quite some time ago. It's worth remembering what these arguments are.

The reasons for choosing a right-to-left instead of a left-to-right convention are:

1. The usual mathematical convention of placing a monadic function to the left of its argument leads to a right-to-left execution for monadic functions; for example, F G x ≡ F (G x) .

2. The notation F/z for reduction (by any dyadic function F) tends to require fewer parentheses with a right-to-left convention. For example, expressions such as +/(x×y) or +/(u/x) tend to occur more frequently than (+/x)×y and (+/u)/x .

3. An expression evaluated from right to left is the easiest to read from left to right. For example, the expression

    a+x×b+x×c+x×d+x×e+x×f 

(for the efficient evaluation of a polynomial) is read as a plus the entire expression following, or as a plus x times the following expression, or as a plus x times b plus the following expression, and so on.

4. In the definition

FreducingxEG.png

the right-to-left convention leads to a more useful definition for nonassociative functions F than does the left-to-right convention. For example, -/x denotes the alternating sum of the components of x , whereas in a left-to-right convention it would denote the first component minus the sum of the remaining components. Thus if d is the vector of decimal digits representing the number n , then the value of the expression 0=9|+/d determines the divisibility of n by 9 ; in the right-to-left convention, the similar expression 0=11|-/d determines divisibility by 11 .

This originally appeared as Appendix A of K.E. Iverson's Elementary Functions: An Algorithmic Treatment, Science Research Associates, 1966.

A Graph Data Type?

The essay "The Hunt for the Missing Data Type" makes the argument that graphs are ubiquitous in software but are difficult to deal with using existing software packages, even those specifically geared toward handling graphs.

Different J users have done work on using graphs - see these functions, this essay on computing the shortest path between vertices, and [https://www.jsoftware.com/help/dictionary/samp20.htm this essay on graphs in general.

How to Represent a Graph?

After interviewing a number of people who have written software for working with graphs, the author identifies these difficulties with doing a good job in this area.

One of these problems is that there are too many design choices, in part because of the different kinds of graphs, like "directed" and "undirected", complicate the sort of answers we might expect from a graph query.

A related problem is that there are too many implementation choices for encoding a graph. He lists four basic ways a programming language might store a particular graph; in this case we are considering a simple graph.

SampleGraph.png

Name Example
Edge list [[a, b], [b, c], [c, a], [c, b]]
Adjacency list [[b], [c], [a, b]]
Adjacency matrix [0 1 0; 0 0 1; 1 1 0]
Multiple structures A set of three structs with references to each other

Each of these encodings comes with its own strengths and weaknesses. Some are better at dense graphs, some are better for sparse ones.

Related to these issues is performance. Since many graph algorithms are NP-complete, the work necessary to deal with a large graph can be an enormous problem.

The essay is worth a read for an outline of the difficulties in dealing with graphs in software.