From J Wiki
Jump to navigation Jump to search

Beginner's regatta

Adverb to Apply a Verb Across a Large File

from:	Joe Bogner <>
to:	Robert Bernecky <>
date:	Tue, Feb 3, 2015 at 3:41 PM
subject:Re: [Jchat] quickest looping construct

I'm scanning a 240MB text file and generating 9 character infixes and processing them. I hit the memory limit using 9]\ so I figured I'd try looping and using memr (15!:1). In an earlier version, about 270 seconds were spent in the looping construct and 40 seconds in memr. It seemed like there should be a faster looping construct. Here's a simple version:

txt=: a. {~ (97&+(i. 26))
addr=: mema (# txt)
txt memw addr,0,(#txt)
6!:2 '([:<:([:0&{::(] ; smoutput&([:15!:1(addr,9,~<:@:]))    ))) ^:] (<:#txt)'


I would need more processing of the results, but you can see roughly how it compares to 9]\



Here’s a simplified version of the “doSomething” adverb on which I’ve been working to apply a verb across a file in manageable pieces:

doSomethingSimple=: 1 : 0
   'curptr chsz max flnm passedOn'=. 5{.y
   if. curptr>:max do. ch=. curptr;chsz;max;flnm
   else. ch=. readChunk curptr;chsz;max;flnm
       passedOn=. u (_1{ch),<passedOn  NB. Allow u's work to be passed on to next invocation
NB.EG ([:~.;) doSomethingSimple ^:_ ] 0x;1e6;(fsize 'bigFile.txt');'bigFile.txt';<'' NB. Return unique characters in file.

Some of the heavy lifting is done by readChunk which keeps track of the pointer to the current location and adjusts the chunk size downward to avoid attempting to read past the end of the file.

3 : 0
   'curptr chsz max flnm'=. 4{.y
   if. 0<chsz2=. chsz<.0>.max-curptr do. chunk=. fread flnm;curptr,chsz2
   else. chunk=. '' end.
NB.EG chunk=. >_1{ch0=. readChunk 0;1e6;(fsize 'bigFile.txt');'bigFile.txt'

I created a large test file like this, in pieces, because it is too large to create in a single write.

   bigfl=. 'BigFile.txt'
   6!:2 '''AGCT'' (4 : ''y[(x{~20e6?@$#x) fappend y'')^:12 ] bigfl'
   fsize bigfl

Here we see an example of using a simple verb across the file in ten million byte chunks:

   ([:~.;) doSomethingSimple ^:_ ] 0x;1e7;(fsize bigfl);bigfl;<'' NB. Return unique characters in file.
   6!:2 '([:~.;) doSomethingSimple ^:_ ] 0x;1e7;(fsize bigfl);bigfl;<'''''

One problem I encountered while working with large files like this:

Prefix9FlChunk=: 3 : '~.(>1{y),9[\>0{y'  NB. Running nub of 9 prefixes...
   6!:2 'ch9=. Prefix9FlChunk doSomethingSimple ^:_ ] 0x;1e7;(fsize bigfl);bigfl;<'''' NB. Return unique characters in file.'
|out of memory: Prefix9FlChunk
|       ~.(>1{y),9[\>0{y
|10000000|262145 9|

Process shell<1> exited abnormally with code 5

I had to re-train my fingers to enter 13!:19 (to cut back the stack first) instead of 13!:0]0 to avoid this fatal error. Eventually I got this (almost) correct:

   Prefix9FlChunk=: 3 : '~.(>1{y),~.9[\>0{y'  NB. Running nub of 9 prefixes...
   6!:2 'ch9=. Prefix9FlChunk doSomethingSimple ^:_ ] 0x;1e7;(fsize bigfl);bigfl;<0 9$''''''
||||11|262144 9|

Further Examples on a Large Tab-delimited File

Here is a more extensive look at this adverb, including examples of using it. The essay provides an example of using the code to work with large files on a tab-delimited file with column headers as the initial row. In this case, we assume that some combination of columns (Date and security ID) comprises a key for which there should be no duplicate entries.


* Spoiler Alert * Project Euler Problems

Danyel Lawson has some Project Euler solutions in both J and Python on github.

We covered problem 85, Counting All Rectangles, at an earlier meeting. This includes a Python as well as a J solution.

For J solutions to the first six Project Euler problems, take a look at this talk from last year. The six one-line J solutions are about 3/4s of the way along in the talk, as examples in a brag that I solved the first ten of these in less than an hour because of the power of J.

Project Euler Problem # 1

Running his J session in a Jupyter notebook for the first problem looked like this:

 In [2]:   %alias ijc ijconsole
 In [4]:   !!ijconsole -js a=.2 b=.3 "echo a*b" "exit''"
Out [4]:   ['6']
 In [3]:   ijc pe1.ijs
 In [5]:   %recall
 In [16]:  ['6']
Out [4]:   ['6']
 In [ ]:   ijc pe2.ijs
 In [31]:  !top -n1 -c1

=top - 00:25:31 up 143 days, 29 min,  0 users,  load average: 0.35, 0.55, 0.41
Tasks:  77 total,   2 running,  75 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.1 us,  2.4 sy,  0.0 ni, 95.0 id,  0.0 wa,  0.0 hi,  0.4 si,  0.0 st
KiB Mem:   1017888 total,   617448 used,   400440 free,     8284 buffers
KiB Swap:        0 total,        0 used,        0 free.    39480 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                 
24444 freegnu   20   0  855884 104864    748 S 23.8 10.3   2:05.11 /usr/bin/python -c from IPython.kernel.zmq.kernelapp i+ 
 1919 freegnu   20   0  470372  75036   2940 S  5.9  7.4   7507:21 /usr/bin/python /usr/bin/ipython notebook --profile no+ 
26146 freegnu   20   0  737512  39292    592 S  5.9  3.9  75:05.95 /usr/bin/python /usr/bin/ipython console --existing ke+ 
    1 root      20   0   33504   1432     88 S  0.0  0.1   2:49.21 /sbin/init                                              
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.00 [kthreadd]                                              
    3 root      20   0       0      0      0 S  0.0  0.0   4:58.82 [ksoftirqd/0]                                           
    5 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 [kworker/0:0H]                                          
    7 root      20   0       0      0      0 R  0.0  0.0 681:16.33 [rcu_sched]                                             
    8 root      20   0       0      0      0 S  0.0  0.0 400:54.54 [rcuos/0]                                               
    9 root      20   0       0      0      0 S  0.0  0.0   0:00.00 [rcu_bh]                                                
   10 root      20   0       0      0      0 S  0.0  0.0   0:00.00 [rcuob/0]                                               
   11 root      rt   0       0      0      0 S  0.0  0.0   0:00.00 [migration/0]                                           
   12 root      rt   0       0      0      0 S  0.0  0.0   0:38.38 [watchdog/0]                                            
   13 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 [khelper]                                               
   14 root      20   0       0      0      0 S  0.0  0.0   0:00.00 [kdevtmpfs]                                             
   15 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 [netns]                                                 
   16 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 [writeback]                                             
   17 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 [kintegrityd]                                           
   18 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 [bioset]                                

The J script for this first problem is pe1.ijs, as shown here:

NB. Project Euler 1
NB. If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
NB. Find the sum of all the multiples of 3 or 5 below 1000.
dby =: 0&=@|~
dby35 =: #~ (dby&3 +. dby&5)
smoutput sdby35 =: +/ dby35 i.1000

It may not be clear from the run above, but the answer returned was 233168.

Project Euler Problem # 2

We won't give the final answers here, just three versions of the solution in J, Python, and Java.

Here is the problem statement and a J solution:

NB. Project Euler 2
NB. Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:
NB. 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
NB. By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.
fib_even =: 3 : 0
fib =. i. +/ .! i.@-
f =. >:
v =. y&>
t =. v@fib
fibs =. f^:t^:a:
terms =. fib "0 }: fibs 2
+/ (-. 2|terms) # terms
smoutput fib_even 4e6

Here is a Python solution to the same problem:

def fib(n):
 fib.cache = getattr(fib,'cache',{})
 if n not in fib.cache:
  if n in (0,1):
   fib.cache[n] = 1
   fib.cache[n] = fib(n-1)+fib(n-2)
 return fib.cache[n]
if __name__== '__main__':
 sum = 0
 i = 1
 last = fib(i)
 while last <= 4000000:
  if last % 2 == 0:
   sum += last
  i += 1
  last = fib(i)
 print sum, '\n'

Here is a Java solution:

public class Fibonacci {
    public static void main (String[] args) throws Exception {
	long i = 1;
	long sum = 0;
	long last = fib(i);
	while (last < 4000000) {
	    if (last % 2 == 0) {
		sum += last;
	    last = fib(++i);
    public static long fib (long n) {
	long r = 0;
	if (n == 0 || n == 1) {
	    r = 1;
	else {
	    r = fib(n-1) + fib(n-2);
	return r;

This is the only one solved in multiple languages, so we will leave any other comparisons to the reader.

7-27 Simulation Re-visited

At an earlier NYCJUG meeting, we looked at putting together a simple simulation to play a card game called “7-27”: .

Recently, some traffic on the J-Forum about problems with plotting histograms led me to revisit this simulation as I’ve had a long-standing problem with axis-alignment on histograms that is quite evident in one that I use in displaying the results of this simulation.

To re-iterate from the earlier essay, the rules of the game are as follows:

  1. two cards are dealt face-down initially.
  2. face cards are half a point,
  3. aces are one or eleven points (at the player's discretion),
  4. tens are zero or ten points (at the player's discretion),
  5. all other cards are "face value" points, e.g. a five is five points.
  6. Two aces and a five constitute a special hand that wins the whole pot (as it is both exactly 7 and 27 at the same time).
  7. Each round, a player can decide to draw another card or not;
  8. at the end of a drawing round, there is a betting round.
  9. Once there is a drawing round where no player draws a card, the deck is closed and there is a final round of betting.

So, a player’s initial hand might be “10 K” which can be scored as 0.5 or 10.5. In the former case, the player can draw to get closer to 7 but, in the latter case, the player is already too far past 7 and must strive to get closer to 27. Note that the player does not have to decide which score to use initially – she can draw one or more cards and decide which target to aim for based on what else she gets. The value of a simulation like this can be to figure out how initial hands are likely to be distributed and how this distribution might change as players draw cards (or decline to draw). In our first cut at this problem, we defined the deck to be this:

   $deck=. (0.5#~*/3 4),4#>:i.10   NB. Init aces and tens at face value

That is, the deck consists simply of the base point value of each card. Our preliminary analysis of initial hands, in which we tallied the results of 10,000 5-player 2-card initial draws, was incomplete because it only used the “base” values of the two variable-value cards: the ace, which can be 1 or 11 points, and the ten, which can be 0 or 10 points. We did subsequently develop a more complete scoring algorithm, shown here:

scoreHand=: 3 : '~.+/&>,{(y=1)}((y=10)}(<"0 y),:<0 10),:<1 11'

We tested the correctness of this with some arbitrary test cases:

*./(2 12 22;0 10 20;1 11 21;(,5);2 12 22 32;3 13 23 33;0 10 20 30)-:&>/:~&.>scoreHand &.>1 1;10 10;1 10;3 2;1 1 10;1 1 1;10 10 10

These same test cases, spelled-out:

scoreHand_testCases_=: 3 : 0
   scoreHand=: scoreHand_base_
   assert. 2 12 22 -: scoreHand 1 1       NB. Two aces
   assert. 0 10 20 -: scoreHand 10 10     NB. Two tens
   assert. 1 11 21 -: scoreHand 1 10      NB. An ace and a ten
   assert. (,5) -: scoreHand 3 2          NB. Non-ambiguous cards
   assert. 2 12 22 32 -: scoreHand 1 1 10 NB. Two aces and a ten
   assert. 3 13 23 33 -: scoreHand 1 1 1  NB. Three aces
   assert. 0 10 20 30 -: scoreHand 10 10 10 NB. Three tens

We generate 10,000 5-player hands and score them like this:

   h0=. (<deck) init727Hands &.> 10000$5
   $sums=. ;scoreHand &.> ;h0
   usus sums          NB. USual Stats: min, max, mean, SD
0 22 8.93264 5.19585

We generate the histogram of these sums.

   BKTS=: i.23 [ PCT=: 0   NB. plotHistoMulti: fixed buckets, not %s.
   OTHERPLOTARGS=: 'color yellow'
   ss=. '(All) Sums of Initial Hands for 10K 5-player 7-27 Deals' plotHistoMulti sums


Advanced topics

Combining Mapped-file DB with Jtable Spreadsheet

to:	Digest recipients <>
date:	Sat, Jan 31, 2015 at 2:01 AM
subject: [jtable]
"Björn Helgason" <>: Jan 30 12:20PM 

By combining the mapped file database with the jtable speadsheet you have a very powerful tool.

You do a selection based on some criteria and create some functions you can choose from and then you select what fields to show and use jtable to do the presentation.

It is then easy to take actions based on what the user chooses to do on the results in jtable.

jtable is the missing piece to combine the power of the J functionality with the mapped database size and speed.

jtable offers so much more than earlier grids we have experimented with over the years and is a lot easier to extend and use.

Add some rules on what the user can do to operate on the data once presented and create a few verbs to act on it.

The mapped database has been there waiting for a long time and is described in the labs.

It has never been easier to create very attractive and powerful application.

All the ingredients are present.

Not to mention all other options being a part of what the browsers can offer with JHS.

Extracting Digits of Pi

Here are some references of work that has been done on calculating π.


Digit extraction methods

The Bailey–Borwein–Plouffe formula (BBP) for calculating π was discovered in 1995 by Simon Plouffe. Using base 16 math, the formula can compute any particular digit of π—returning the hexadecimal value of the digit—without having to compute the intervening digits (digit extraction). [43]


In 1996, Simon Plouffe derived an algorithm to extract the nth decimal digit of π (using base 10 math to extract a base 10 digit), and which can do so with an improved speed of O(n3log(n)3) time. The algorithm requires virtually no memory for the storage of an array or matrix so the one-millionth digit of π can be computed using a pocket calculator.[44]


The calculation speed of Plouffe's formula was improved to O(n2) by Fabrice Bellard, who derived an alternative formula (albeit only in base 2 math) for computing π.[45]


Efficient methods

In 1961 the first expansion of π to 100,000 decimal places was computed by Maryland mathematician Dr. Daniel Shanks and his team at the United States Naval Research Laboratory (N.R.L.). Daniel Shanks and his team used two different power series for calculating the digits of π. For one it was known that any error would produce a value slightly high, and for the other, it was known that any error would produce a value slightly low. And hence, as long as the two series produced the same digits, there was a very high confidence that they were correct. The first 100,000 digits of π were published by the Naval Research Laboratory. None of the formulæ given above can serve as an efficient way of approximating π. For fast calculations, one may use a formula such as Machin's:


together with the Taylor series expansion of the function arctan(x). This formula is most easily verified using polar coordinates of complex numbers, starting with,


Formulae of this kind are known as Machin-like formulae. (Note also that { x,y} = {239, 132} is a solution to the Pell equation x2-2y2 = -1.)

Many other expressions for π were developed and published by Indian mathematician Srinivasa Ramanujan. He worked with mathematician Godfrey Harold Hardy in England for a number of years. Extremely long decimal expansions of π are typically computed with the Gauss–Legendre algorithm and Borwein's algorithm; the Salamin–Brent algorithm which was invented in 1976 has also been used. The first one million digits of π and 1/π are available from Project Gutenberg (see external links below). The record as of December 2002 by Yasumasa Kanada of Tokyo University stood at 1,241,100,000,000 digits, which were computed in September 2002 on a 64-node Hitachi supercomputer with 1 terabyte of main memory, which carries out 2 trillion operations per second, nearly twice as many as the computer used for the previous record (206 billion digits). The following Machin-like formulæ were used for this:


K. Takano (1982).


F. C. W. Störmer (1896). These approximations have so many digits that they are no longer of any practical use, except for testing new supercomputers. (Normality of π will always depend on the infinite string of digits on the end, not on any finite computation.) In 1997, David H. Bailey, Peter Borwein and Simon Plouffe published a paper (Bailey, 1997) on a new formula for π as an infinite series:


This formula permits one to fairly readily compute the kth binary or hexadecimal digit of π, without having to compute the preceding k − 1 digits. Bailey's website contains the derivation as well as implementations in various programming languages. The PiHex project computed 64-bits around the quadrillionth bit of π (which turns out to be 0). Fabrice Bellard further improved on BBP with his formula[3]:


Other formulae that have been used to compute estimates of π include:




Srinivasa Ramanujan. This converges extraordinarily rapidly. Ramanujan's work is the basis for the fastest algorithms used, as of the turn of the millennium, to calculate π.


David Chudnovsky and Gregory Chudnovsky.

J Implementation of Chudnovsky

Roger Hui put together a J implementation of this Chudnovsky formula by re-writing it to have only rational terms, then using J's rationals to solve to as many digits as can be accomodated.

pica=: 3 : 0
 k  =. x: i. y
 top=. (!6*k) * 13591409+545140134*k
 bot=. (!3*k) * ((!k)^3) * 640320^3*k
 rt =. -:@(+640320&%)^:(>.2^.1+y) x:%:640320   NB. sqrt 640320 to 14*y digits
 % (12 % 640320*rt) * -/ top % bot

We can see that this gives us many digits very quickly:

   6!:2 'pip10=. 0j200 ": pica 10'
   6!:2 'pip11=. 0j200 ": pica 11'
   0 i.~pip10=pip11

This shows us that pip10 is good to 143 digits. Carrying out the same sequence for increasingly high numbers shows that we get about another 14 digits of precision for each higher value of the argument to pica.

   6!:2 'pip12=. 0j200 ": pica 12'
   0 i.~pip11=pip12
   6!:2 'pip13=. 0j200 ": pica 13'
   0 i.~pip12=pip13
   6!:2 'pip15=. 0j200 ": pica 15'
   6!:2 'pip16=. 0j200 ": pica 16'
   0 i.~pip15=pip16


Pi Hex

Pi Hex was a project to compute three specific binary digits of π using a distributed network of several hundred computers. In 2000, after two years, the project finished computing the five trillionth (5*1012), the forty trillionth, and the quadrillionth (1015) bits. All three of them turned out to be 0.

Software for calculating π

Over the years, several programs have been written for calculating π to many digits on personal computers.

General purpose

Most computer algebra systems can calculate π and other common mathematical constants to any desired precision. Functions for calculating π are also included in many general libraries for arbitrary-precision arithmetic, for instance CLN and MPFR.

Special purpose

Programs designed for calculating π may have better performance than general-purpose mathematical software. They typically implement checkpointing and efficient disk swapping to facilitate extremely long-running and memory-expensive computations.
• y-cruncher by Alexander Yee[46] is the program which Shigeru Kondo used to compute the current world record number of digits. y-cruncher can also be used to calculate other constants and holds world records for several of them.
• PiFast by Xavier Gourdon was the fastest program for Microsoft Windows in 2003. According to its author, it can compute one million digits in 3.5 seconds on a 2.4 GHz Pentium 4.[47] PiFast can also compute other irrational numbers like e and √2. It can also work at lesser efficiency with very little memory (down to a few tens of megabytes to compute well over a billion (109) digits). This tool is a popular benchmark in theoverclocking community. PiFast 4.4 is available from Stu's Pi page. PiFast 4.3 is available from Gourdon's page.
• QuickPi by Steve Pagliarulo for Windows is faster than PiFast for runs of under 400 million digits. Version 4.5 is available on Stu's Pi Page below. Like PiFast, QuickPi can also compute other irrational numbers like e, √2, and √3. The software may be obtained from the Pi-Hacks Yahoo! forum, or from Stu's Pi page.
• Super PI by Kanada Laboratory[48] in the University of Tokyo is the program for Microsoft Windows for runs from 16,000 to 33,550,000 digits. It can compute one million digits in 40 minutes, two million digits in 90 minutes and four million digits in 220 minutes on a Pentium 90 MHz. Super PI version 1.1 is available from Super PI 1.1 page.
• apfloat provides a Pi Calculator Applet for computing π in a browser. It can compute a million digits of π in a few seconds on a normal PC. Different radixes and algorithms can be used. In theory it can compute more than 1015 digits of π.

Learning and Teaching J

The RedMonk Programming Language Rankings: January 2015

With two quarters having passed since our last snapshot, it’s time to update our programming language rankings. Since Drew Conway and John Myles White originally performed this analysis late in 2010, we have been regularly comparing the relative performance of programming languages on GitHub and Stack Overflow. The idea is not to offer a statistically valid representation of current usage, but rather to correlate language discussion (Stack Overflow) and usage (GitHub) in an effort to extract insights into potential future adoption trends.

In general, the process has changed little over the years. With the exception of GitHub’s decision to no longer provide language rankings on its Explore page – they are now calculated from the GitHub archive – the rankings are performed in the same manner, meaning that we can compare rankings from run to run, and year to year, with confidence.

This is brought up because one result in particular, described below, is very unusual. But in the meantime, it’s worth noting that the steady decline in correlation between rankings on GitHub and Stack Overflow observed over the last several iterations of this exercise has been arrested, at least for one quarter. After dropping from its historical 0.78 – 0.8 correlation to 0.74 during the Q314 rankings, the correlation between the two properties is back up to 0.76. It will be interesting to observe whether this is a temporary reprieve, or if the lack of correlation itself was the anomaly.

For the time being, however, the focus will remain on the current rankings. Before we continue, please keep in mind the usual caveats.

  • • To be included in this analysis, a language must be observable within both GitHub and Stack Overflow.
  • • No claims are made here that these rankings are representative of general usage more broadly. They are nothing more or less than an examination of the correlation between two populations we believe to be predictive of future use, hence their value.
  • • There are many potential communities that could be surveyed for this analysis. GitHub and Stack Overflow are used here first because of their size and second because of their public exposure of the data necessary for the analysis. We encourage, however, interested parties to perform their own analyses using other sources.
  • • All numerical rankings should be taken with a grain of salt. We rank by numbers here strictly for the sake of interest. In general, the numerical ranking is substantially less relevant than the language’s tier or grouping. In many cases, one spot on the list is not distinguishable from the next. The separation between language tiers on the plot, however, is generally representative of substantial differences in relative popularity.
  • • GitHub language rankings are based on raw lines of code, which means that repositories written in a given language that include a greater number amount of code in a second language (e.g. JavaScript) will be read as the latter rather than the former.
  • • In addition, the further down the rankings one goes, the less data available to rank languages by. Beyond the top tiers of languages, depending on the snapshot, the amount of data to assess is minute, and the actual placement of languages becomes less reliable the further down the list one proceeds.

RedMonklang.rank .plot .q1152.png

[Where’s J?]

Materials and Random Notes

File:DoSomethingSimple.ijs Adverb to apply verb to large file

File:PlotHistogram.ijs Code for plotting a histogram

We talked about DLAs - dynamic linear algorithms - a book by Dan Buskirk(?). We talked about embeddings giving rise to periodic orbits and topological mixing; Kingsley Amis has written on rote learning; there was a quadratic equation known to the ancient Babylonians (pdf); someone recommended "Mathematics Made Difficult" by Carl Linderholm; there is a spigot algorithm for producing digits of pi.