NYCJUG/2009-04-14

study algorithms, adverb, quicksort, statistical graphics, YouTube

Location:: BEST, Hoboken, NJ

Agenda for NYCJUG of 20090414

             Meeting Agenda for NYC JUG 20090414
             -----------------------------------

1. Beginner's regatta: learning J and understanding classic algorithms - see
"Studying Algorithms to learn J.doc".


2. Show-and-tell: When do I use an adverb?   An example and a non-example -
see "Example Adverb and Not.doc"


3. Advanced topics: considerations for statistical graphics: see "Double-Y Axis
from Wainer.doc",  "Graphics Examples from Wilkinson.doc", "A Software Model
For Statistical Graphics.doc" (includes first part of "On Estimation And
Visualization Of Higher Dimensional Surfaces").


4. Learning, teaching and promoting J: YouTube videos - see
"http://www.youtube.com/watch?v=fDS-SPtwLS4" for Brian Schott's example of
downloading J and examples in  APL: (Palindromic Expression for Phi)
"http://www.youtube.com/watch?v=X3bv4Iu1aEg&feature=related",
(A Plea for Simplicity) "http://www.youtube.com/watch?v=qSVR4Z3DA24", and
(Game of Life) "http://www.youtube.com/watch?v=a9xAKttWgP4&feature=related".

Proceedings

We talked about the suitability of J for studying classic algorithms like quicksort. From this we segueed into a discussion of adverbs with some examples. This was followed by an exposition of some examples of writings on statistical graphics with an eye to what would be desirable in a package written in J. Finally, we looked at some of the use of YouTube for explaining and promoting J and APL.

Beginner's regatta

This discussion of the suitability of J for studying classic algorithms was provoked by the following exchange of e-mail on the J-Forum:

from	Yuvaraj Athur Raghuvir <yuvaraj.a.r@gmail.com>
to	Programming forum <programming@jsoftware.com>
date	Tue, Apr 7, 2009 at 3:25 PM
subject	[Jprogramming] Efficient Quick Sort?
	
Hello,

Inspired by the quicksort defined at http://jsoftware.com/help/dictionary/d212.htm as:
quicksortr=: (($:@(<#[) , (=#[) , $:@(>#[)) ({~ ?@#)) ^: (1<#) NB. random pivot

I wrote a selection sort as so:
selectsort =: ((<./),$:@(]-. <./))^: (1<#)

Yes, I know that grade(/:)  is the J way of doing sorting. I am just trying this out.

To eliminate the random pivot selection, I did the following:
quicksort=: (($:@(<#[) , (=#[) , $:@(>#[)) (0{])) ^: (1<#)

To test, I created these simple verbs:
cmd =: 'ts ''n1=.selectsort a''';'ts ''n0=.(/:a){a''';'a =. (10 ? 10) { 22+i. 10'
cmd =: ('ts ''n3=.quicksort a''';'ts ''n2=.quicksortr a'''),cmd
cmd =: ('n3-:n2';'n2-:n1';'n1-:n0'),cmd

test=: 4 : ',. > ". L:0 (x # ,. <y)' NB. Am I doing too much work here?

   b =.100000 test cmd
  (+/ % #) ;"1 (3 4 5 6 {"1  b)
3.01884e_5 3590.4 3.35261e_5 3589.82 2.14828e_5 4744.79 6.48059e_6 1384.96

The data shows that quicksort is faster than quicksortr but is slower than the selectsort on the average.

This is expected since the quicksort does a full scan to choose the items lesser/greater than the pivot element.

Is it possible to implement a quicksort faster than the selection sort that is proposed here?

Regards,
Yuva

p.s: Interestingly, I see that irrespective of the number of trials, I lose the first three match results. So, I have

  +/;3&{."1 b
299997
  +/;3&{."1 (100 test cmd)
297
  +/;3&{."1 (10 test cmd)
27

What could be wrong?
            +.--------------------+.

from	Yuvaraj Athur Raghuvir <yuvaraj.a.r@gmail.com>
to	Programming forum <programming@jsoftware.com>
date	Thu, Apr 9, 2009 at 11:53 AM
subject	Re: [Jprogramming] Efficient Quick Sort?
	
Hello,

Maybe I should add some more context on why I am doing this - I am (re-)reading up on algorithms and thought I shall use J to experiment. Am I doing the right thing by choosing J to implement and test these rather low level algorithms? Maybe for full control on memory and accesses, I should choose C++ instead.

Comments?

Regards,
Yuva

p.s: Not sure if this is programming or chat. For now I am continuing this thread.
            +.--------------------+.

from	Devon McCormick <devonmcc@gmail.com>
to	Programming forum <programming@jsoftware.com>
date	Thu, Apr 9, 2009 at 12:25 PM
subject	Re: [Jprogramming] Efficient Quick Sort?
	
Hi -

I know I didn't respond to your query because I'm personally not interested in doing low-level things, like implementing quicksort, in J.  It's a well-understood algorithm that's been implemented many thousands of times and does not speak to the strengths of J.  You're right to think it's more appropriate in a low-level language like C++.

I understand you're looking at this for pedagogical purposes but it's not clear what implementing algorithms like these in J will get you in terms of better understanding the algorithm or understanding the language.

Regards,

Devon
            +.--------------------+.
	
from	Don Guinn <donguinn@gmail.com>
to	Programming forum <programming@jsoftware.com>
date	Thu, Apr 9, 2009 at 12:27 PM
subject	Re: [Jprogramming] Efficient Quick Sort?
	
	
You might want to look at mapped files. That way you can allocate the storage and maybe control the storage boundaries.
            +.--------------------+.
	
from	Björn Helgason <gosinn@gmail.com>
to	Programming forum <programming@jsoftware.com>
date	Fri, Apr 10, 2009 at 3:36 AM
subject	Re: [Jprogramming] Efficient Quick Sort?
	
There are some interesting discussions on using J for sort in:

http://www.jsoftware.com/jwiki/Stories/HenryRich

There is also an interesting Bubble chart in:

http://www.jsoftware.com/jwiki/Plot/Full-Text

--
Björn Helgason, Verkfræðingur
            +.--------------------+.
	
from	Roger Hui <rhui000@shaw.ca>
to	Programming forum <programming@jsoftware.com>
date	Sun, Apr 12, 2009 at 12:12 PM
subject	Re: [Jprogramming] Efficient Quick Sort?
	
I encourage you to persist in this effort.  J is suitable for testing sorting ideas, for the same reasons that it is suited for testing other programming ideas: interactive environment, conciseness, expressiveness, etc.  One of the first uses of APL (the predecessor of J) was in sorting; see Chapter 6 of Ken Iverson's "A Programming Language".

http://www.jsoftware.com/jwiki/Doc/A_Programming_Language

There are a few existing essays on sorting:

http://www.jsoftware.com/jwiki/Essays/Quicksort
http://www.jsoftware.com/jwiki/Essays/Sorting_versus_Grading
http://www.jsoftware.com/jwiki/Essays/Order_Statistics

You do have to be careful when you are comparing times.  Two algorithms may be the same order but the fact may be obscured by the large constants involved.
            +.--------------------+.

from	david alis <david.alis@gmail.com>
to	Programming forum <programming@jsoftware.com>
date	Mon, Apr 13, 2009 at 2:22 PM
subject	[Jprogramming] Efficient Quick Sort?
	
There is a sentence in the essay http://www.jsoftware.com/jwiki/Essays/Quicksort that reads:
There is an amusing variant that  (in qsort) replaces the first comma by ; and the second by ,&<

I think it should be changed to

There is an illuminating variant that replaces the first comma by ; and the second by ,&<

   ] v=: 13 ?@$ 10
0 7 8 9 2 2 5 5 5 3 8 0 0
   qsort v
+----------------+-+----------------------+
|++-----+-------+|3|+-----------------+-++|
|||0 0 0|++---++|| ||++-----+--------+|9|||
|||     |||2 2|||| ||||5 5 5|+-+---++|| |||
|||     |++---++|| ||||     ||7|8 8|||| |||
|++-----+-------+| ||||     |+-+---++|| |||
|                | ||++-----+--------+| |||
|                | |+-----------------+-++|
+----------------+-+----------------------+

   ;<S:0 qsort v
0 0 0 2 2 3 5 5 5 7 8 8 9

Probably one of the simplest and best illustrations of how quicksort works.

            +.--------------------+.

from	Matthew Brand <mtthwbrnd@googlemail.com>
to	Chat forum <chat@jsoftware.com>
date	Fri, Feb 27, 2009 at 5:44 AM
subject	[Jchat] http://en.wikipedia.org/wiki/J_(programming_language)
	
I don't know how to edit wikipedia pages but the J page gives an example of how to code a quicksort algorithm without mentioning the primitive /: so people might think that you have to write special code to do sorts whereas the fact that you do not is one of the strengths of J.

http://en.wikipedia.org/wiki/J_(programming_language)

In the course of assembling this discussion, your organizer had a change of heart. I initially had reservations about the usefulness of using J to implement well-understood algorithms. In part, this is because these classic algorithms tend to be very scalar-oriented and do not speak to the strengths of J's array-orientation. Also, the point of many of these algorithms is speed and we all know how forcing a scalar algorithm into J is often a recipe for disappointing performance.

Another reservation I have about sorting routines in general is that the J (and APL) "grade" is a powerful though largely unnoticed concept in its own right. It's unnoticed because most of us, coming from a background of conventional languages, tend to see "grade" as an odd little detour on our way to "sort". However, by giving us the interesting mathematical object of "permutation vector", grade opens up possibilities not possible with simple sorting.

I suspect there are many possiblities of which I am un-aware but the one I know about, "un-sort", can be useful. For instance, if we want to efficiently remove duplicates from a list but retain the original order, as we might want to retain a user's original input order in consideration of its usefulness as entered, we can grade the original grade vector (with the indexes corresponding to the duplicates removed) to give us an "un-sort" list of indexes to return the list to its original order.

For example,

   citiestovisit=: <;._2]0 : 0
Boston
New York
Washington, D.C.
Arlington
New York
Philadelphia
)

NB. New York's so nice, we include it twice!
   ]ordcit=. /:citiestovisit
3 0 1 4 5 2
NB. Remove dupes efficiently but change order.
   (13 : 'y#~1,(}:y)~:}.y')ordcit{citiestovisit
+---------+------+--------+------------+----------------+
|Arlington|Boston|New York|Philadelphia|Washington, D.C.|
+---------+------+--------+------------+----------------+
   ]rmv=. (13 : '1,(}:y)~:}.y')ordcit{citiestovisit
1 1 1 0 1 1
NB. Keep order the same:
   (/:rmv#ordcit){rmv#ordcit{citiestovisit
+------+--------+----------------+---------+------------+
|Boston|New York|Washington, D.C.|Arlington|Philadelphia|
+------+--------+----------------+---------+------------+

Of course, it's simpler to use J's "unique" (~.) to accomplish this particular task, but it illustrates an interesting property of a grade vector. However, grade vectors have the potential for comfortably dealing with large arrays as simple integer vectors are often very convenient and compact arrays amenable to quick, simple manipulation.

However, though I remain personally un-motivated to study classic algorithms in J, I see that others find it potentially useful. I was particularly struck by David Alis's comment on the clarity of the J implementation of quicksort. So, perhaps in the spirit of "notation as a tool of thought", there is some value here.

Show-and-tell: When do we use an adverb?

This topic was motivated by one of our member's question: when do I use an adverb? Interestingly, this relates tangentially to the previous topic because quicksort is usually implemented in non-J languages as what we call an adverb because one of its inputs is an ordering function.

The following e-mail exchange touched on a number of issues pertinent to these topics:

from	June Kim <juneaftn@gmail.com>
to	Programming forum <programming@jsoftware.com>
date	Thu, Mar 13, 2008 at 8:27 AM
subject	[Jprogramming] sort by
	
Hello,

Look at the J code at
http://www.rosettacode.org/wiki/Sorting_Using_a_Custom_Comparator#J

I think it misses the point of the problem. It asks to show the sorting facility of a programming language using a custom comparator, that is a tailored sorting facility. The variability should be pluggable. Look at the other solutions.

The J solution given doesn't show that a flexible and pluggable sorting facility.

For example, I think the following code is aimed closer to the point of the problem:

  load 'strings'
  sortby=: 1 : '] \: u&.>'
  (# ,[:-[: a.&i. tolower) sortby strings

I am not sure if mine is a good J-ic approach. I'd be happy to see more J-ic way of doing it. Thanks.

            +.--------------------+.

from	Devon McCormick <devonmcc@gmail.com>
to	Programming forum <programming@jsoftware.com>
date	Thu, Mar 13, 2008 at 8:37 AM
subject	Re: [Jprogramming] sort by
	
Hi -

I think you're probably right about the J solution missing the point of the problem.

However, as is common in these sort of comparisons, the problem also misses the point of J.  The advantage J brings is not in slavishly duplicating ways that other languages work but in adding new perspective.  In this case, the problem misses out on the tremendous advantage of grading versus sorting.

I'm working on an exposition of these advantages and would welcome any examples people can come up with.

Devon

            +.--------------------+.
	
from	June Kim <juneaftn@gmail.com>
to	Programming forum <programming@jsoftware.com>
date	Thu, Mar 13, 2008 at 8:44 AM
subject	Re: [Jprogramming] sort by

First of all, thank you for the comment. I fully agree that grading is a much useful and higher concept than sorting.

However, in the following code:

mycmp=: 3 : 0
wt=. ;#&.> y
;(/:&.> lower L: 0) (\: ~. wt) { wt </. y
)

and the code to run

 mycmp strings

shows no way to change the behavior with another comparator(or way of ordering) without changing the code inside mycmp verb, which means its "inextendability".
            +.--------------------+.

from	david alis <david.alis@gmail.com>
to	Programming forum <programming@jsoftware.com>
date	Fri, Mar 14, 2008 at 2:11 AM
subject	[Jprogramming] Re: sort by	
	
Is something along the lines of quicksort what you were thinking of.
http://www.jsoftware.com/jwiki/Essays/Quicksort

However, observe the essay's final remark: "Note: This essay is meant to explore the workings of a classical algorithm.  To actually sort data in J, it is more convenient and more efficient to use /:~ ."

sel=: 1 : 'x # ['
quicksort=: 3 : 0
 if. 1 >: #y do. y
 else.
 (quicksort y <sel e),(y =sel e),quicksort y >sel e=.y{~?#y
 end.
)
            +.--------------------+.

from	June Kim <juneaftn@gmail.com>
to	Programming forum <programming@jsoftware.com>
date	Fri, Mar 14, 2008 at 2:34 AM
subject	Re: [Jprogramming] Re: sort by
	
Thank you for the reference.

Actually, I am happier with my solution:

 load 'strings'
 sortby=: 1 : '] \: u&.>'
 (# ,[:-[: a.&i. tolower) sortby strings

I reckon it is a more J-ic approach compared to the single mycmp verb (which isn't friendly to composition and combination for extension and variability) or the exemplary quicksort verb (of which intention I guess is just showing how quicksort works) -- but I might be wrong.

Example Adverb - Working with File Groups

The following example of an adverb is one I first used when dealing with the Netflix Challenge to handle its large (>100 million records) arrays. In this variant, I'm downloading large tables from a database in order manipulate them in J. The tables are saved as tab-delimited text files, one of which is shown here.

   dir 'FT_T_ISSU.tsv'
+-------------+------------------+---------+---+------+
|FT_T_ISSU.tsv|2009 3 13 16 38 40|612470496|rw-|-----a|
+-------------+------------------+---------+---+------+

First, we figure how to break this up into manageable (16 MB) pieces.

   612470496%16e6
38.279406
   shell 'perl countlines.pl FT_T_ISSU.tsv'
2187989 lines
   2187989%39
56102.282

Then we use a perl script to break the large file into numbered, (approximately) 16 MB pieces.

   shell 'perl splitLargeFl.pl FT_T_ISSU.tsv FTiss ".tsv" 56000'
Input file: FT_T_ISSU.tsv
Header: INSTR_ID	ACCESS_AUTH_TYP	ISS_ACTVY_STAT_TYP	DENOM_CURR_CDE...

Opening file FTiss0000.tsv.
Writing file FTiss0000.tsv.
Opening file FTiss0001.tsv.
Writing file FTiss0001.tsv.
...
Opening file FTiss0039.tsv.
Writing file FTiss0039.tsv.

The list of all the piece files names is:

   #ftinms=. {."1 dir 'FTiss*.tsv'
40
   shell 'take 1 1 FT_T_ISSU.tsv FTiss.tit'  NB. Get column titles

Though this file-splitting task could be done in J, perl is so much more well-suited to the job that it was quicker to write the routine in that language from scratch than to modify a J routine which had close functionality. The J version runs more quickly but this is exactly the wrong sort of optimization that continues to inordinately concern conventional coders.

So, building up to the adverb, we have a routine to read a tab-delimited file as a boxed table:

   readTSVFl
[: <;._1&> (9{a.) ,&.> [: <;._2 [: endLF (13{a.) -.~ fread
   $ftisstit=. ,readTSVFl 'FTiss.tit'
60
   5{.ftisstit
+--------+---------------+------------------+--------------+-------+
|INSTR_ID|ACCESS_AUTH_TYP|ISS_ACTVY_STAT_TYP|DENOM_CURR_CDE|END_TMS|
+--------+---------------+------------------+--------------+-------+

Read one of the small piece files to see how it works:

   $ftiss0=. readTSVFl 'FTIss0000.tsv'
56000 60
   3 5{.ftiss0
+----------+--------+--------+---++
|A>7?34FDWB|PUBLIC  |ACTIVE  |NOK||
+----------+--------+--------+---++
|9>7l41FDWB|PUBLIC  |ACTIVE  |NOK||
+----------+--------+--------+---++
|A>7744FDWB|PUBLIC  |ACTIVE  |NOK||
+----------+--------+--------+---++

However, we normally want to apply this file-reader across the whole group of files. Since we want to use an arbitrary verb and nouns, we define an adverb:

   getFlsInfo=: 2 : 0
   if. nameExists 'SHOWGFI' do.
       if. SHOWGFI do. smoutput y,': ',":qts'' end. end.
   rc=. u v y
NB.EG lnkey=: ((0&{"1) getFlsInfo readTSVFl)&.>rrmlnms
)

So, if we want to extract the 4th column from each piece of the original table broken down into tab-delimited files, we can use the adverb with the file-reader applied to a list of file names:

   endtms=: ((4&{"1) getFlsInfo readTSVFl)&.>ftinms   NB. END_TMS

Advanced topics

In this section, we continue our study of graphical representations of data by considering a few pages from Wainer's Visual Revelations in which he gives an example of how dual Y-axes can be abused to give a misleading graphic. However, he also shows how these dual Y-axes can be properly used to give us a better understanding of the data.

One of the interesting points he makes is how some simple graphical variations like this are largely unavailable in many graphics packages. I've seen this myself in the case where the data is more suitably plotted with a log or ln axis. A problem with doing this by simply taking the ln of the data is that the X-co-ordinates are then shown in a less-than-intuitive form. This suggests that we might like to label the X-axis by both the original ln form as well as the anti-log version of the same.

We also considered some examples of the usefulness of interactive graphics from Wilkinson's The Grammar of Graphics. Finally, we also looked at some examples of graphical displays as shown by selections from Computing and Graphics in Statistics, edited by Buja and Tukey.

Learning and teaching J

This section was meant to be a relaxing wrap-up of the session in which we watched some videos on You Tube. However, due to peculiarities of the system we had available (it was a game machine with J installed), this took longer than I had anticipated. Eventually, we were able to watch Brian Schott's tutorial on installing J, as well as some impressive examples from people using Dyalog APL: A Plea for Simplicity and Game of Life.

This promises to be a useful tool for demos once we get some decent tools for editing the video in a form suitable for posting to You Tube. Based on the demos I've seen, it looks like shooting You Tube video is reminiscent of the earliest days of cinema: it's necessary to shoot a live performance in one go.

One of our members recounted his experience using You Tube to quickly learn about some confusing aspects of i-phone programming (not in J, alas), so we've only begun to scratch the surface of this mediums as a teaching tool.

Scan of Meeting Notes