Essays/Combination Sums

From J Wiki
Jump to navigation Jump to search

The following problem was described in a post to comp.lang.apl by Ajay Askoolum on 2008-04-23:

Take the UK national lottery -- it has 6!49 combinations where the first (1,2,3,4,5,6) and the last (44,45,46,47,48,49) have each got a sum and the number of combinations that have the same respective sum is exactly 1. The objective was to list each sum between the two with the number of combinations ... quickly.


Solution

From the M. page of the dictionary:

comb=: 4 : 0 M.   NB. All size x combinations of i.y
 if. (x>:y)+.0=x do. i.(x<:y),x else. (0,.x comb&.<: y),1+x comb y-1 end.
)

A direct method:

cs=: 4 : '({.,#)/.~ +/"1 x comb y'

   3 comb 5
0 1 2
0 1 3
0 1 4
0 2 3
0 2 4
0 3 4
1 2 3
1 2 4
1 3 4
2 3 4
   3 cs 5
3 1
4 1
5 2
6 2
7 2
8 1
9 1

6 cs 49 would be a non-trivial computation. The intermediate array 6 comb 49 requires 335611584 = */4 6,6!49 bytes. There is a much more efficient method that derives from the doubly-recursive definition of comb : To compute x cs1 y , compute (x-1) cs1 y-1 and x cs1 y-1 , then combine them appropriately. Thus:

cs1=: 4 : 0 M.
 if. (x>:y)+.0=x do. (x<:y)#,:2!x,2
 else. (~.@[ ,. +//.)/ |: (((x-1),0)+"1 x cs1&<: y),(x,0)+"1 x cs1 y-1 end.
)

   3 (cs -: cs1) 5
1
   6 (cs -: cs1) 10
1
   6 (cs -: cs1) 11
1

To correct for the 1-origin indexing in the original problem, just add x to column 0 of the result.

   6!:2 't=. 6 0 (+"1) 6 cs1 49'
0.0252536
   $t
259 2
   3{.t
21 1
22 1
23 2
   _3{.t
277 2
278 1
279 1

<!> Due to the use of M. , reload the script or otherwise redefine cs1 to get the true timing (without this the reported time would be lower).

Expanded Commentary

The problem is to find the distinct sums of all combinations and their counts. For example:

   3 comb 5          NB. all size 3 combinations of i.5
0 1 2
0 1 3
0 1 4
0 2 3
0 2 4
0 3 4
1 2 3
1 2 4
1 3 4
2 3 4

   +/"1 ] 3 comb 5   NB. sum of each combination
3 4 5 5 6 7 6 7 8 9

   3 cs 5            NB. distinct sums and counts
3 1
4 1
5 2
6 2
7 2
8 1
9 1

cs can be defined succinctly:

cs=: 4 : '({.,#)/.~ +/"1 x comb y'

The key is the use of /. , the key adverb. In x u/. y , items of x specify keys for corresponding items of y and u is applied to each collection of y having identical keys. Here, x and y are both the combination sums, and u is the verb ({.,#) , first catenated with count. Thus:

   s=: +/"1 ] 3 comb 5
   s
3 4 5 5 6 7 6 7 8 9
   s ({.,#)/. s
3 1
4 1
5 2
6 2
7 2
8 1
9 1

Use of < (box) elucidates the action of the adverb:

   s </. s
┌─┬─┬───┬───┬───┬─┬─┐
│3│4│5 5│6 6│7 7│8│9│
└─┴─┴───┴───┴───┴─┴─┘
   </.~ s
┌─┬─┬───┬───┬───┬─┬─┐
│3│4│5 5│6 6│7 7│8│9│
└─┴─┴───┴───┴───┴─┴─┘

cs , while direct, is not very efficient. The intermediate array x comb y has shape (x!y),x , requiring */4,x,x!y bytes. In the original "national lottery" problem, x=:6 and y=:49 , requiring */4,6,6!49 or 335611584 bytes. We derive an alternative computation cs1 which is sufficiently efficient to calculate 50 cs1 100 in a couple of seconds.

The method derives from the classic doubly-recursive definition of comb , "classic" because the algorithm can be found in Gilman & Rose, APL\360: An Interactive Approach, 1969.

comb=: 4 : 0 M.   NB. All size x combinations of i.y
 if. (x>:y)+.0=x do. i.(x<:y),x else. (0,.x comb&.<: y),1+x comb y-1 end.
)

x comb y is 1+(x-1) comb y-1 prefaced with an initial column of 0 , catenated to 1+x comb y-1 . M. is the memo adverb; u M. is the same as u but but may keep a record of the arguments and results for reuse. The use of M. makes this double-recursive definition competitive with more complicated statements of comb .

The result of cs1 is a 2-column matrix of the sums and counts. Analogs are needed for the two recursive steps 0,.1+(x-1) comb y-1 and  1+x comb y-1 . For the first step, prefacing with a column of 0 changes neither the sums nor the counts, and adding by 1 increases each sum by x-1 (because there are x-1 entries in each combination) and changes the counts not at all. Thus: ((x-1),0)+"1 (x-1) cs1 y-1 . For the second step, adding by 1 increases each sum by x and does not change the counts. Thus: (x,0)+"1 x cs1 y-1 . It remains to merge these parts together to form a single (sums,counts) result.

If p=.p0,p1 is the catenation of the two parts, whose column 0 s is the catenated sums and whose column 1 c is the catenated counts, then the overall list of sums is just the nub of the sums, i.e. ~.s , and the overall list of counts is s+//.c , the counts accumulated according to the sums. This is accomplished by (~.@[ ,. +//.)/ |:p , inserting the verb (~.@[ ,. +//.) between the rows of the transpose of p . Putting it all together:

cs1=: 4 : 0 M.
 if. (x>:y)+.0=x do. (x<:y)#,:2!x,2
 else. (~.@[ ,. +//.)/ |: (((x-1),0)+"1 x cs1&<: y),(x,0)+"1 x cs1 y-1 end.
)

The original "national lottery" problem uses 1-origin (the first combination is 1 2 3 4 5 6 instead of 0 1 2 3 4 5 ); just add 6 to each sum to account for that. Thus:

   6!:2 't=: 6 0 (+"1) 6 cs1 49'
0.0250315

That is, the result is computed in 0.025 seconds (2.2 GHz Athlon 3200+). The shape of t and its first and last 4 rows:

   $ t
259 2

   4 {. t
21 1
22 1
23 2
24 3

   _4 {. t
276 3
277 2
278 1
279 1

The sum of the counts should be 6!49 :

   +/ t
38850 13983816

   6!49
13983816

As promised:

   6!:2 't=: 50 cs1 100'
2.05181

   $ t
2501 2

   +/ t
6.18998e6 1.00891e29

   50!100
1.00891e29

Further Improvements

The computation can be simplified and made more efficient by observing that column 0 of x cs1 y is always (2!x)+i.1+x*y-x , and that the counts for 1=x are y$1 . Thus:

cs2=: 4 : '((2!x)+i.0>.1+x*y-x) ,. x cs2a y'

cs2a=: 4 : 0 M.
 if. (x>:y)+.1>:x do. ((y*1=x)>.x<:y)$1
 else. (m{.x cs2a&<:y) + (-m){.x cs2a y-1 [ m=. 1+x*y-x end.
)

   (cs1 -: cs2)"0/~ i.10
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1

   6!:2 '50 cs2 100'
0.743049

cscheck is like cs2 but incorporates checks.

cscheck=: 4 : 0
 z=. x cs2 y
 assert. (-:<.) z
 assert. ($z) -: 2,~m=. 1+x*y-x
 assert. ({."1 z) -: (2!x) (+i.) m
 assert. (+/{:"1 z) = x!y
 assert. (-:|.) {:"1 z
 assert. ({.z) -: (2!x),1
 assert. ({:z) -: ((x*y-x)+2!x),1
)

   $ 50 cscheck 100
2501 2

Collected Definitions

comb=: 4 : 0 M.   NB. All size x combinations of i.y
 if. (x>:y)+.0=x do. i.(x<:y),x else. (0,.x comb&.<: y),1+x comb y-1 end.
)

cs=: 4 : '({.,#)/.~ +/"1 x comb y'

cs1=: 4 : 0 M.
 if. (x>:y)+.0=x do. (x<:y)#,:2!x,2
 else. (~.@[ ,. +//.)/ |: (((x-1),0)+"1 x cs1&<: y),(x,0)+"1 x cs1 y-1 end.
)

cs2=: 4 : '((2!x)+i.0>.1+x*y-x) ,. x cs2a y'

cs2a=: 4 : 0 M.
 if. (x>:y)+.1>:x do. ((y*1=x)>.x<:y)$1
 else. (m{.x cs2a&<:y) + (-m){.x cs2a y-1 [ m=. 1+x*y-x end.
)

cscheck=: 4 : 0
 z=. x cs1 y
 assert. (-:<.) z
 assert. ($z) -: 2,~m=. 1+x*y-x
 assert. ({."1 z) -: (2!x) (+i.) m
 assert. (+/{:"1 z) = x!y
 assert. (-:|.) {:"1 z
 assert. ({.z) -: (2!x),1
 assert. ({:z) -: ((x*y-x)+2!x),1
)



See also



Contributed by Roger Hui.