NYCJUG/2015-05-12

From J Wiki
Jump to navigation Jump to search

integer partitions


Location:: the Heartland, downstairs

Meeting Agenda for NYCJUG 20150512

1. Beginner's regatta: Random partitions.

2. Show-and-tell: See "Canny Edge Detection".

3. Advanced topics: "See Graphics in Reverse" and "WD Spy".

4. Learning and teaching J: See "[Selections from Stackoverflow’s] 2015
Developer Survey", "Just Do It", and sample drawings from Pascal Jasmin's
code.

Random Partitions

The Challenge

Dan B. posed a challenge to the J Forum:

from:   Dan Bron <j@bron.us>
to:     programming@jsoftware.com
date:   Fri, May 8, 2015 at 11:54 AM
subject:[Jprogramming] Random distribution
. I have a quantity Q and I want to divide it into a vector V of N randomly-sized chunks, such that N=#V and Q=+/V .

. Ideas?

. -Dan

Early Attempts

I came up with a quick and dirty hack that wasn't very good:

from:   Devon McCormick <devonmcc@gmail.com>
date:   Fri, May 8, 2015 at 12:24 PM

Sample J session outlining a solution:

   Q=. 100
   N=. 20
   rr=. N?100*N            NB. Or however you want to do this
   Q%+/rr
0.00499875
   +/rr=. <.0.5+rr*Q%+/rr  NB. Just missed it!
99

This wasn't quite right - it missed the target (of summing to Q or 100) by 1.

   NB. Fix miss
   tot=. +/<.0.5+rr*Q%+/rr
   +/rr=. ((rp{rr)+Q-tot) (rp=.?#rr)}rr
100
   rr
3 6 3 7 7 4 7 0 10 5 4 1 5 5 8 5 7 3 2 8

Pascal J. asked for an example solution, which I provided. However, even a cursory examination of this made me realize my solution was not very good as I had a zero in the result. Even though this was not dis-allowed, it seemed obviously wrong as we could pad any too-short solution with however many zeros needed to make its length N.

from:   Devon McCormick <devonmcc@gmail.com>
date:   Fri, May 8, 2015 at 12:55 PM
. Like this?
   V=. 3 6 3 7 7 4 7 0 10 5 4 1 5 5 8 5 7 3 2 8 [ 'N Q'=. 20 100
   (N,Q) -: (#,+/) V
1

However, at least I provided an example validity check at the end here for a V generated for a given N and Q.

Marshall was possibly mis-led by my example as he set off down a similar path:

from:   Marshall Lochbaum <mwlochbaum@gmail.com>
date:   Fri, May 8, 2015 at 1:05 PM
. This solution leaves out all the zeros in V, so you might end up with a short vector.

. The interpretation of the original problem that I would use is: V is a vector of nonnegative integers with shape N whose sum is Q, and each such vector has an equal probability of being chosen. Unfortunately, the approach here makes answers with a zero less likely, because there are fewer sequences which, when sorted, have adjacent terms which are equal (that is, the sequences 3 5 and 5 3 both turn into 3 5, but only 3 3 turns into 3 3 when sorted).

. To get the correct probabilities, you need dyad (?):

   N ([: <: 2 -~/\ <:@[ (0 , /:~@:? , ]) +) Q
15 16 10 6 14 24 0 4 3 8
. In order to ensure that zeros are possible in the result, I first added N to Q, and then subtracted one from each value in the result.

Bob Therriault outlined a possible approach:

from:   robert therriault <bobtherriault@mac.com>
date:   Fri, May 8, 2015 at 1:00 PM
. A quick idea on my way out the door.

. Create a vector of N 1's

   N#1
. add a random vector of N 0's and a single 1
   (i.=?)N
. Sum the new vector V'  and exit if +/V' is equal to Q (probably a converging power conjunction)

. Cheers, bob

Finally, R. E. Boss showed up with something that seemed headed in the correct direction:

from:   R.E. Boss <r.e.boss@outlook.com>
date:   Fri, May 8, 2015 at 1:07 PM


   V=: Q*(%+/) 2 rnd N ?@# 0

   N=:1000
   Q=:345

   +/V
345

   #V
1000

Further Elaboration

The various attempts prompted Dan to outline more clearly what he was seeking.

from:   Dan Bron <j@bron.us>
date:   Fri, May 8, 2015 at 2:40 PM
. Joe Bogner wrote:

. > +/ every ( (([| /|~ ][| ? N&#) I. i.) </. #&1) Q ]

. Yes, I. does seem germane to me, but I tried this verb and it seemed to produce vectors of different lengths at each iteration. I need the result to be exactly and always N numbers long, and their sum to be exactly and always Q .

. Marshall wrote: . > The interpretation of the original problem that I would use is: . > V is a vector of non-negative integers with shape N whose sum is Q.

. Thanks Marshall. The actual specification is

. V is a vector of POSITIVE integers with shape N whose sum is Q.

. That is: there must be exactly N buckets, and no bucket may be empty. All possible ordered series of buckets must be equally likely.

. Pascal wrote: . > Maybe show some valid vectors for small Q? . For example [1], for N=3 and Q=7

        1 1 5
        1 2 4
        1 3 3
...
        5 1 1
. A very regular pattern, now that I look at it. If we knew the number ZFNP of possible 0-free N-partitions an integer Q, then we'd only have to roll ?ZFNP and then figure out some function analogous to #: to break it apart into its constituents.

. REB wrote: . > V=: Q*(%+/) 2 rnd N ?@# 0

. This looks promising. What's rnd here?

. Bob wrote: . > Probably a converging power conjunction

. Ideally, I'm looking for an instantaneous / closed form solution, one which avoids iteration or recursion, if possible.

. -Dan

And, in a follow-up message, Dan says:

. > V is a vector of POSITIVE integers with shape N whose sum is Q.

. Minor correction "... with length N ... " (obviously shape is always a vector, so the shape would be ,N).

. > If we knew the number ZFNP of possible 0-free N-partitions an integer Q, then we'd only have to roll . > ?ZFNP and then figure out some function analogous to #: to break it apart into its constituents.

. OK, so after a little googling, I'm not looking for Q's partitions, but its compositions. The difference is the same between a combination and a permutation: two partitions with the same elements but a different ordering are considered the "same" parition; two compositions with the same elements but a different ordering are considered "different" compositions.

. According to Mathworld [1], the number of compositions of y into x parts (where 0 is not allowed as a part) is given by:

 (!<:y)%(!<:x)*(!y-x)
. or plain old:
 ! * %
. For example, the table I posted earlier with N=3 and Q=7 had 15 rows:
   3 (! * %) 7
15
. So now we can choose:
   rci =: [: ? ! * %  NB. Random Composition Index
. For example:
   3 rci 7
14

. Now we just have to figure out how turn 14 into 5 1 1 and 3 into 1 4 2 and 8 into 2 4 1 . . -Dan

Beginning of a Break-Through

Dan's comment about the regularity of the pattern of solutions and his idea of coming up with a generator - some f such that f(14)->5 1 1, and f(3)->1 4 2, and so on - was soon followed by this solution from Marshall:

from:   Marshall Lochbaum <mwlochbaum@gmail.com>
date:   Fri, May 8, 2015 at 3:16 PM
. If you don't need zeros, the solution is a bit easier: just take out the changes I made (add N to Q and subtract 1 from the result) to ensure zeros are possible!

. There was a slight mistake in my algorithm besides that--the first number had a minimum that was one too small. So you have to use _1 instead of 0 as the element to be put in front of the random numbers, and to make sure the sum is correct, decrease Q by 1.

. The solution:

   N =: 10 [ Q =: 100
   V =: N (2 -~/\ _1 , /:~@:? , ])&<: Q
   (+/ , #) V
100 10
   V
5 1 21 5 8 3 17 24 7 9
. Bonus: a way to enumerate the possible outcomes. Since dyadic ? picks one of the rows of comb with equal probability, the correctness of this function proves that the random function works.
   require 'stats/base'
   3 (] (2 -~/\ _1 , /:~@] , [)"_1 comb)&<: 7

---

from:   Dan Bron <j@bron.us>
date:   Fri, May 8, 2015 at 3:30 PM
. Marshall wrote:

. > V =: N (2 -~/\ _1 , /:~@:? , ])&<: Q

. BINGO. Thank you!

. So you're creating an ordered list of <:N unique numbers less than <:Q, bookended with _1 and <:Q, and taking the *gaps between these numbers* as the vector V. . I have a vague intution of why the sum of any such bookended vector of gaps must necessarily be identically equal to N, but can you spell it out for me? . > require 'stats/base' . > 3 (] (2 -~/\ _1 , /:~@] , [)"_1 comb)&<: 7

. Also, you wanna take a whack at inverting your enumeration function here? That is, given an index i, produce the corresponding row from the table

   3 (] (2 -~/\ _1 , /:~@] , [)"_1 comb)&<: 7  .

---

from:   Marshall Lochbaum <mwlochbaum@gmail.com>
date:   Fri, May 8, 2015 at 4:09 PM
. This algorithm is worth writing a bit more about, I guess.

. I'll set N=3 and Q=7 for all the examples. First, consider

   N (_1 , /:~@:? , ])&<: Q
_1 1 2 6
. consisting of a _1, then (N-1) strictly increasing integers in the set [0,Q-1), then (Q-1). Since all the numbers in the middle are greater than _1 but less than (Q-1), the whole list is strictly increasing, i.e. each element is strictly greater than the one before it.

. To get our final answer, we just apply (2 -~/\ ]) to the result. There were (N+1) numbers before, so there are now N numbers. Since the numbers were strictly increasing, each of the differences must be at least 1. To find the sum of the numbers, consider what ([| +/ 2 -~/\ ]]) does to a list:

a b c d e
(2 -~/\ ])
a-~b b-~c c-~d d-~e
(+/)
a-~b+b-~c+c-~d+d-~e
 =
a-~(b-b)+(c-c)+(d-d)+e
 =
e-a
. So the sum of the numbers is the difference of the first and last numbers of the random increasing list earlier. But those numbers are just _1 and (Q-1), and their difference is Q.

. As an aside, J knows this fact in some sense. Consider the inverse

   +/\ b. _1
(- |.!.0) :.(+/\)
. It happens that (- |.!.0) is equivalent to (2 -~/\ 0&,), which is equivalent to ({. , 2 -~/\ ]). This manipulation is a bit confusing but doesn't really change anything. Now using the definition of the inverse,
a -: +/\ (- |.!.0) a
a -: +/\ ({. , 2 -~/\ ]) a
a -: +/\ ({.a) , 2 -~/\ a
a -: ({.a) + 0 , +/\ 2 -~/\ a
. We want to isolate (+/ 2 -~/\ a), which is the last element of the sum scan. Taking the tail of both sides,
({: a) -: {: ({.a) + 0 , +/\ 2 -~/\ a
({: a) -: ({.a) + {: +/\ 2 -~/\ a
(({:a) - ({.a)) -: {: +/\ 2 -~/\ a
(({:-{.) a) -: +/ 2 -~/\ a
. I'm not quite sure what you mean by inverting the enumeration function.  (Note: the sorting isn't necessary for comb even though it is for (?), so the solution can be reduced to (N (] (2 -~/\ _1 , ,~)"_1 comb)&<: Q))

. The part that's applied with rank _1 takes a combination and Q and produces the appropriate composition of Q. So given the i'th element of ((N-1) comb (Q-1)) it's easy to produce the i'th composition:

   N (] (2 -~/\ _1 , ,~) i{comb)&<: Q
. I suppose if you have a memory-efficient way to produce (i{comb) then this gives you an efficient way to get the i'th composition.

---

from:   R.E. Boss <r.e.boss@outlook.com>
date:   Fri, May 8, 2015 at 4:58 PM
. > DB wrote:

. > > V=: Q*(%+/) 2 rnd N ?@# 0 . > . > This looks promising. What's rnd here? . round on x decimals

   rnd
+-+-+------------------------+
|3|:| 2 rnd y                |
| | |:                       |
| | | t %~ <. 0.5+ y* t=.10^x|
+-+-+------------------------+

---

from:   Devon McCormick <devonmcc@gmail.com>
date:   Fri, May 8, 2015 at 5:23 PM
. If you're interested in the "integer partition" problem in general, Howard Peelle has J'ed it extensively:

. http://www.jsoftware.com/jwiki/DevonMcCormick/HAP_AIP?highlight=%28AIP%29 .

---

from:   [[User:Raul Miller|Raul Miller]] <rauldmiller@gmail.com>
date:   Fri, May 8, 2015 at 6:23 PM
. With that in mind, it's fairly easy to satisfy Dan's stated requirement.

. Doing this efficiently will be another matter.

NB. from http://www.jsoftware.com/jwiki/DevonMcCormick/HAP_AIP?highlight=%28AIP%29

         ELSE =: `
         WHEN =: @.
         EACH =: &.>
  Skiena =: Parts ELSE Ones WHEN Under2          NB. (n) Skiena (p)
         Ones =: ,:@#
         Under2 =: 2 > ]
         Parts =: Top , Bottom
               Top =: ] ,. - Skiena - <. ]
               Bottom =: Skiena <:

NB. for this specific task:

rdist=:4 :0
  base=. 1+(#~ x>:+/"1@:*)Skiena~y-x
  ({~ ?@#) ~.,/({"1~/ (A.&i.~!)@{:@$) base
)
 . Example use:
   3 rdist 6
3 2 1
   3 rdist 6
1 3 2
   3 rdist 6
1 1 4

… Thanks,

---

from:   Dan Bron <j@bron.us>
date:   Fri, May 8, 2015 at 6:56 PM
. Of course! (He says with the benefit of hindsight and a clear explanation.)  Yes, what I'm looking for is an efficient, analytic, closed form function which takes i, N, Q and produces i { comb .

. I am certain it exists. I almost wish it were gray December again so I could justify staying inside tomorrow and playing with the idea. (But instead I'm gonna take a mini road trip with friends to some idyllic little town upstate and picnic by a lake.)

. Thanks for all your help today! I really appreciate it.

. REB: don't think I've overlooked your solution, either. Now that I know rnd is round I'm going to start exploring it.

. Please excuse typos; sent from a phone.

---

from:   R.E. Boss <r.e.boss@outlook.com>
date:   Sat, May 9, 2015 at 1:23 AM
. I was actually treating Q as a real number.  The solution is in line with that of Lochbaum
   [Q=:-pi
_3.1415927
   N=:7
   (+/,#) Q* 2-~/\/: ~0, 1,~ 0 ?@#~ <:N
_3.1415927 7

Final Round-up: Evaluating Some Solutions

I evaluated four solutions by running them against two sets of random arguments (Different N of 10 and 20 versus a Q of 100) and generated histograms of the results sets from each of the four.

My own initial solution is clearly the most slapdash of the lot. This is based on the intuition that if we generate the correct number of random numbers, we can scale them by the ratio of their sum to the intended sum Q. This solution gives very poor results: the code is ad-hoc and verbose and relies on fixing an initial, poor approximation of the answer with an arbitrary hack.

randAddupSub=: 3 : 0
   'N Q'=. y
   rr=. Q ([: <. ] * [ % [: +/ ]) N?@$Q
   if. Q>+/rr do. rr=. ((rp{rr)+Q-+/rr) (rp=.?#rr)}rr end.
   rr
)

randAddup=: 3 : 0
   V=. randAddupSub y
   while. 0 e. V do.
       nn=. randAddupSub (>:0+/ . = V),>./V
       V=. (nn) ((V i. >./V),I. V e. 0)}V
   end.
   V
)

An example use of the above to generate a pair - V0 and V1 - of result sets to evaluate by looking at their histograms. I generated twice as many 10-element solutions as 20-element ones (summing to 100 in both cases) so that the number of results are the same in each result set to make the pair of histograms more comparable.

   'V0 V1'=. randAddup”1&.>(10,.2e5$100);20,.1e5$100

Upon reflection, I came up with a better algorithm. This generates a random Boolean of length Q, then calculates V as the length of the partitions implicit in this Boolean: each one starts a new partition.

randPtnDHM=: 3 : '(2 -~/\ 0 , ]) I. 1,~(1) (?/y-1 0)}0$~{:y'
NB.EG V=. randPtn N,Q

'V0 V1'=. randPtnDHM"1&.>(10,.2e5$100);20,.1e5$100

Graphing Some Solutions

In the following table, we see histograms for four of the proposed solutions. We're looking at the distribution of bucket sizes for 10 (blue) and 20 (red) buckets summing to 100. Even this small sample should make clear the relative merits of these solutions.

RandPtnDHM0-code.jpg|style="width: 3.2in; border-top-color: black; border-right-color: black; border-bottom-color: black; border-left-style: none; padding: 0in 5.4pt;" |width="502",height="502",v:shapes="Picture_x0020_1"
RandPtnDHM-code.jpg|style="width: 3.2in; border-top-style: none; border-left-style: none; border-bottom-color: black; border-right-color: black; padding: 0in 5.4pt;" |width="502",height="502",v:shapes="Picture_x0020_2"
RandPtnML-code.jpg|style="width: 3.2in; border-top-style: none; border-left-style: none; border-bottom-color: black; border-right-color: black; padding: 0in 5.4pt;" |width="502",height="502",v:shapes="Picture_x0020_4"
RandPtnREB-code.jpg width="502",height="502",v:shapes="Picture_x0020_5"

The first of these, my original attempt, is clearly different from the other three: the distributions look odd and arbitrary. In fact, they're probably greatly influenced by my initial choice of random values. The next two, my second attempt and Marshall Lochbaum's, look very similar to each other: both have reasonable-looking distributions. However, the last one, due to R. E. Boss, appears to have the neatest distribution: the bars are strictly descending in height (except for the rightmost blue one which is slightly elevated because the histogram divisions end short of the actual distribution - but this is merely an artifact of the graph).