Essays/Sum of a Bit Array

From J Wiki
Jump to navigation Jump to search

A bit array is one in which each atom is represented by one bit in memory. Because bits are not directly addressable in the popular CPUs, bit arrays motivate many interesting algorithms. The following problems are representative.

Sum of a Bit Vector

A fast computation for +/b where b is a bit vector, can be modelled as follows:

POPCOUNT=: a. {~ +/"1 #: i.256     NB. popcounts for each possible byte
C       =: <. 255 % 8              NB. number of times the maximum popcount for a byte (viz., 8) ...
                                   NB. ... can be added to itself without overflowing a byte
ic      =: 3!:4                    NB. for "casting" integers into bytes and vice versa

popcount=: 3 : 0
 assert. 2 = 3!:0 y
 assert. 1 = #$y
 t=. POPCOUNT{~a.i.y,(4|-#y)$0{a.  NB. popcount for each byte
 w=. _2 ic t                       NB. convert to 4-byte integers
 m=. ((>.C%~#w),C) $ w,C$0         NB. convert into a C-column matrix
 z=. +/ a. i. 2 ic +/"1 m          NB. add the byte popcounts 4 at a time, then total them
 assert. z = +/,#: a.i.y
)

For example:

   popcount 'squeamish ossifrage'
79
   popcount 5!:5 <'popcount'
1101
   popcount 5{.a.
5

Eugene McDonnell, in his September 1980 Recreational APL article Summing a Vector, described Roger Moore's idea of using the translate instruction TR in the IBM S/360 to do what POPCOUNT{~a.i.y does above. I don't know whether Moore also had the 4-at-a-time idea to accumulate the byte popcounts. Henry Warren's book Hacker's Delight, section 5.1, has both ideas.

In C, the converting back and forth between bytes and words is free. Other than the POPCOUNT table, no array temp would be required. On a modern CPU with the popcnt instruction, the fastest method is to loop through x accumulating pop counts 32 bits at a time.

I4 *w=(I4*)x;
q=n/32;
for(i=sum=0;i<q;++i)sum+=popcnt(*w++);

Sum of a Bit Matrix

The problem is to compute +/b for a bit matrix b .  The following technique was presented at the Dyalog 2008 conference in Helsingør. The technique suffices to handle tall as well as wide matrices, and works for any associative and commutative scalar function (not just +). The analysis for higher rank arrays is similar.

The key idea is this. Suppose b is a bit matrix whose columns are labelled by the column indices.

   ] b=: 17 6 $ i. 6
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5

Computing +/b consists of adding up all the bits having the same column indices.

   8 *. 6
24

   ] b1=: 5 24 ($,) b,3 6$_
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
0 1 2 3 4 5 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

If the matrix is conceptually reshaped into one having 8*.{:$b columns, the column indices are still lined up, but the conceptual matrix now has a nice multiple of 8 columns, whence finicky bit operations are not needed except on a final small number of subtotals. The overall sum +/b is then is (24$i.6) +//. +/b1 .



Contributed by Roger Hui.