# Essays/Sum of a Bit Array

A *bit array* is one in which each atom is represented by one bit in memory.
Because bits are not directly addressable in the popular CPUs,
bit arrays motivate many interesting algorithms.
The following problems are representative.

## Sum of a Bit Vector

A fast computation for` +/b `where` b `is a bit vector, can be
modelled as follows:

POPCOUNT=: a. {~ +/"1 #: i.256 NB. popcounts for each possible byte C =: <. 255 % 8 NB. number of times the maximum popcount for a byte (viz., 8) ... NB. ... can be added to itself without overflowing a byte ic =: 3!:4 NB. for "casting" integers into bytes and vice versa popcount=: 3 : 0 assert. 2 = 3!:0 y assert. 1 = #$y t=. POPCOUNT{~a.i.y,(4|-#y)$0{a. NB. popcount for each byte w=. _2 ic t NB. convert to 4-byte integers m=. ((>.C%~#w),C) $ w,C$0 NB. convert into a C-column matrix z=. +/ a. i. 2 ic +/"1 m NB. add the byte popcounts 4 at a time, then total them assert. z = +/,#: a.i.y )

For example:

popcount 'squeamish ossifrage' 79 popcount 5!:5 <'popcount' 1101 popcount 5{.a. 5

Eugene McDonnell, in his
September 1980 Recreational APL article
*Summing a Vector*,
described Roger Moore's idea of
using the translate instruction TR in the IBM S/360
to do what` POPCOUNT{~a.i.y `does above. I don't know whether
Moore also had the 4-at-a-time idea to accumulate the byte popcounts.
Henry Warren's book *Hacker's Delight*, section 5.1,
has both ideas.

In C, the converting back and forth between bytes and
words is free. Other than the` POPCOUNT `table, no
array temp would be required.
On a modern CPU with the popcnt instruction,
the fastest method is to loop through` x `accumulating pop counts 32 bits at a time.

I4 *w=(I4*)x; q=n/32; for(i=sum=0;i<q;++i)sum+=popcnt(*w++);

## Sum of a Bit Matrix

The problem is to compute` +/b `for a bit matrix` b` .` `
The following technique was presented at the Dyalog 2008 conference in Helsingør.
The technique suffices to handle tall as well as
wide matrices, and works for any associative and
commutative scalar function (not just` +`). The analysis
for higher rank arrays is similar.

The key idea is this. Suppose` b `is a bit matrix whose columns are labelled by the column indices.

] b=: 17 6 $ i. 6 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5

Computing` +/b `consists of adding up all the bits having the same column indices.

8 *. 6 24 ] b1=: 5 24 ($,) b,3 6$_ 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

If the matrix is conceptually reshaped into one having` 8*.{:$b `columns,
the column indices are still lined up, but the
conceptual matrix now has a nice multiple of 8 columns,
whence finicky bit operations are not needed except on
a final small number of subtotals. The overall sum` +/b `is then
is` (24$i.6) +//. +/b1` .

Contributed by Roger Hui.