Essays/Index in Nub
(~.y)i.y finds the indices of y in the nub of y . Can you do it faster?
timer=: 6!:2 y=: 1e6 ?@$ 2e9 NB. integers timer '(~.y)i.y' 1.08679 timer '(~.i.]) i.~ y' 0.437234 y=: 1e6 ?@$ 0 NB. floating point numbers timer '(~.y)i.y' 2.62438 timer '(~.i.]) i.~ y' 1.29647 y=: ":&.> 1e6 ?@$ 1e5 NB. boxed strings timer '(~.y)i.y' 3.00702 timer '(~.i.]) i.~ y' 1.54582 y=: 1e6 4 ?@$ 2e9 NB. integer matrix timer '(~.y)i.y' 1.52394 timer '(~.i.]) i.~ y' 0.50255 y=: 1e6 ?@$ 100 NB. counterexample, not faster timer '(~.y)i.y' 0.0264034 timer '(~.i.]) i.~ y' 0.0321949
~. (nub) is in the same family as the dyad i. and is implemented using the same underlying code. (Others in the family are the monad ~: and the dyads e. -. i: .) The dyad x i. y is not equally fast on all arguments. Listed from faster to slower:
- boolean and literal
- small-range integers, where small means about the size of #x
- general (machine word) integers
- floating point numbers
When y is not one of the fast ones, (~.y)i.y requires two slow operations, one for ~.y and another for the i. , because ~.y and y have the same classification. In contrast, (~.i.])i.~y (same as (~.t)i.t=.i.~y ) uses one slow operation to do i.~y , producing a list of small-range integers, whence (~.i.]) is computed using two fast operations.
i.~y , even when "slow", is supported by special code, as the following benchmark demonstrates:
y=: 1e6 ?@$ 2e9 y0=: y+0 timer 'i.~ y' 0.357208 timer 'y i. y' 0.355987 timer 'y i. y0' 0.624246
Contributed by Roger Hui.