Vocabulary/IFamily

From J Wiki
Jump to navigation Jump to search

Back to: Vocabulary

The (i.)-family of verbs

Several J verbs have to do with searching: finding an item from a list of items, or finding the items that are equal, or counting the number of identical items.

The foundation of these verbs is x i. y, which looks in x to find the first item that matches y. All the other verbs share the same code, so they are called the (i.)-family.

The code supporting these verbs has been honed for decades, and now supports more variants than you can imagine, with performance better than you could expect. Most of these verbs run in linear time on most operands.

You should use these verbs whenever you can.


The verbs of the (i.)-family

These verbs are:

Self-Classify = y (deprecated) mask indicating which items of y match other items
Less x -. y Items of x not in y
Intersect x ([ -. -.) y Items of x that are also in y. The implementation is actually x ([ -.!.0 -.) y
Nub ~. y Unique items of y in order of appearance
Nub Sieve ~: y Boolean list of shape (#y)1 to mark the first appearance of an item of y
Key x u/. y Apply u on subsets of y that correspond to matching elements of x. The order of subsets matches the order of ~. x
Index Of x i. y index of first item of x that matches a cell of y, or (#x) if no match
Index Of Last x i: y index of last item of x that matches a cell of y, or (#x) if no match
Member x e. y Boolean array, 1 for each cell of x that appears in y
All In x *./@:e. y Atom, 1 if each cell of x appears in y
Any In x +./@:e. y Atom, 1 if any cell of x appears in y
Number In x +/@:e. y Atom, count of cells of x appearing in y


Internal rank

These verbs have infinite rank, but they operate as if they had an internal rank that depended on the rank of the arguments.

One of the arguments is the search space and the other is the probe. For the monads, y plays both roles; for dyads, x is the search space except in x e. y.

The search space is treated as a list of its items:

If the search space is an atom, it is treated as a 1-atom list.

The probe is treated as an array of items of the search space.

You can't tell what the probe means until you have looked at the rank of the search space.

   (i. 6) i. (2 3)       NB. search space is a list, its items are atoms; y has 2 atoms; result is a list
2 3
   $ (i. 6) i. (2 3)     NB. The frame of y with respect to the internal rank is (,2)
2
   (i. 6 2) i. (2 3)     NB. search space is a table, its items are lists; y is a single list; result is an atom
1
   $(i. 6 2) i. (2 3)    NB. The frame of y with respect to the internal rank is (empty)

   (i. 6 2 2) i. (2 3)   NB. search space is a brick, its items are tables; y has no tables, so is a single failed search
6

This is the same behavior as verbs modified with "_1, like in {"_1. The rank accessed by {"_1 is dependent on the rank of the argument; it's the argument's rank minus one.


The rank of the result

If the result of the verb is a copy of the search space with items deleted (as for -. and ~.), it has the rank of the search space (after converting an atomic search space to a list).

The result of the other verbs depends on the probe. For these verbs, the shape of the result is the frame of the probe with respect to cells of the search space. The result may be an atom or an array.


Tolerant comparison

The (i.)-family supports tolerant comparison. You can force exact comparison using Customize (!.), e.g.  i.!.0 . To apply a tolerance to Intersect, use x ([ -. -.!.f) y.


Performance

Recommendation

Use exact comparison (!.0) on numeric arrays with rank>1, or boxed arrays that contain numerics.

Use tolerant comparison on arrays that do not contain numerics.

Discussion

The (i.)-family run in linear time for most arguments.

This is true even for tolerant indexing, as long as the items being compared are atoms - an impressive feat by Roger Hui.

Even though tolerant indexing is fast on numeric lists, exact indexing is a little faster.

Thus: with numeric arguments, exact comparison is preferred.

But: tolerant indexing of non-atomic items that contain numerics has worst-case performance of O(n^2). This includes boxed arrays where the boxes contain numerics.

Exact indexing does not have this worst case. Thus, you should try hard to use exact comparison when you apply an (i.)-family verb to an array of rank greater than 1 that contains numerics.

If you are sure your argument does not contain numerics, tolerant comparison is preferred because it is faster if the items are non-atomic.


Details

1. If the search space is an atom, the result of x -. y (Less) and ~. y (Nub) is a list.


Preferred forms

1. The (i.)-family figures in a great deal of special code, including Special Combinations.