Essays/Tolerant Comparison

From J Wiki
Jump to navigation Jump to search

Tolerant comparison ameliorates the ugly realities of finite-precision representation and arithmetic. For example, 7 is not exactly equal to 100*0.07, nor 2 exactly equal to the square of the square root of 2:

   7 - 100 * 0.07
_8.88178e_16

   2 - *: %: 2
_4.44089e_16

The = dictionary page defines equality with a tolerance t thus: for finite x and y , x=y is 1 if the magnitude of x-y does not exceed t times the larger of the magnitudes of x and y . The tolerance is normally 2^_44 . Appendix C, System Conventions and Limits, says that the tolerance is less than or equal to 2^_34 . Historically, the upper bound on the tolerance was chosen so that comparisons involving 32-bit integers are exact (as if the tolerance is 0).

   7 = 100 * 0.07
1

   2 = *: %: 2
1


Model of Tolerant Equal

Comparison involving a larger tolerance can be modelled readily by implementing the description in the = page. The phrase <:!.0 means exact <: (as if the tolerance is 0).

   teq=: 1 : '|@- <:!.0 m * >.&|'

   1 (0.1 teq) 0.899
0
   1 (0.1 teq) 0.9
1
   1 (0.1 teq) 1.1
1
   1 (0.1 teq) 1.2
0
   1 (0.1 teq) 0.899 0.9 1.1 1.12
0 1 1 0

Region of Tolerant Equality

Adapted from R.K.W. Hui, Hashing for Tolerant Index Of, APL 2010, Berlin, 2010-09-16. The region of tolerant equality for a complex number was elucidated by Jay Foad of Dyalog Limited in 2010.

The definition of tolerant equality implies that the closed interval of numbers tolerantly equal to a real number x has endpoints x*1-t and x%1-t .

Interval.GIF

As the diagram (exaggeratingly) illustrates, the interval is not symmetric around x . The diagram is reflected about the origin if x is negative.

The region of numbers tolerantly equal to a complex number z is a near-circle with radius r=.t*|z .

Oval.GIF

More precisely, it is the union of two circular regions: The smaller (in black) is centered on z with radius r . The larger derives as follows. A point u on the boundary with magnitude >: |z satisfies (|u-z)=t*|u . That is, the ratio between |u-z and |u is t , a constant; therefore, that boundary is a circle of Apollonius with foci z and the origin. The larger circle (in red) circumscribes the two points p and q on the smaller circle with magnitude |z (in green) and the point s with magnitude (|z)%1-t collinear with z and the origin.

Circle2.GIF


Tolerance Less Than 1

A consequence of the definition is that tolerances greater than or equal to 1 do not give meaningful results when comparing values with the same sign (nor greater than 2 when comparing values with differing signs):

   1 (0.99 teq) 100
1
   1 (0.99 teq) 100.1
0

   1 (0.999 teq) 1000
1
   1 (0.999 teq) 1000.1
0

   1 (1 teq) 1e9
1
   1 teq"0/~ 10 ?@$ 2e9
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
   1 teq"0/~ 10 ?@$ 0
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1

Comparisons with 0

A further consequence is that comparisons with 0 are exact. By definition, x=!.t y if (|x-y) <:!.0 t * x>.&|y .  If x is 0,  we get:

(|x-y) <:!.0 t * x>.&|y
(|0-y) <:!.0 t * 0>.&|y
(|y) <:!.0 t * |y

Since t is less than 1, |y and hence y itself must be exactly 0 for the propositions to hold.

Comparisons with Infinity

Tolerant comparison is meaningless with infinite arguments. By definition, x=!.t y if (|x-y) <:!.0 t * x>.&|y .  If x and y are both infinite with the same sign (both _ or both __ ), the phrase x-y is a NaN error . Otherwise, if x or y is infinite or if both are infinite but with different signs,

(|x-y) <:!.0 t * x>.&|y
_ <:!.0 t * _
_ <:!.0 _
1

That is, infinity or negative infinity is "tolerantly equal" to every other number except itself.

Tolerant Index-Of

If index-of is applied to floating-point numbers and it is known that tolerant comparison is not required, then i.!.0 (exact index-of) instead of i. (tolerant index-of) is more efficient.

Tolerant Floor and Ceiling

Some special considerations are necessary for meaningful floor and ceiling operations. For small numbers a, we would like <.a to be equal to n if n is an integer which is tolerantly equal to a. However, when a is large, there may be multiple integers which are tolerantly equal to a:

   a =. 2^45
   a = a+2
1
   a = a+i:4
0 0 1 1 1 1 1 0 0

It is certainly unreasonable to make <.a larger than a by more than 1. Additionally, we would like for <.a to be less than or equal to >.a for all a (and in fact, our final solution will have equality if and only if a is tolerantly equal to an integer). To satisfy these constraints, we ensure that <.a exceeds a by at most 0.5, and that a exceeds >.a by less than 0.5. Thus (<.->.) a is an integer which is less than 1, and hence less than or equal to zero—we have (<.a) <: (>.a) as desired.

The algorithm used by J (see tfloor and tceiling below) starts with the integer nearest to a, rounding up if 0.5 (=!.0) 1|a. Then, to obtain the floor of a, we subtract one if the result is tolerantly greater than a. Since the rounded number was within 0.5 of a, the smallest this number can be is a - 0.5. Similarly, for the ceiling, we add one if it is tolerantly less than a.

Models

The following models of tolerant comparisons are translated from APL, as presented in Robert Bernecky, Comparison Tolerance, SHARP APL Technical Notes 23, 1977-06-10; Revision 1, 1978-07-15.

teq     =: 1 : '|@- <:!.0 m * >.&|'           NB. =   equal
tne     =: 1 : '[: -. m teq'                  NB. ~:  not equal
tlt     =: 1 : '< !.0 *. m tne'               NB. <   less than
tle     =: 1 : '<:!.0 +. m teq'               NB. <:  less than or equal
tge     =: 1 : '>:!.0 +. m teq'               NB. >:  greater than or equal
tgt     =: 1 : '> !.0 *. m tne'               NB. >   greater than

tfloor  =: 1 : '<.!.0@(0.5&+) ([ - m tgt) ]'  NB. <.  floor
tceiling=: 1 : '<.!.0@(0.5&+) ([ + m tlt) ]'  NB. >.  ceiling

For example:

   ] y=: 100+i:6
94 95 96 97 98 99 100 101 102 103 104 105 106

   100 (0.05 teq) y
0 1 1 1 1 1 1 1 1 1 1 1 0
   100 (0.05 tne) y
1 0 0 0 0 0 0 0 0 0 0 0 1
   100 (0.05 tlt) y
0 0 0 0 0 0 0 0 0 0 0 0 1
   100 (0.05 tle) y
0 1 1 1 1 1 1 1 1 1 1 1 1
   100 (0.05 tge) y
1 1 1 1 1 1 1 1 1 1 1 1 0
   100 (0.05 tgt) y
1 0 0 0 0 0 0 0 0 0 0 0 0

   (0.05 tfloor  ) y%100
0 0 1 1 1 1 1 1 1 1 1 1 1
   (0.05 tceiling) y%100
1 1 1 1 1 1 1 1 1 1 1 1 2


Implementation

J to minimize artifacts from the choice the tolerance value, J limits the maximum value of tolerance to 2^_34.

In the interests of speed, recent versions of the J engine have moved to a similar but different implementation of tolerant equality, which would be modeled as

TEQ=: 1 :'(x >!.0 y*-.m) ~: (y <:!.0 x*-.m)'

The differences here are subtle.

As a first approximation, the implementation treats as not equal positive values which differ by exactly the comparison tolerance.

As a second approximation, when dealing with inaccuracies introduced by floating point notation, this may also occur for negative values. Some brief experiments suggest that this occurs significantly less often.

As a third approximation, when dealing with those inaccuracies, this implementation may treat as equal positive values which the original model would have treated as equal. Brief experiments suggest that this is even rarer.

A formal treatment of these differences might be interesting.


Contributed by Roger Hui.