Essays/NearerLargerNeighbor

From J Wiki
Jump to: navigation, search

Nearer Larger Neighbor: J Programs Compared

                                       Howard A. Peelle 
                                       University of Massachusetts Amherst 
                                       hapeelle@educ.umass.edu 

The Nearer Larger Neighbor problem is introduced. J programs for solving it are defined and compared by speed, space, and spread.

Introduction

The “Nearer Larger Neighbor” problem is:

                                   For each number in a list, find the nearer number that is larger than it. 
                                   If the larger numbers to the left and right are equally near, pick the larger one. 
                                   If there is no larger number, give a suitable default number. 

Variants of this problem arise in several areas of mathematics and computer science, notably computational geometry for choosing Euclidean distances between points. See references below. Fundamentally, the problem entails bidirectional search yet generalizes naturally to higher dimensional arrays. It is instructional to develop programs with different approaches for a list first.

Asano et al [1] posed the “All Nearest Larger Neighbors” problem and presented several 1-D algorithms. Although their problem is slightly different, the following programs begin with my attempts to code theirs in J for comparisons here.

Asano’s Algorithms

A1

Asano’s Algorithm 1 is a right-and-left stack scan where the output is a list of associated indexes (converted to origin 0), and ties are resolved to the leftmost index:

A1 =: 3 : 0
n =. #y
s =. i.0
nln =. n#0
lnln =. n#0
rnln =. n#0
for_i. i.n
   do. 	while.	((#s)~:0)*.(i{y)>:({.s){y
	   do.	s =. }.s
	  end.
	if. (#s)=0
        do. lnln =.(-n) i} lnln
      else. lnln =. ({.s) i} lnln
       end.
	s =. i,s
  end.
s =. i.0
for_i. |.i.n		
   do. 	while.	((#s)~:0)*.(i{y)>:({.s){y
	   do.	s =. }.s
	  end.
	if. (#s)=0
        do. rnln =. (3*n) i} rnln
      else. rnln =. ({.s) i} lnln
       end.
	s =. i,s
  end.
for_i. i.n
   do.	if. (i-i{lnln)<:(i{rnln)-i
        do. nln =. (i{lnln) i} nln
      else. nln =. (i{rnln) i} nln
       end.
end.
nln
)

Example from [1]:

   A1  87 32 12 54 28 35 14 61 18 53
_10 0 1 0 3 3 5 0 7 7		NB. indexes with left preference

Notice that _10 is the default index for the largest item. Another example:

   A1 4 5 3 8 6 3 2 4 5 1
1 _10 1 _10 3 4 5 4 4 8

A1 incorrectly returns -n for the second item (5) when there is no larger item to the left even though there is a larger to the right and returns the wrong index for the eighth item (4) which has a nearer larger item (5) on its right. This program can be corrected and significantly simplified in J style programming:

A1j =: 3 : 0
n =. #y
nln =. i.0
for_i. 	i.n
do. 	item =. i{y
	left =. i
	while. 	left>:0
	do.	if. item<left{y do. break. end.
		left =. <:left
	end.
	left =. (left<0){left,-n
	right =. i
	while. 	right<n
	do. 	if. item<right{y do. break. end.
		right =. >:right
	end.
	right =. (right=n){right,3*n
	larger =. (i-left)<:right-i
	nln =. nln,larger{right,left
end.
)

  A1j 4 5 3 8 6 3 2 4 5 1
1 3 1 _10 3 4 5 8 4 8

This program is much shorter and increasingly faster than A1 and uses much less space.

It is desirable to change default index (-n) to i-n (and 3*n to i+n) so that the output can index actual items directly. So, use a negative default index for the largest item and eliminate the inner loops:

A1j =: 3 : 0
n =. #y
nln =. i.0
for_i. 	i.n
do.	item =. i{y
	left =. |.item<i{.y
	left =. (+./left){n,>:left i. 1
	right =. item<}.i}.y
	right =. (+./right){n,>:right i. 1
	nln =. nln,(left<:right){i+right,-left
end.
)

Example:

   ,i  =.  A1j v =. 4 5 3 8 6 3 2 4 5 1
1 3 1 _7 3 4 5 8 4 8

Then, to index items, simply enter:

    i{v
5 8 5 8 8 6 3 5 6 5			NB. default is largest item

Now change Asano’s problem to return items instead of indexes and to not prefer the left item in case of ties – that is, to solve the Nearer Larger Neighbor problem stated above.

A1a =: 3 : 0
n =. #y
all =. i.0
for_i.	i.n
do.	left =. right =. i
	while.	look=.left>:0
	do.	large =. left{y
		if. large>i{y do. break. end.
		left =. <:left
	end.
	left =. left%look
	large =. large*look
	while.	look=.right<n
	do.	larger =. right{y
		if. larger>i{y do. break. end.
		right =. >:right
	end.
	right =. right%look
	larger =. larger*look
	nearer =. (i-left)(>,>:)right-i
	all =. all,>./nearer{large,larger
end.
)

Example:

   A1a 4 5 3 8 6 3 2 4 5 1
5 8 8 0 8 6 4 5 6 5			NB. items (with 0 default)

This program is much faster and much slimmer than A1 despite doing more work to find the larger of two items at equal distance.

A2

Asano’s Algorithm 2 is a bidirectional scan by increasing distance, where the output is indexes with ties resolved to the leftmost index and -n is the default index (like A1).

A2 =: 3 : 0
n =. #y
nln =. n#0
for_i. 0 To n-1 do.
	nln =. (-n) i} nln
	for_k. 1 To Larger(i,n-i+1) do.
	 	if. (i-k)>:0 do. if. (i{y)<(i-k){y do.
			nln =. (i-k) i} nln
			break. end.
		end.
		if. (i+k)<n do. if. (i{y)<(i+k){y do.
			nln =. (i+k) i} nln
			break. end.
		end.
	end.
  end.
nln
)
To =: [ + i.@>:@-~
Larger =: >./

   A2 4 5 3 8 6 3 2 4 5 1
1 3 1 _10 3 4 5 8 4 8

Again, change Asano’s problem to return items instead of indexes and to return the larger item at equal distance for the Nearer Larger Neighbor problem:

A2a =: 3 : 0
n =. #y
nln =. i.0
for_i.	i.n
do.	for_d.	>:i.i>.n->:i
	do.	left =. right =. 0
		if. i>:d do. left=.(i-d){y end.
		if. i<n-d do. right=.(i+d){y end.
		larger =. left>.right
		if. larger>i{y do. break. end.
		larger =. 0
	end.
	nln =. nln,larger
end.
)

Same example:

   A2a 4 5 3 8 6 3 2 4 5 1
5 8 8 0 8 6 4 5 6 5		NB. items with 0 default

This program is about as inefficient as A2, mainly because of the algorithm. (Incidentally, using a while loop instead of a for loop is slower but saves a lot of space.)

Proposed A2 Algorithm

I propose an algorithm that searches in two stages -- a minimum distance (near) left and right, then the remaining distance left or right:

A2a =: 3 : 0
n =. #y
i =. 0
all =. i.0
while. i<n
   do.	d =. 1
	near =. i <. n-i+1
	while.	d<:near
  	   do.	left =. (i-d){y
		right =. (i+d){y
		larger =. left >. right
		if. larger>i{y do. goto_next. end.
		d =. d+1
	  end.
	near =. n-1+2*near
	if. i<n%2 do. d =. n-near else. d =. near-n+1 end.
	while. near>0
   	   do.	larger =. d{y
		if. larger>i{y do. goto_next. end.
		d =. d+*d
		near =. near-1
	  end.	
	larger =. 0				NB. default
label_next.	i =. i+1
	all =. all,larger
  end.
)

This program is much more efficient than A2 even though it returns items instead of indexes and does more work by computing the larger of two items at the same distance instead of exiting after finding a larger item on the left. Indeed, it is 10+ times faster than A2 and uses only 37% space for n=1e5 and increasingly faster for higher n. It can be improved further by using a two-item index (d) for distances left and right:

A2a =: 3 : 0
n =. #y
i =. 0
all =. i.0
while. i<n
   do.	d =. i + _1 1	
	while. *./d~:_1,n
  	   do.	larger =. >./ d{y
		if. larger>i{y do. goto_next. end.
		d =. d+_1 1
	  end.
	direction =. i<-:n
	d =. direction{d	
	direction =. direction{_1 1	
	while. *./d~:_1,n	
   	   do.	larger =. d{y
		if. larger>i{y do. goto_next. end.
		d =. d+direction
	  end.	
	larger =. 0	
label_next.	i =. >:i
	        all =. all,larger
  end.
)

This provides a 20% speed-up in slightly less space. This is the slimmest program here.

Try combining the two inner while loops and removing goto, using a default of –infinity:

A2a =: 3 : 0
n =. #y
i =. 0
all =. i.0
while. i<n
   do.	d =. i
		whilst.	(larger<:i{y)*.+./OK
		   do.	d =. d + _1 1 * OK=.d~:0,n-1
				larger =. >./OK#d{y			NB. default __
		  end.
		i =. i+1
		all =. all,larger
  end.
)

This is much shorter but slower, using slightly more space.

A better idea is to find the largest item first and exit the inner loop:

A2a =: 3 : 0
n =. #y
max =. >./y
i =. 0
all =. i.0
while. i<n
   do.	it =. i{y
	if. it=max do. larger=.0 goto_next. end.
	d =. i
	while.	it>:larger=.>./d{y
	   do.	d =. d + _1 1 * d~:0,n-1
	  end.
label_next.	i =. i+1
		all =. all,larger
  end.
)

This is 40% faster in the same space.

Lastly, try using a for loop and replace 0 defaults with each larger item found.

A2b =: 3 : 0
n =. #y
all =. y<>./y
for_i. all#i.n
   do.	d =. i
	while.	(i{y)>:larger=.>./d{y
	   do.	d =. d+_1 1*d~:0,n-1
	  end.
	all =. larger i} all
  end.
)

Nicely shorter and almost as speedy, although 40% fatter.

New Algorithms

Now consider my algorithms, presented in J. All programs allow any list of positive numbers input and return the same number of selected numbers output, with 0 as a default for no larger neighbor (unless noted otherwise).

Parallel Search by Position

Begin with a straightforward vector processing approach. Program NLN0 computes the nearer larger neighbor for each index position by creating a table of Neighbors from lists Lefts and Rights, finding the Largers in parallel, then selecting the Nearer item whose First indexed item is less than the input:

     EACH =: "0 _
NLN0 =: All NLN EACH ]
	All =: i.@#
	NLN =: { Nearer Larger@Neighbors
		Neighbors =: Lefts ,: Rights
			Lefts =: |.@{.
			Rights =: }.@}.
		Larger =: >./
		Nearer =: < First ]
			First =: {.@#

For example:

   NLN0 4 5 3 8 6 3 2 4 5 1
5 8 8 0 8 6 4 5 6 5

This could be written as a one-liner: NLN0 =: i.@# ({ (< {.@# ]) |.@{. >./@:,: }.@}.)"0 _ ]

Obviously the shortest program here, this runs a little faster in a little more space. It could also be defined explicitly:

NLN0x =: 3 : 0
n =. #y
i =. 0
all =. i.0
while. i<n
do.	x =. i{y
	lefts =. |.i{.y
	rights =. }.i}.y
	i =. i+1
	all =. all , x Nearer Larger lefts,:rights
end.
)

3% slower in 10% less space for n=1e5.

A slightly different tacit definition is:

NLN0a =: All Nearer@Larger@Neighbors EACH ]
	All =: i.@#
	Neighbors =: Lefts,: Rights
		Lefts =: 0: , |.@{.
		Rights =: }.
		Larger =: >./
	Nearer =: (> {.) First ]
		First =: {.@#
		Larger =: >./
		Larger =: >./

This has about the same efficiency as NLN0.

Also, ^: can be used to drop the Largers iteratively until the first is larger:

NLN0b =: All {.@Neibors EACH ]
	All =: i.@#
	Neibors =: { Near ^: While ^: _ Lefts Larger@,: Rights
	Near =: }.@]
	While =: >: {.
	Lefts =: |.@{.
	Rights =: }.@}.
	Larger =: >./

Short, but slow and fat.

Parallel by Distance

A different approach computes each distance in parallel instead of each position:

	EACH1 =: "0 1
	FLIP =: &. |.
NLN1 =: ] Nearer EACH1 |:@LN1
	Nearer =: < First ]
		First =: {.@#
	LN1 =: All1 Neighbors1 EACH ]
		All1 =: }.@i.@#			NB. distances
		Neighbors1 =: Lefts1 >. Rights1
			Rights1 =: #@] {. }.
			Lefts1 =: Rights1 FLIP

This program is 3 times slower than NLN0 and runs out of memory at n=1e5. It is possible to use Transpose, but it also requires excessive time and even more space:

	 T =: &. |:
NLN1 =: ] Nearer EACH1 T All1 Neighbors1 EACH ]
    Shifting

Try shifting in Lefts and Rights:

NLNs =: All NLN EACH ]
	All =: i.@#
	NLN =: { Nearer Neighbors
		Neighbors =: Lefts >. Rights
			Lefts =: #@] {. |.@{.
			Rights =: #@] {. }.@}.
		Nearer =: < First ]
			First =: {.@#

This uses 30% less space than NLN0.

Indexing

A different approach indexes each distance left and right. Explicitly:

NLN2 =: 3 : 0
all =. i.0
while. all <&# y
do. all =. all , (#all) NLNx y
end.
)
NLNx =: 3 : 0
:
item =. x{y
lefts =. (-#y){.x{.y
rights =. (#y){.x}.y
d =. 1
while. 	d<#y
do.	larger =. (d{rights)>.(-d){lefts
	if. larger>item do. larger return. end.
	d =. d+1
end. 0
)

This is much more efficient than NLN0 in time and space.

Masking

Another approach is to mask indexes progressively and store found largers for positions at distances d=1, 2, etc.

NLN3 =: 3 : 0
n =. #y
all =. n#0		
alli =. i.n
d =. 1			
while. d<n
do.	y =. y,0	
	lefts =. (alli-d){y
	rights =. (alli+d){y
	largers =. lefts >. rights
	mask =. largers > alli{y
	i =. mask # alli
	largers =. mask # largers
	all =. largers i} all
	alli =. alli -. i
	d =. d+1
end.	all
)

Very fast but fat.

Better yet, remove the index of the largest item before searching.

NLN3 =: 3 : 0
n =. #y
all =. y < >./y	
alli =. all # i.n	
d =. 1	
while.	0<#alli
do.	y =. y,0
	largers =. (alli-d){y
	largers =. largers >. (alli+d){y
	all =. largers alli} all
	largers =. largers <: alli{y
	alli =. largers#alli
	d =. d+1
end.	all
)

Faster and much less space for high n.

A variant using a +/ table:

NLN3a =: 3 : 0
n =. #y
all =. y < >./y
alli =. all # i.n
d =. 0
while. (#alli)>0
   do.	y =. y,0
		d =. d + 1 _1
		largers =. >./ (d +/ alli){y
		all =. largers alli} all
		largers =. largers <: alli{y
		alli =. largers#alli
  end.	all
)

3% faster in 16% more space.

Try another masking definition:

NLN3c =: 3 : 0
n =. #x =. y
y =. __ * y = >./y		NB. __ default
d =. 1
whilst. 0 e. y
do.	lefts =. (-n){.(-d)}.x
	rights =. n {. d }. x
	largers =. lefts >. rights
	largers =. largers * largers>x
	y =. y + largers * y=0
	d =. d+1
end. y
)

This conserves space but takes much more time.

Matrix

Dare a matrix approach (without looping):

NLN4 =: 3 : 0
ns =. ->:i.#y
right =. }:ns {."0 1 y
left =. }:|."1 ns {."0 1 |.y
larger =. left >. right
mask =. larger >"1 y
new =. larger *&{: mask
first1 =. |. </\ |.mask *"1 -.{:mask
old =. +/ larger * first1
new + old
)

Slow and very fat.

Boolean

Now try a Boolean approach within an inner loop:

NLN5 =: 3 : 0
n =. #y
i =. 0
all =. i.0
while. i<n
do.	x =. i{y
	d =. 1
	whilst. {./:~neighbors<:x
	do.	neighbors =. d < (i+1),(n-i)
		neighbors =. neighbors # i + d * _1 1
		neighbors =. neighbors { y
		d =. d+1
	end.
	i =. i+1
	all =. all,>./neighbors		NB. __ default
end.
)

Slim, but not so fast.

Spider

Use Infix \ (scan) to search like a spider stepping simultaneously to both sides in increasingly larger steps:

NLN6 =: 3 : 0
n =. #y
all =. n#0
smaller =. y < >./y
half =. -:n
d =. 1
while.	d<half
do.	dd =. >:+:d
	largers =. dd Spider\ y
	largers =. (d{.d}.y),largers,(-d){.(-d)}.y
	larger =. smaller*.largers > y
	all =. all+larger*largers
	smaller =. larger~:smaller
	d =. d+1
end.
while.	d>0
   do.	largers =. (-n){.d{.y
	largers =. largers >. n{.(-d){.y
	larger =. smaller*.largers > y
	all =. all+larger*largers
	smaller =. larger~:smaller
	d =. d-1
end.	all
)
Spider =: {. >. {:

Very slow in lots of space.

Recursion

The Nearer Larger Neighbor problem can be solved recursively:

   EACH =: “0 _
   ELSE =: `
   WHEN =: @.
NLN7 =: All {.@NLNr EACH ]
   All =: i.@#
   NLNr =: Leftr Nearerr Rightr
      Leftr =: |.@{. , {		NB.  item on end
      Rightr =: 1: |. }.	
      Nearerr =: Nearerr&}. ELSE 0: WHEN Done ELSE Largerr WHEN (Largerr>N)
         Largerr =: >.&{.
         N =: >.&{:
         Done =: +.&# = 0:

Not efficient, of course.

Comparisons

Now compare the most competitive programs by ratios of time, space, and length -- where 1.00 is the best – for the same n=1e6 non-distinct random positive integers. Finally, sum the ratios for a composite of overall program efficiency. See Appendix below for benchmark details and copies of these programs.

            Time     Space    Length   Overall


     A1a     32.9     1.67     1.21     35.8


     A2a     10.1     1.00     1.40     12.5


     A2b     10.4     1.42     1.00     12.8


     NLN3     1.03    4.08     1.25      6.37


     NLN3a    1.00    4.75     1.17      6.92


     NLN5    19.2     1.00     1.40     21.6

So, the winner is NLN3, using the Masking approach.

Readers may wish to use a higher n, average over several samples, and weight the three criteria.

Conclusion

The Nearer Larger Neighbor problem was introduced and programmed in J. Definitions of programs were compared, and the best was determined. (It was not the fastest, nor slimmest, nor shortest.) To follow up, see Essays/NearestLargerNeighbor for programs for 2-D, 3-D, and n-D arrays.

References

[1] T. Asano, S. Bereg, and D. Kirkpatrick, "Finding nearest larger neighbors: a case study in algorithm design and analysis", Lecture Notes in Computer Science, Efficient Algorithms, pp. 249-260, Albers, Alt & Naeher (eds.) Springer, 2009

[2] Weisstein, Eric W. “Nearest Neighbor Problem”, http://mathworld.wolfram.com

[3] Lifshits, Y. “The Homepage of Nearest Neighbors and Similarity Search”, http://simsearch.yury.name

Appendix

Benchmarks for time and space were obtained on a Dell Latitude E6410 laptop (64-bit OS M620 at 2.67 GHz with 4GB RAM) running J6.02.

Length = total number of characters in a program definition body, counting only 1 for each name.

Utility programs used:

Time =: 6 !: 2
Space =: 7 !: 2
Odom =: #~ #: i.@^

Example to match all lists of 6 integers:

   *./(NLN3a -: NLN3)"1 Odom~6
1

Example plots:

   load 'plot'
   plot n ; (Time,10&^.@Space)"1 'NLN0 ?#~',"1 ":,.n=.1000*>:i.10     NB. Note log space for common scale.

Special cases:

For an empty input, all programs return an empty output. For a list of constants, all programs return defaults.

Programs compared:

A1a =: 3 : 0
n =. #y
all =. i.0
for_i.	i.n
do.	left =. right =. i
	while.	look=.left>:0
	do.	large =. left{y
		if. large>i{y do. break. end.
		left =. <:left
	end.
	left =. left%look
	large =. large*look
	while.	look=.right<n
	do.	larger =. right{y
		if. larger>i{y do. break. end.
		right =. >:right
	end.
	right =. right%look
	larger =. larger*look
	nearer =. (i-left)(>,>:)right-i
	all =. all,>./nearer{large,larger
end.
)

A2a =: 3 : 0
n =. #y
max =. >./y
i =. 0
all =. i.0
while. i<n
   do.	it =. i{y
	if. it=max do. larger=.0 goto_next. end.
	d =. i
	while.	it>:larger=.>./d{y
	   do.	d =. d + _1 1 * d~:0,n-1
	  end.
label_next.	i =. i+1
		all =. all,larger
  end.
)

A2b =: 3 : 0
n =. #y
all =. y<>./y
for_i. all#i.n
   do.	d =. i
	while.	(i{y)>:larger=.>./d{y
	   do.	d =. d+_1 1*d~:0,n-1
	  end.
	all =. larger i} all
  end.
)

NLN3 =: 3 : 0
n =. #y
all =. y < >./y	
alli =. all # i.n	
d =. 1	
while.	0<#alli
do.	y =. y,0
	largers =. (alli-d){y
	largers =. largers >. (alli+d){y
	all =. largers alli} all
	largers =. largers <: alli{y
	alli =. largers#alli
	d =. d+1
end.	all
)

NLN3a =: 3 : 0
n =. #y
all =. y < >./y
alli =. all # i.n
d =. 0
while. (#alli)>0
   do.	y =. y,0
	d =. d + 1 _1
	largers =. >./ (d +/ alli){y
	all =. largers alli} all
	largers =. largers <: alli{y
	alli =. largers#alli
  end.	all
)

NLN5 =: 3 : 0
n =. #y
i =. 0
all =. i.0
while. i<n
do.	x =. i{y
	d =. 1
	whilst. {./:~neighbors<:x
	do.	neighbors =. d < (i+1),(n-i)
		neighbors =. neighbors # i + d * _1 1
		neighbors =. neighbors { y
		d =. d+1
	end.
	i =. i+1
	all =. all,>./ neighbors		NB. __ default
end.
)

Contributed by Howard Peelle