User:Brian Schott/Stemplot

From J Wiki

Stemplots

For many data sets of modest cardinality, a stemplot is a simple and clear alternative to a histogram (see Essays/Histogram).
See also WikiPedia:Stemplot.
The scripts can be downloaded from File:Stemleaf2011.ijs.

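   stem =: <.@(%&10) :. (10&*)  NB. monadic form; generalized by the redefinition of stem below
   sort=: /:~                   NB. sort ascending
   sortleaf=: |@/:~             NB. sort, then take magnitudes
   stem =: <.@:(%&10) :(<.@: %~ )   NB. monad: whole tens; dyad: x is the stem width
   leaf=: (* * 10&|@|)@]        NB. last digit of each value, keeping its sign
   stemNub=: (10 * ~.@:stem) : ([ * ~.@:stem)   NB. unique stem labels
   SLtab=: stemNub ;"0  stem sortleaf each@</. leaf   NB. boxed table: stem ; sorted leaves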


   ]sample =: 20?.@$ 20
6 15 19 12 14 19 0 17 0 14 6 18 13 18 11 12 18 0 10 2
   SLtab sortleaf sample
+--+---------------------------+
|0 |0 0 0 2 6 6                |
+--+---------------------------+
|10|0 1 2 2 3 4 4 5 7 8 8 8 9 9|
+--+---------------------------+

The stemplot produced by SLtab shows that 6 of the 20 random integers are between 0 and 9, and the remaining 14 are between 10 and 19.
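
The decomposition behind this table can be sketched with primitives alone. The following is a minimal illustration, not the verbs defined above; the name data is chosen here for the example and holds a few values taken from sample.

```j
   data =: 6 15 19 12 0 2      NB. a few values from sample (name is illustrative)
   <. data % 10                NB. stem: whole tens
0 1 1 1 0 0
   10 | data                   NB. leaf: last digit
6 5 9 2 0 2
   (<. data % 10) </. 10 | data   NB. key (/.) boxes the leaves of each distinct stem
+-----+-----+
|6 0 2|5 9 2|
+-----+-----+
```

Stems come out in order of first occurrence and the leaves in each box are still in data order, which is why the examples pass sorted data to SLtab and why SLtab sorts each box of leaves.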

A slightly more attractive readout is produced by the verb pretty. Note also that SLtab is dyadic: its optional left argument (2, 5, or 10; the default is 10) sets the stem width.

   pretty =: (_5&{.@":@[,' | ',(1j0&":)@])&>/"1
   pretty SLtab sort sample
    0 | 000266
   10 | 01223445788899
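
The two formatting steps inside pretty can be tried in isolation. This sketch uses only the primitive ": (format) with the same arguments that appear in the definition of pretty.

```j
   1j0 ": 0 0 0 2 6 6          NB. width 1, no decimals: the digits run together
000266
   _5 {. ": 10                 NB. right-justify the stem in five columns
   10
```

Each leaf is formatted in a single column, so this relies on the leaves being single digits, which is what leaf and sortleaf produce.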

   pretty 5 SLtab sort sample
    0 | 0002
    5 | 66
   10 | 0122344
   15 | 5788899
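
How the left argument changes the stems can be seen by computing the stem labels directly. This is a sketch with primitives that mirrors the dyadic case of stemNub; data is an illustrative name.

```j
   data =: /:~ 6 15 19 12 0 2  NB. sorted, as in the examples above
   data
0 2 6 12 15 19
   5 * ~. <. data % 5          NB. stem labels of width 5
0 5 10 15
   2 * ~. <. data % 2          NB. stem labels of width 2
0 2 6 12 14 18
```

The leaf is still the last decimal digit whatever the width, so with width 2 the value 15 appears as leaf 5 on stem 14.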

However, as the next data set (taken from Wikipedia) shows, gaps in the data produce gaps in the stemplot: the stems for 50 and 90 are missing.

   pretty SLtab 44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106
   40 | 4679
   60 | 34688
   70 | 2256
   80 | 148
  100 | 6

The verb SL adds further features: it fills the gaps, handles negative values, and distributes 0s between the two zero stems when needed.

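   NB. expand and each come from the J standard library; balance0s is defined below
   d =: {: - {.              NB. last minus first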
   rmonad=: 10 * ({: |.@:- i.@>:@d)  @stem
   rdyad=: [([ * ({: |.@:- i.@>:@d@])@stem)]
   r =: rmonad : rdyad       NB. range of stems
   tf =: >@{."1@]            NB. take first and open
   df =: }."1@]              NB. drop first
   fsg=: <"0@([r tf) ,. ,.@(([(r e. ])tf) expand&, df) NB. fill stem gaps
   SLgapless=: 10&$: : ([ fsg  sort @ SLtab)
   neg0=: [(":@]`('_'&,@":@|@:+)@.(0>])"0) tf          NB. recalc neg stems
   stemClean=: 10&$: : (<"1@neg0,.df)
   SL=: 10&$: :( [ stemClean balance0s@SLgapless)

   pretty SL 44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106
   40 | 4679
   50 |
   60 | 34688
   70 | 2256
   80 | 148
   90 |
  100 | 6
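
The gap filling can be illustrated with primitives. This sketch is not the definition of fsg; it only shows the idea of building the full run of stems and expanding the existing rows into it. The names stems, full, and counts are illustrative, and counts holds the number of leaves on each stem of the gapped plot shown earlier.

```j
   stems =: 40 60 70 80 100
   full =: ({. stems) + 10 * i. >: 10 %~ ({: - {.) stems
   full
40 50 60 70 80 90 100
   full e. stems               NB. 1 where a stem already has data
1 0 1 1 1 0 1
   counts =: 4 5 4 3 1
   (full e. stems) #^:_1 counts   NB. expand: fill is inserted at the 0s
4 0 5 4 3 0 1
```

With boxed leaves in place of counts, the fill is an empty box, which prints as a stem with no leaves.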

Like SLtab, SL is dyadic and can take 2, 5, or 10 as its left argument. Zeros are problematic in data that contains both positive and negative values, because two zero stems are required: one positive and one negative (yes, _0). A decision must also be made about which stem each zero is assigned to. Here an even number of zeros is split equally between the two stems, and when the number is odd the extra zero goes to the positive stem.

NB.* balance0s v monad
NB. When there are negative data and multiple values of 0
NB.    in the data, then the 0's need to be distributed
NB.    between the two stems 0 and _0. This verb does
NB.    that distribution, giving a slight bias to 0.
NB. The argument is a stemplot in boxes containing integers
balance0s=: monad define
if. 0<:<./ tf y do. y return. end.
if. 0>>./ tf y do. y return. end.
z=. 0 i.~ tf y
if. 1>:n=.+/0=k=.>(<z,1){y do. y return. end.
t=. y
m=.<.-:n  NB. number of zeros to move
t=. (<m}.k) (<z,1)}t
t=. ((<m#0) ,~&.>(<(z-1),1){t) (<(z-1),1)}t
)
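
The split rule can be checked on its own. split0 is a hypothetical helper, not part of the script; it returns the number of zeros that go to the _0 stem followed by the number that stay on the 0 stem, following m=.<.-:n in balance0s.

```j
   split0 =: <.@-: , >.@-:     NB. hypothetical: zeros for _0 , zeros for 0
   split0 2
1 1
   split0 3
1 2
   split0"0 ] 2 3 4 5
1 1
1 2
2 2
2 3
```

A single zero is never moved, since balance0s returns its argument unchanged when there is at most one zero.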

To demonstrate, we use data from Wikipedia, rounded to integers before plotting.

   round =: <.@(0.5&+)
   rpl3=: 4 : 0   NB. 'replace' from jforum
   'x0 x1'=. x
    ((x1,a.) {~ (x0,a.) i. ]) y
)

   NB. examples taken from Wikipedia entry for stemplot
   cleanup =: [:round('-_'& rpl3)&.":

   ]wikidata =: cleanup '-23.678758, -12.45, -3.4, 4.43, 5.5, 5.678, 16.87, 24.7, 56.8'
_24 _12 _3 4 6 6 17 25 57
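
The effect of cleanup can be reproduced step by step. This sketch avoids rpl3 and the under conjunction and uses amend and do (".) directly; txt and fixed are illustrative names, and the values are three consecutive entries of the Wikipedia data.

```j
   txt =: '-3.4, 4.43, 5.5'
   ]fixed =: '_' (I. txt = '-')} txt   NB. J writes negative numbers with _
_3.4, 4.43, 5.5
   ". fixed                            NB. the commas act as the J verb append
_3.4 4.43 5.5
   <. 0.5 + ". fixed                   NB. same rule as round: halves go up
_3 4 6
```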


   pretty SL wikidata, 0 0
  _20 | 4
  _10 | 2
  _0  | 30
  0   | 0466
  10  | 7
  20  | 5
  30  |
  40  |
  50  | 7


NB. sample data sets
a =: 25 64 31 26 20
b =: 1 5 2 3 9 10 3
c =: _3 _1 5 3 9 10 2 19


Note 'demos'
SL a
SL b
SL c
SL c, 0 0
SL c, 0 0 0
5 SL c
2 SL c
SL wikidata
)

I want to acknowledge Keith Smillie's fine work on stem-and-leaf plots, from which I have borrowed extensively.