# User:Brian Schott/Stemplot

## Stemplots

For many data sets of modest size, a stemplot is a simple and clear alternative to a histogram (see Essays/Histogram).

WikiPedia:Stemplot

Download the script here: File:Stemleaf2011.ijs

    stem =: <.@(%&10) :. (10&*)   NB. monadic version; generalized by the redefinition of stem below
    sort=: /:~
    sortleaf=: |@/:~              NB. sort leaf
    stem =: <.@:(%&10) :(<.@: %~ )
    leaf=: (* * 10&|@|)@]
    stemNub=: (10 * ~.@:stem) : ([ * ~.@:stem)
    SLtab=: stemNub ;"0 stem sortleaf each@</. leaf

       ]sample =: 20?.@$ 20
    6 15 19 12 14 19 0 17 0 14 6 18 13 18 11 12 18 0 10 2
       SLtab sortleaf sample
    +--+---------------------------+
    |0 |0 0 0 2 6 6                |
    +--+---------------------------+
    |10|0 1 2 2 3 4 4 5 7 8 8 8 9 9|
    +--+---------------------------+

The stemplot produced by *SLtab* shows that 6 of the random integers are between 0 and 9, and the remaining 14 are between 10 and 19.
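The split of each value into a stem and a leaf can be sketched in isolation. This is a minimal illustration using plain primitives rather than the verbs defined above; the names `v`, `s` and `l` are assumptions introduced only for this sketch.

```j
NB. Illustrative sketch only: v, s and l are hypothetical names.
v =: 6 15 19 12 0
s =: <. v % 10        NB. stems (tens): 0 1 1 1 0
l =: 10 | v           NB. leaves (units): 6 5 9 2 0
v -: l + 10 * s       NB. 1: each value is rebuilt from its stem and leaf
s </. l               NB. leaves boxed per distinct stem, in order of first appearance
```

Grouping the leaves with key (`/.`) is the same idea *SLtab* uses; sorting the leaves within each box is a separate step.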

The verb *pretty* gives a slightly more attractive readout.
Note also that *SLtab* is dyadic: its optional left argument sets the stem
interval and can be 2, 5, or 10 (the monadic form uses 10).

    pretty =: (_5&{.@":@[,' | ',(1j0&":)@])&>/"1

       pretty SLtab sort sample
        0 | 000266
       10 | 01223445788899
       pretty 5 SLtab sort sample
        0 | 0002
        5 | 66
       10 | 0122344
       15 | 5788899
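How a left argument of 5 changes the stems can be checked by hand. The sketch below mirrors the dyadic case with plain primitives; `width` and `v` are assumed names used only for illustration.

```j
NB. Illustrative sketch only: width and v are hypothetical names.
width =: 5
v =: 0 2 6 12 15 19
width * <. v % width     NB. stem label of each value: 0 0 5 10 15 15
10 | v                   NB. leaves are still the units digits: 0 2 6 2 5 9
```

Because the leaf is always the units digit, the value 15 appears as leaf 5 on stem 15 in the listing above.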

However, as the next data set (taken from Wikipedia) shows, gaps in the data yield gaps in the stemplot: the stems for 50 and 90 are missing.

       pretty SLtab 44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106
       40 | 4679
       60 | 34688
       70 | 2256
       80 | 148
      100 | 6

The verb *SL* adds three features: it fills the gaps with empty stems, handles
negative values, and distributes zeros between two zero stems when needed.

    d =: {: - {.
    rmonad=: 10 * ({: |.@:- i.@>:@d) @stem
    rdyad=: [([ * ({: |.@:- i.@>:@d@])@stem)]
    r =: rmonad : rdyad   NB. range of stems
    tf =: >@{."1@]        NB. take first and open
    df =: }."1@]          NB. drop first
    fsg=: <"0@([r tf) ,. ,.@(([(r e. ])tf) expand&, df)   NB. fill stem gaps
    SLgapless=: 10&$: : ([ fsg sort @ SLtab)
    neg0=: [(":@]`('_'&,@":@|@:+)@.(0>])"0) tf   NB. recalc neg stems
    stemClean=: 10&$: : (<"1@neg0,.df)
    SL=: 10&$: :( [ stemClean balance0s@SLgapless)

       pretty SL 44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106
       40 | 4679
       50 |
       60 | 34688
       70 | 2256
       80 | 148
       90 |
      100 | 6
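The idea behind the gap filling can be sketched with plain primitives: compare the full range of stems with the stems actually present, then expand by the resulting mask. The names `present`, `full`, `missing`, `mask` and `counts` are assumptions for this illustration and do not appear in the script.

```j
NB. Illustrative sketch only: all names here are hypothetical.
present =: 40 60 70 80 100        NB. stems that occur in the data
full =: 40 + 10 * i. 7            NB. every stem from 40 to 100
missing =: full -. present        NB. 50 90
mask =: full e. present           NB. 1 0 1 1 1 0 1
counts =: 4 5 4 3 1               NB. number of leaves on each present stem
mask #^:_1 counts                 NB. 4 0 5 4 3 0 1: gaps are filled with 0
```

Here `#^:_1` is expansion by a boolean mask; applied to boxed leaves instead of counts, the same step would leave empty entries at the missing stems.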

Like *SLtab*, *SL* is dyadic and can take 2, 5, or 10 as its left
argument.
Zeros are problematic in data that contains both positive and negative values,
because two zero stems are required: one positive and one
negative (yes, _0). A decision must also be made about which stem each
zero value is assigned to. Here an even number of zeros is split
equally between the two stems, and the positive stem receives the extra zero
when the number of zeros is odd. The verb *balance0s* does this.

    NB.* balance0s v monad
    NB. When there are negative data and multiple values of 0
    NB. in the data, then the 0's need to be distributed
    NB. between the two stems 0 and _0. This verb does
    NB. that distribution, giving a slight bias to 0.
    NB. The argument is a stemplot in boxes containing integers
    balance0s=: monad define
      if. 0<:<./ tf y do. y return. end.
      if. 0>>./ tf y do. y return. end.
      z=. 0 i.~ tf y
      if. 1>:n=.+/0=k=.>(<z,1){y do. y return. end.
      t=. y
      m=.<.-:n   NB. number of zeros to move
      t=. (<m}.k) (<z,1)}t
      t=. ((<m#0) ,~&.>(<(z-1),1){t) (<(z-1),1)}t
    )
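The arithmetic of the split is easy to check on its own. In this sketch `n` and `m` follow the local names used in *balance0s*, but the block is only an illustration of the counting, not of the boxed-table manipulation.

```j
NB. Illustrative sketch only: counts the zeros assigned to each stem.
n =: 3             NB. number of zeros in the data
m =: <. -: n       NB. zeros moved to the _0 stem: 1
n - m              NB. zeros kept on the 0 stem: 2
(<. ,: >.) -: 1 2 3 4   NB. rows: zeros on _0, zeros on 0, for n = 1 2 3 4
```

The last line yields the rows `0 1 1 2` and `1 1 2 2`, so the positive stem never receives fewer zeros than the negative one.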

To demonstrate, we use data from Wikipedia, rounded to the nearest integer before it is plotted.

    round =: <.@(0.5&+)
    rpl3=: 4 : 0   NB. 'replace' from jforum
      'x0 x1'=. x
      ((x1,a.) {~ (x0,a.) i. ]) y
    )
    NB. examples taken from Wikipedia entry for stemplot
    cleanup =: [:round('-_'& rpl3)&.":

       ]wikidata =: cleanup '-23.678758, -12.45, -3.4, 4.43, 5.5, 5.678, 16.87, 24.7, 56.8'
    _24 _12 _3 4 6 6 17 25 57
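The behavior of *round* on a few values, including halves, can be sketched as follows. The definition is repeated so the block stands alone; the sample values other than those from the Wikipedia data are assumptions chosen for illustration.

```j
NB. Illustrative sketch only: round is repeated from the listing above.
round =: <.@(0.5&+)
round 4.43 5.5 5.678      NB. 4 6 6
round _3.4 _12.45         NB. _3 _12
round 2.5 _2.5            NB. 3 _2: halves round toward positive infinity
```

Adding 0.5 and taking the floor rounds to the nearest integer, with ties going upward for both positive and negative values.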

       pretty SL wikidata, 0 0
      _20 | 4
      _10 | 2
       _0 | 30
        0 | 0466
       10 | 7
       20 | 5
       30 |
       40 |
       50 | 7

    NB. sample data sets
    a =: 25 64 31 26 20
    b =: 1 5 2 3 9 10 3
    c =: _3 _1 5 3 9 10 2 19

    Note 'demos'
    SL a
    SL b
    SL c
    SL c, 0 0
    SL c, 0 0 0
    5 SL c
    2 SL c
    SL wikidata
    )

I want to acknowledge Keith Smillie's fine work on stem-and-leaf plots from which I
have borrowed extensively.
