# User:Brian Schott/Stemplot

## Stemplots

For many data sets (with modest cardinality) a stemplot (see WikiPedia:Stemplot) is a simple and clear alternative to a histogram (see Essays/Histogram).

```
   stem =: <.@(%&10) :. (10&*)  NB. generalized in stemGen below
   sort=: /:~
   sortleaf=: |@/:~ NB. sort leaf
   stem =: <.@:(%&10) :(<.@: %~ )
   leaf=: (* * 10&|@|)@]
   stemNub=: (10 * ~.@:stem) : ([ * ~.@:stem)
   SLtab=: stemNub ;"0  stem sortleaf each@</. leaf

   ]sample =: 20?.@$ 20
6 15 19 12 14 19 0 17 0 14 6 18 13 18 11 12 18 0 10 2
   SLtab sortleaf sample
+--+---------------------------+
|0 |0 0 0 2 6 6                |
+--+---------------------------+
|10|0 1 2 2 3 4 4 5 7 8 8 8 9 9|
+--+---------------------------+
```

The stemplot produced by SLtab shows that 6 of the random integers lie between 0 and 9 and the remaining 14 lie between 10 and 19.
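To see what SLtab does with each value, the helper verbs can be tried on their own. The sketch below reuses the definitions of stem and leaf from the listing above on a few made-up values (the numbers are illustrative only) and shows how key (`/.`) gathers the leaves under their stems.

```j
NB. definitions copied from the listing above
stem =: <.@:(%&10) : (<.@:%~)
leaf =: (* * 10&|@|)@]

stem 6 15 19 0 12             NB. 0 1 1 0 1   tens part of each value
leaf 6 15 19 0 12             NB. 6 5 9 0 2   signed last digit
(stem </. leaf) 6 15 19 0 12  NB. two boxes: 6 0 under stem 0, 5 9 2 under stem 1
5 * 5 stem 6 12 17            NB. 5 10 15     dyadic case: stems in steps of 5
```

The leaves come out in data order here; SLtab sorts them within each box with sortleaf.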

A slightly more attractive readout is produced by the verb pretty. Note also that SLtab can be used dyadically, taking 2, 5, or 10 as its left argument.

```
   pretty =: (_5&{.@":@[,' | ',(1j0&":)@])&>/"1
   pretty SLtab sort sample
0 | 000266
10 | 01223445788899

   pretty 5 SLtab sort sample
0 | 0002
5 | 66
10 | 0122344
15 | 5788899
```
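The verb pretty combines two formatting idioms that can be tried in isolation. This is a small sketch with made-up arguments: `_5&{.` pads the formatted stem on the left to five characters, and `1j0&":` formats each leaf in a one-character field so that the leaves run together.

```j
_5 {. ": 10                               NB. '   10'  stem padded to width 5
1j0 ": 0 1 2 2 3                          NB. '01223'  leaves with no separators
(_5 {. ": 10) , ' | ' , 1j0 ": 0 1 2 2 3  NB. '   10 | 01223'
```

Because the padding is on the left, stems of different lengths line up on the ` | ` separator in a live session.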

However, as the next data set (taken from Wikipedia) shows, gaps in the data produce gaps in the stemplot: the stems for 50 and 90 are missing.

```
   pretty SLtab 44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106
40 | 4679
60 | 34688
70 | 2256
80 | 148
100 | 6
```

The verb SL adds features that fill in the gaps, handle negative values, and distribute 0s in the data when needed.

```
   d =: {: - {.
   rmonad=: 10 * ({: |.@:- i.@>:@d)  @stem
   rdyad=: [([ * ({: |.@:- i.@>:@d@])@stem)]
   tf =: >@{."1@]            NB. take first and open
   df =: }."1@]              NB. drop first
   fsg=: <"0@([r tf) ,. ,.@(([(r e. ])tf) expand&, df) NB. fill stem gaps
   SLgapless=: 10&$: : ([ fsg  sort @ SLtab)
   neg0=: [(":@]`('_'&,@":@|@:+)@.(0>])"0) tf          NB. recalc neg stems
   stemClean=: 10&$: : (<"1@neg0,.df)
   SL=: 10&$: :( [ stemClean balance0s@SLgapless)

   pretty SL 44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106
40 | 4679
50 |
60 | 34688
70 | 2256
80 | 148
90 |
100 | 6
```
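The gap filling rests on generating every stem between the smallest and the largest one. The sketch below uses stem, d, and rmonad as defined above on a shortened, sorted version of the Wikipedia data and then marks which of the generated stems actually occur. The names data, full, and have are introduced here only for illustration.

```j
NB. definitions copied from the listings above
stem   =: <.@:(%&10) : (<.@:%~)
d      =: {: - {.
rmonad =: 10 * ({: |.@:- i.@>:@d) @stem

data =: 44 46 63 72 81 106     NB. sorted: rmonad reads the first and last stem
full =: rmonad data            NB. 40 50 60 70 80 90 100
have =: 10 * ~. stem data      NB. 40 60 70 80 100
full e. have                   NB. 1 0 1 1 1 0 1   zeros mark the missing stems
```

A boolean mask of this kind is what fsg appears to hand to expand in order to insert the empty leaf boxes.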

Like SLtab, SL is dyadic and can take 2, 5, or 10 as its left argument. Zeroes in data that contain both positive and negative values are problematic because two zero stems are required: one positive and one negative (yes, _0). Furthermore, a decision must be made about which stem each zero value is assigned to. Here an even number of zeroes is split equally between the two stems, and the positive stem is favored when the number of zeroes is odd.

```
NB.* balance0s v monad
NB. When there are negative data and multiple values of 0
NB.    in the data, then the 0's need to be distributed
NB.    between the two stems 0 and _0. This verb does
NB.    that distribution, giving a slight bias to 0.
NB. The argument is a stemplot in boxes containing integers
balance0s=: 3 : 0
if. 0<:<./ tf y do. y return. end.
if. 0>>./ tf y do. y return. end.
z=. 0 i.~ tf y
if. 1>:n=.+/0=k=.>(<z,1){y do. y return. end.
t=. y
m=.<.-:n  NB. number of zeros to move
t=. (<m}.k) (<z,1)}t
t=. ((<m#0) ,~&.>(<(z-1),1){t) (<(z-1),1)}t
)
```
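The split rule in balance0s can be checked with a little arithmetic. As a sketch, for n zeros the verb moves `<.-:n` of them to the _0 stem and leaves the rest on the 0 stem. The lines below are illustrative only and do not call balance0s itself.

```j
n =: 2 3 4 5
<. -: n        NB. 1 1 2 2   zeros moved to the _0 stem
n - <. -: n    NB. 1 2 2 3   zeros kept on the 0 stem
```

With a single zero nothing is moved, since balance0s returns its argument unchanged when there is at most one zero.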

To demonstrate, we use data from Wikipedia, rounded to integers before plotting.

```
   round =: <.@(0.5&+)
   rpl3=: 4 : 0   NB. 'replace' from jforum
'x0 x1'=. x
((x1,a.) {~ (x0,a.) i. ]) y
)

   NB. examples taken from Wikipedia entry for stemplot
   cleanup =: [:round('-_'& rpl3)&.":

   ]wikidata =: cleanup '-23.678758, -12.45, -3.4, 4.43, 5.5, 5.678, 16.87, 24.7, 56.8'
_24 _12 _3 4 6 6 17 25 57
```
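The verb round adds 0.5 and takes the floor, so halves round upward, for negative numbers as well. A quick check on a few of the Wikipedia values, plus two extra values chosen here for illustration:

```j
round =: <.@(0.5&+)

round 5.5 _3.4 _12.45 _23.678758   NB. 6 _3 _12 _24
round 2.5 _2.5                     NB. 3 _2   halves go toward positive infinity
```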

```
   pretty SL wikidata, 0 0
_20 | 4
_10 | 2
_0  | 30
0   | 0466
10  | 7
20  | 5
30  |
40  |
50  | 7
```
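The negative labels in this plot follow from how stem floors negative values. A rough sketch of the arithmetic, assuming the default stem width of 10: flooring sends _3 to the stem _10, and neg0 then relabels each negative stem by adding the width and taking the magnitude, which is where _0 comes from.

```j
NB. definition copied from the first listing
stem =: <.@:(%&10) : (<.@:%~)

10 * stem _24 _12 _3     NB. _30 _20 _10   floor pushes negative stems down by one
| 10 + _30 _20 _10       NB. 20 10 0       magnitudes behind the labels _20 _10 _0
```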

```
NB. sample data sets
a =: 25 64 31 26 20
b =: 1 5 2 3 9 10 3
c =: _3 _1 5 3 9 10 2 19
```

```
Note 'demos'
SL a
SL b
SL c
SL c, 0 0
SL c, 0 0 0
5 SL c
2 SL c
SL wikidata
)
```

I want to acknowledge Keith Smillie's fine work on stem-and-leaf plots from which I have borrowed extensively.