# User:Brian Schott/Histogram

The purpose of this essay is to expand on Roger Hui's Essays/Histogram, especially on how (relative) histograms of empirical data are used in statistical circles to capture the nature of theoretical probability distributions. The three main points made here are:

a. to contrast the interval counts of Hui's verb *histogram*, which are collected with the dyadic verb `I.` (*Idot*), with the interval counts used in statistical histograms,

b. to align properly the interval labels and their frequency counts, and

c. to enable the construction of a statistical histogram with rescaled frequency counts when the intervals are *not* of uniform width.

`I.` collects frequency counts based on intervals that are open on the left and closed on the right: $(x_{i-1}, x_i]$. Statistical intervals reverse this pattern: $[x_{i-1}, x_i)$. The reversal is dealt with later, by reversing the input to *Idot*, once the alignment problem has been handled. In the first example the reversal problem still exists and is simply ignored.
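The closed-on-the-right convention can be checked on a tiny case. The boundaries and values below are hypothetical, chosen only for illustration; the expected results are shown as comments.

```j
NB. hypothetical boundaries and values, not taken from the essay
b =: 0 10 20
v =: 0 5 10 15 20 25

b I. v          NB. 0 1 1 2 2 3
```

The value 10 receives the same index as 5, so a value that sits exactly on a boundary is counted in the interval below it.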

    histogram =: <: @ (#/.~) @ (i. @#@[ , I.)
    histogram1=: <: @ (#/.~) @ (i.@>:@#@[ , I.)
    test1 =: dyad define
    assert. ({:x)>:>./y
    assert. ({.x)<<./y
    )

Compare the only plot in Hui's essay with the first plot here in the vicinity of e=100. In Hui's plot, the flat peak of the histogram is to the right of e=100. Here the flat peak is centered around f=100. To accomplish this, *histogram1* is based on realigned intervals: one fewer interval boundary is input to *histogram1*, but one additional interval boundary is created by *histogram1*. One final adjustment must be performed in the case of a continuous variable like the one Hui uses for example data: adjacent interval boundaries are averaged before plotting, so that the horizontal tick marks align correctly with each interval's center.

       d=: +/ 10 1e6 ?.@$ 21
       e=: 5 * i.40
       f=: }.}: e
       h =: e histogram d
       h1=: f histogram1 d
       e (histogram-:histogram1) d   NB. should NOT be equal
    0
       h1 -: }.h                     NB. should be equal
    1
       f test1 d
       ff=: 2+/\-:e
       load 'plot'
       plot ff;h1
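The realignment and the midpoint averaging are easier to follow at a small scale. The sketch below repeats the two definitions so that it stands alone; the names `e0`, `f0`, `y0`, `h0`, `h10` and `ff0` and their values are hypothetical, chosen only for illustration.

```j
histogram  =: <: @ (#/.~) @ (i.@#@[ , I.)
histogram1 =: <: @ (#/.~) @ (i.@>:@#@[ , I.)

NB. hypothetical boundaries and data
e0 =: 0 10 20 30          NB. full set of boundaries
f0 =: }. }: e0            NB. 10 20: first and last boundary dropped
y0 =: 5 10 15 25

h0  =: e0 histogram y0    NB. 0 2 1 1  (leading count is for values <: 0)
h10 =: f0 histogram1 y0   NB. 2 1 1
h10 -: }. h0              NB. 1
ff0 =: 2 +/\ -: e0        NB. 5 15 25, the midpoints of adjacent boundaries
```

Each count in `h10` now pairs with one midpoint in `ff0`, which is the pairing that `plot ff;h1` relies on.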

A more traditional statistical histogram is afforded by *histogram2*, which employs *Idotr*; uneven interval widths are handled by the further verbs *relative* and *drawplot*.

The verb *Idotr* defined here reverses the application of `I.` and also adjusts the interval boundaries in the verb *histogram2*. On its own, however, *histogram2* cannot cope with unequal interval widths. The final example creates unequal interval widths by eliminating the interval boundaries 45 and 50. The verbs *relative* and *drawplot* treat the Hui example as if it were to be modeled by a discrete probability mass model, instead of the continuous Gaussian density function. In such cases, the **area** of each interval is proportional to the interval's relative frequency, rather than the **vertical height** being proportional to the relative frequency. The final plot shows the desired result.

    histogram =: <: @ (#/.~) @ (i. @#@[ , I.)
    histogram1=: <: @ (#/.~) @ (i.@>:@#@[ , I.)
    histogram2=: <: @ (#/.~) @ (i.@>:@#@[ , Idotr)
    Idotr =: |.@[ (#@[-I.) ]
    relative =: ((2 -~/\ [) %~ }.@}:@histogram2) % #@]
    drawplot =: 2&#@[ ; _1&|.@(2&#)@(,&0)@]
    test1 =: dyad define
    assert. ({:x)>:>./y
    assert. ({.x)<<./y
    )
    test2 =: dyad define
    assert. ({:x)>>./y
    assert. ({.x)<:<./y
    )
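The effect of *Idotr* can be seen on a small case. In this sketch the boundaries `b` and values `v` are hypothetical; the two definitions are repeated from the listing so that the block stands alone.

```j
Idotr      =: |.@[ (#@[-I.) ]
histogram2 =: <: @ (#/.~) @ (i.@>:@#@[ , Idotr)

NB. hypothetical boundaries and values, not taken from the essay
b =: 0 10 20
v =: 0 5 10 15 20 25

b Idotr v        NB. 1 1 2 2 3 3
b histogram2 v   NB. 0 2 2 2
```

A value that sits exactly on a boundary (0, 10 or 20) is now counted in the interval above it, which is the closed-on-the-left convention. The first and last counts of *histogram2* cover values below the first boundary and values at or above the last one.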

    d=: +/ 10 1e6 ?.@$ 21
    e=: 5 * i.40
    e2 =: e -. 45 50
    e2 test2 d
    'ycaption relative frequency'plot e2 ([ drawplot relative ) d
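A small hand-checkable case shows what *relative* and *drawplot* compute when the widths differ. The boundaries `b` and values `v` below are hypothetical, and the definitions are repeated so that the block stands alone.

```j
Idotr      =: |.@[ (#@[-I.) ]
histogram2 =: <: @ (#/.~) @ (i.@>:@#@[ , Idotr)
relative   =: ((2 -~/\ [) %~ }.@}:@histogram2) % #@]
drawplot   =: 2&#@[ ; _1&|.@(2&#)@(,&0)@]

NB. hypothetical data: widths 10 and 20, with 8 values in [0,10) and 2 in [10,30)
b =: 0 10 30
v =: 1 2 3 4 5 6 7 8 15 25

r =: b relative v        NB. 0.08 0.01  (count % width % number of values)
+/ r * 2 -~/\ b          NB. 1: the bar areas sum to 1
> {. b drawplot r        NB. 0 0 10 10 30 30: each boundary doubled
> {: b drawplot r        NB. 0 0.08 0.08 0.01 0.01 0: heights of the step outline
```

The second interval holds a fifth of the values but is twice as wide, so its bar is drawn at one eighth of the height of the first rather than one quarter.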