Puzzles/Word Frequencies

From J Wiki

Given a list of words, find the top m most frequent words and the corresponding frequencies.

Solution

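The dyad x u/. y (key) is useful for such problems. It applies u to each group of items of y whose corresponding items of x (the keys) are identical; the results appear in the order in which the keys first occur. The reflexive form u/.~ y uses y as its own keys. For example, with y a list of boxed words: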

# /.~y             NB. the word frequencies corresponding to ~.y
{./.~y             NB. the unique words, i.e. ~.y
({. , <@#)/.~ y    NB. the unique words and the corresponding frequencies
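
As a small illustration, the following sketch applies key first with explicit keys and then in the reflexive form shown above. The names keys, vals and w and their contents are made up for the example; they are not part of the problem.

```j
NB. explicit keys: sum the items of vals that share a key
keys=: 1 2 1 2 1
vals=: 10 20 30 40 50
keys +//. vals          NB. 90 60

NB. hypothetical word list; ;: splits the string into boxed words
w=: ;: 'the cat the dog the cat'
#/.~ w                  NB. 3 2 1
{./.~ w                 NB. the same boxes as ~. w
({. , <@#)/.~ w         NB. 3-by-2 boxed table of words and counts
```

In each case the results follow the order in which the keys first occur (here: the, cat, dog).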

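For the actual problem, we use y (#,{.)/. i.#y , which gives a 2-column table with one row per unique word: its frequency and the index in y of its first occurrence. Sorting the rows in descending order with \:~ and taking the first m rows leaves the m most frequent words.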
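
To make the shape of that table concrete, here is a sketch on a short made-up word list (w and t are illustrative names only):

```j
w=: ;: 'dog the cat the cat the'
t=: w (#,{.)/. i.#w     NB. one row per unique word: count, index of first occurrence
t
NB. 1 0
NB. 3 1
NB. 2 2
\:~ t                   NB. rows sorted in descending order
NB. 3 1
NB. 2 2
NB. 1 0
```

Because \:~ compares whole rows, words with equal counts come out with the larger first-occurrence index first.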

wordfreq=: 4 : 0
 'c i'=. |: x {. \:~ y (#,{.)/. i.#y
 (i{y) ,. <"0 c
)
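
A quick check on a tiny input may help before running the verb on large data. This is only a sketch: the word list w and the name r are invented for the example, and the definition is repeated so that the block runs on its own.

```j
wordfreq=: 4 : 0
 'c i'=. |: x {. \:~ y (#,{.)/. i.#y
 (i{y) ,. <"0 c
)

w=: ;: 'dog the cat the cat the'
r=: 2 wordfreq w
r
NB. ┌───┬─┐
NB. │the│3│
NB. ├───┼─┤
NB. │cat│2│
NB. └───┴─┘

NB. cross-check: the counts agree with the sorted result of #/.~
(> {:"1 r) -: 2 {. \:~ #/.~ w    NB. 1
```

Note that if x is larger than the number of unique words, {. pads the sorted table with rows of zeros, so the result would contain extra rows with frequency 0; capping x beforehand is one way to avoid that.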

For example:

sample=: 3 : 0
 a=. 'abcdefghijklmnopqrstuvwxyz'
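 c=. 3 5 7 9                  NB. word lengths
 n=. 10^>.-:c                 NB. random words generated per length: 100 1000 10000 100000
 NB. pool of boxed words; each word of a given length is repeated >.1e4%n times
 x=. ; <"1&.> (>.1e4%n)#&.> (n,&.>c) (a {~ ?@$)&.> #a
 x {~ y ?@$ #x                NB. draw y words from the pool, with replacement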
)

   x=: sample 1e6
   $ x
1000000
   8 {. x
┌───────┬─────────┬─────────┬─────────┬─────────┬─────────┬───┬─────┐
│wghgnkv│xaubfuowg│vlqwuvaji│viajpaaih│qcbamjdfh│dftavyazm│sjj│qjtws│
└───────┴─────────┴─────────┴─────────┴─────────┴─────────┴───┴─────┘

   10 wordfreq x
┌───┬───┐
│sfn│832│
├───┼───┤
│bgp│819│
├───┼───┤
│yhg│818│
├───┼───┤
│abd│815│
├───┼───┤
│ctz│814│
├───┼───┤
│wkt│813│
├───┼───┤
│eim│810│
├───┼───┤
│ovd│808│
├───┼───┤
│rix│807│
├───┼───┤
│yrc│806│
└───┴───┘



Contributed by Roger Hui.