JDB/Aggregation/Prototype

From J Wiki
Jump to: navigation, search
JDB: Layout Columns | Queries Σ | API: Structural Records | Client/Server | Implementation | Examples

Here's an Aggregation prototype based on notes above used in pre-release stages of JDB. By JDB release, aggregation is integrated into query syntax. This approach still can be used as a template for extending further features, such as expressions and functions.

Wm yes check.png

First load jdb using a few utilities from strings.

require 'data/jdb strings'
coinsert 'jdb'

We separate Select Clause parser as nested splitting over by, comma and alias.

NB. parse 'alias:func col,... by alias:col,...'
NB. return lists: ra rf rc ba bc
sel_parse=: 3 : 0
  t=. 2{.' by ' splitstring y
  c=. (<',') splitstring^:(0<#@]) each t
  'r b'=. (<':') (_2 {. splitstring)^:(0<#@]) every L:1 c
  'ra re'=. <"1 |:r
  'rf rc'=. <"1 |:(<' ') (_2 {. splitstring)^:(0<#@]) every re
  'ba bc'=. <"1 |:b
  ra ; rf ; rc ; ba ,&<  bc
)

Note 'Test sel_parse'
  hd=. (;:'ra rf rc ba bc') ,: ]
  hd sel_parse 'a,b,c'
  hd sel_parse 'a,b,c by qq,zz'
  hd sel_parse 'q:a,b,x:c'
  hd sel_parse 'a,q:b,c by qq,z:zz'
  hd sel_parse 'sum a,q:avg b,min c by qq,z:zz'
)

Parsing test results gives the idea:

     hd sel_parse 'a,q:b,c by qq,z:zz'
+-----+----+-------+----+-------+
|ra   |rf  |rc     |ba  |bc     |
+-----+----+-------+----+-------+
|++-++|++++|+-+-+-+|++-+|+--+--+|
|||q|||||||||a|b|c||||z|||qq|zz||
|++-++|++++|+-+-+-+|++-+|+--+--+|
+-----+----+-------+----+-------+
     hd sel_parse 'sum a,q:avg b,min c by qq,z:zz'
+-----+-------------+-------+----+-------+
|ra   |rf           |rc     |ba  |bc     |
+-----+-------------+-------+----+-------+
|++-++|+---+---+---+|+-+-+-+|++-+|+--+--+|
|||q||||sum|avg|min|||a|b|c||||z|||qq|zz||
|++-++|+---+---+---+|+-+-+-+|++-+|+--+--+|
+-----+-------------+-------+----+-------+

Next we define aggregator operations and a helper function.

The Reada verb follows the algorithm in Aggregation notes.

'`sum avg min max count first last'=: (+/)`(+/%#)`(<./)`(>./)`#`{.`{:
ovr=: [^:(0 < #@[) each   NB. x if nonempty overrides y

Reada=: 4 : 0
  if. 1=#y=. boxopen y do. Reads__x y return. end.
  'tbl wh sl'=. 3{.y                  NB. where and select clauses
  if. -.' by ' +./@E. sl do. Reads__x y return. end.

  'ra rf rc ba bc'=. sel_parse sl     NB. parse select clause
  q=. (','joinstring bc,rc),' from ',tbl,(0<#wh)#' where ',wh
  s=. {:Read__x q                     NB. fetch filtered base columns
  c=. i.~|:i.~every (#bc){.s          NB. classi-map from by-cols
  ff=. rf,~(#bc)${.`{.                NB. aggregs with {. for by-cols
  r=. ''
  for_i. i.#bc,rc do.
    r=. r,< c ff@.i/. i {::s          NB. apply aggregs over c
  end.
  ((ba ovr bc),ra ovr rc),:tocolumn r NB. columnar format
)

Note 'sandp'
  buildsandp_jdb_ F=. jpath '~temp'
  d=. opensandp_jdb_ F
  'j p s sp spj'=: GetTable__d 'j p s sp spj'
  Reads__d 'from j'
  Reads__d 'from p'
  Reads__d'from s'
  Reads__d '** from sp where s.city=p.city'
  Reads__d '** from spj where color=Blue'

  d Reada 'j';'';'job_cnt:count jname by city'
  d Reada 'p';'city<>Paris';'sum weight by color'
  d Reada 'p';'';'sum weight by color,city'
  d Reada 'p';'';'avg weight by city,color'
  d Reada 'p';'';'avg weight,pone:first pid by where:city,color'
  d Reada 's';'';'min_st:min status by city'
  d Reada 'sp';'';'sum weight,min status by supplier:s.city'
  d Reada 'sp';'';'sum weight by supp:s.city,part:p.city'
  d Reada 'spj';'';'avg_qty:avg qty by supp:sname,part:p.city'
  d Reada 'spj';'';'avg_qty:avg qty by part_col:p.color'

  resetall_jdb_''
)

Example output:

     j Reada '*';'job_cnt:count jname by city'
+------+-------+
|city  |job_cnt|
+------+-------+
|Paris |1      |
|Rome  |1      |
|Athens|2      |
|London|2      |
|Oslo  |1      |
+------+-------+
     p Reada 'city<>Paris';'sum weight by color'
+-----+------+
|color|weight|
+-----+------+
|Red  |45    |
|Blue |17    |
+-----+------+
     p Reada '*';'sum weight by color,city'
+-----+------+------+
|color|city  |weight|
+-----+------+------+
|Red  |London|45    |
|Green|Paris |17    |
|Blue |Oslo  |17    |
|Blue |Paris |12    |
+-----+------+------+
     p Reada '*';'avg weight by city,color'
+------+-----+------+
|city  |color|weight|
+------+-----+------+
|London|Red  |15    |
|Paris |Green|17    |
|Oslo  |Blue |17    |
|Paris |Blue |12    |
+------+-----+------+
     p Reada '*';'avg weight,pone:first pid by where:city,color'
+------+-----+------+----+
|where |color|weight|pone|
+------+-----+------+----+
|London|Red  |15    |P1  |
|Paris |Green|17    |P2  |
|Oslo  |Blue |17    |P3  |
|Paris |Blue |12    |P5  |
+------+-----+------+----+
     s Reada '*';'min_st:min status by city'
+------+------+
|city  |min_st|
+------+------+
|London|20    |
|Paris |10    |
|Athens|30    |
+------+------+
     sp Reada '*';'sum weight,min status by supplier:s.city'
+--------+------+------+
|supplier|weight|status|
+--------+------+------+
|London  |134   |20    |
|Paris   | 46   |10    |
+--------+------+------+
     sp Reada '*';'sum weight by supp:s.city,part:p.city'
+------+------+------+
|supp  |part  |weight|
+------+------+------+
|London|London|59    |
|London|Paris |58    |
|London|Oslo  |17    |
|Paris |London|12    |
|Paris |Paris |34    |
+------+------+------+
     spj Reada '*';'avg_qty:avg qty by supp:sname,part:p.city'
+-----+------+-------+
|supp |part  |avg_qty|
+-----+------+-------+
|Smith|London|    450|
|Jones|Oslo  |442.857|
|Jones|Paris |    100|
|Blake|Oslo  |    200|
|Blake|London|    500|
|Clark|London|    300|
|Adams|Paris |    260|
|Adams|London|    400|
|Adams|Oslo  |    200|
+-----+------+-------+
     spj Reada '*';'avg_qty:avg qty by part_col:p.color'
+--------+-------+
|part_col|avg_qty|
+--------+-------+
|Red     |    400|
|Blue    |353.846|
|Green   |    150|
+--------+-------+