Jd/Ops read

From J Wiki
Jump to navigation Jump to search
Jd | Overview | General | Docs | Ops | Admin | Server | Replicate | Guide | Technical | Release | License | Support

key

key [/in] table pairs

return rows in table for pairs.

/in elided returns first row in table that matches each data row and _1 if no match

/in returns all rows (sorted and no _1) in table that match any data row

col types int1/int2/int4/float/varbyte not allowed

see key

read

Same as reads with /lr option

reads

reads [options]   [ SELECT ]   from FROM  [where WHERE ] [order by ORDER BY]

/lr - labeled rows instead of labeled cols
/e  - edate... cols as epochdt rather than iso 8601
/types - type added to col name - cola(edate)
/table foo - result used to create table foo

parsing read ops can be complicated

returns select col(s) from table(s) with rows that satisfy where

alias:col
alias:table
aggregation [by key1[,key2]]

For ideas on working with reads results, see essay Inverted Table.

parsing

Parsing read ops can be complicated. There are known issues that can cause problems. Keep things simple to avoid unknown problems.

Clauses where, order by, jdset, jdtc are stripped from the end and must appear, if present, in that order.

Use blanks before and after keywords.

Anything can appear in WHERE between quotes ("s). This is handled and will not cause parse errors.

Anything can appear in col/table alias. / at start of SELECT will be treated as an option. Keywords (e.g. order by) in alias can cause problems.

Col alias can include blanks, RESERVEDCHARS, and RESERVEDWORDS and this is useful as they are result col labels.

Table alias can include blanks, RESERVEDCHARS, and RESERVEDWORDS but this is not useful and can cause parse problems.

Some problems are avoided as:

RESERVEDWORDS_jd_ are not allowed as col/table names
RESERVEDCHARS_jd_ are not allowed in col/table names

SELECT

read SELECT from ...

Select determines which columns are in the result:

 cola
 alias:cola
 cola,colb
 *
 tabf.cola,tabg.colc
 *.*
  sum cola
 sum cola,avg colb
 sum cola by key1
 sum cola by key1,key2
   jd'info agg' NB. aggregation functions
Select: Column(, Column)*
      | AggCol(, AggCol)*
      | AggCol(, AggCol)* by Column(, Column)*
Column: [alias:] [Tab.]Col
AggCol: [alias:] agg [Tab.]Col
Tab: table  | "*"
Col: column | "*"

Tab is name of a table in the from clause
(alias, name, or the column used to reference it)
root table is used if there is no TAB

Col is name of a column in the corresponding table

alias is a name

agg is an aggregation function

if not an aggregation then * can be used for Tab and Col
if Tab is *, then that column will be shown for all tables
if Col is *, then all columns from the table will be used
Tab expanded first so *.* gives all columns from all tables
If no Columns are given, then *.* is used, or * if only one table

result name for a col is its alias, or how it was identified

by keyword groups rows by the key columns that follow it
each aggregator is called on the rows of its column in each group

result col label does not include agg, so an alias is desirable

FROM

A.B indicates A is joined to B by the A ref col that refers to B

read ... from FROM ...
read ... from A , A.B ...
read ... from A,A.jdref_id_B_id ... NB. distinguish multiple refs
read ... from A , A.B , A.C ...
read ... from A , A.B , B.D ...
read ... from A.B , A ...           NB. order does not matter
read ... from X:A , X.B , X.C ...   NB. X alias used in all subsequent uses
A.B is left1 join
 each A row with 1 matching B row or FILLS
 if multiple matching B rows, the first one is used
 not a standard join, but is fast and adequate in many cases
 same as inner join if 1 and only 1 B row matches
A>B is left join
 same as outer except B rows that did not match are excluded
 same as left1 except all B rows that match are included
A-B is inner join
 same as left except rows with FILLS are excluded
 same as outer except rows with FILLS are excluded
A=B is outer join - this join is not implemented
  all possible rows
  A rows duplicated for each match in B
  B rows duplicated for each match in A
  A or B FILLS provided where no match
A<B is right join - this join is not implemented
  same as outer except A rows that did not match are excluded 

Joins are done in an order based on a sampling that picks representative rows and tests how restrictive the queries are.

See tutorials from and join

WHERE

read ... from ... where WHERE

 a="abc"
 b<23
 b in (23,24,25)
 b lt 23 && a eq "abc"
 a like "ab*" 
 f.a=23 && g.b="def"
 a range (1000,2000)
 a=1 and b=2
 a,b = 1,2          - same as above but uses a,b hash if available
 (a=1 and b=2) or (a=3 and b=4)
 a,b in (1,2),(3,4) - same as above but uses a,b hash if available

relationals:

 >= <= <> =  >  <
 ge le ne eq gt lt range
 in notin like likeci unlike unlikeci sample sample.
 sample n returns n random rows
 sample. uses ?. instead of ?

 non-symbol must be blank delimited

 byte data must be in "s and has C escapes (\" \t \n \0 \x09 ...)
 numeric data must not be in "s

 like/likeci/unlike/unlikeci (ci - case insensitive) regex applies to all types

 relationals have implied parens (which must not be present)
 
 a=c (c not a number and not in "s) is a relation between 2 cols

 t (no col/relational) selects rows from table t not created by join

logicals:

 !   &&  ||
 not and or
  • must be blank delimited
  • logicals evaluate as J (right to left):
  not a=23 && b=24 || b=25  ===  not((a=23)&&((b=24)||(b=25)))

parens can change the order of evaluation

details:

W:  ("!" | "not ")* W1
W1: W2 (" " ("and"|"or"|"&&"|"||") " " W)?
W2: "(" W ")"  |  Q
Q:  col relational dat  |  col relational col  |  tab

col is col name as in select and tab is table name

Q expressions are in implicit parens, which must not be present.

dat represents a J noun; typically this must have the same type as the items of the column, but its type depends on the operation being used. If dat is a string or byte, it must be quoted; otherwise it will be interpreted as numeric data, a column name, or an error.

Strings are in "s and C escapes (\", \n, etc.) can be used.

Logicals (! $$ || not and or) are grouped from right to left as in J, with parentheses allowed for grouping.

The three types of simple query:
- For an operation on a col and data, the data is fixed so that it has the appropriate type and shape. For most operations, this is the same type and shape as an item in the column. For in and notin, its items will have that type and shape. For sample and sample., it will be converted to a single number. See the section on data conversion below. After conversion, the operation will be applied to select qualifying rows in the column.
- For an operation on two columns, the operation will be applied to pairs of rows from the two columns to select qualifying rows. The operations in, notin, like, unlike, sample, and sample. are not available on columns.
- The single table query selects rows that are not null. Null rows are created by some joins and do not correspond to a row in the table. They are indicated with fills.

The relationals:
- (>= <= > < range) are for numeric cols
- range arg is list of numbers, which is split into pairs if an odd number, the last range extends to _ matches are in the range(s) (inclusive)
- in and notin test whether col value is an element of the list
- like and unlike take a regex and test if the col data matches
- sample and sample. require a data argument that is a single number

Data can be converted either to string (or byte) or numbers.
Numbers must include only the characters -_. and digits.
Strings must be quoted to avoid confusion with keywords and col names.

Following escapes are allowed in quoted strings:

\"     double quote
\\     backslash
\a     alert (BEL)
\b     backspace
\e     escape
\f     form feed
\n     new line
\r     carriage return
\t     horizontal tab
\v     vertical tab
\xHH   byte with hexadecimal value HH

Following escapes may be supported but their use is discouraged:

\NNN        byte with octal value NNN (1 to 3 digits)
\xHH        byte with hexadecimal value HH (1 to 2 digits)
\uHHHH      Unicode  (ISO/IEC  10646)  character with hex value HHHH (4 digits)
\UHHHHHHHH  Unicode character with hex value HHHHHHHH (8 digits)

Data is adjusted to the shape of the column. A list of data can be provided by separating items with spaces (this works with numbers or strings). If the data is smaller than the column shape, it will be filled with the fill element corresponding to the column type. If it is larger, an error is signalled.

For in and notin, data is given in a comma-separated list, optionally enclosed by parens.

In Q expression of the form "col = dat" or "col in dat", col can be a comma-separated list of cols. In this case each item in dat is a comma-separated list with the same length as the number of cols. If the relational is "in", each item in dat must be parenthesized. The col values are compared with the items in dat. If there is a hash columns it will be used for the query.

ORDER BY

One or more cols in the selection, separated by commas, used to sort the final result. Ascending by default. Add desc at end for descending. If the selection col has an alias, then the alias is used.