Jd/Technical

Performance

measurement

See the performance tutorial and pmhelp_jd_ for more information. The pm folder has some rough scripts for performance measurement.

columnar

Jd gets its performance from columnar data. A data column (e.g., name or date or license) is a disk file that is mapped to a J noun. A query needs only the columns it references in ram. If the query columns fit in ram, queries run at ram speed. Only the data required for the result is then read from disk, and this is typically a small fraction of the total database size.

ram

Ram is the most critical factor in performance. In general, performance will be good if available ram is more than 2 times the space required by the cols typically used in a query. Working with data in ram is orders of magnitude faster than working with data that has to come from disk.
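The 2 times rule of thumb above can be checked roughly from the host shell. The following is a minimal sketch, not part of Jd: the function names are hypothetical, the column folders are whatever paths your database uses, and avail_kib assumes a Linux system that reports MemAvailable in /proc/meminfo.

```shell
# Sum the on-disk size (in KiB) of the given column folders.
# -L follows symbolic links, so columns placed on other drives are counted.
cols_kib() {
    du -skL "$@" | awk '{ total += $1 } END { print total + 0 }'
}

# Succeed if available ram is more than 2 times the column size.
# $1 = column size in KiB, $2 = available ram in KiB
ram_ok() {
    [ "$2" -gt $(( 2 * $1 )) ]
}

# Available ram in KiB as reported by the Linux kernel (Linux only).
avail_kib() {
    awk '/^MemAvailable:/ { print $2 }' /proc/meminfo
}
```

A possible use, where the two column folders are placeholders for the cols a typical query touches:

    need=$(cols_kib path/to/db/f/x path/to/db/f/y)
    ram_ok "$need" "$(avail_kib)" && echo "query cols should fit in ram"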

ssd

Working with data from an ssd is orders of magnitude faster than working with data that has to come from a non-ssd drive. For a serious application there is no good reason not to use an ssd.

intx

Use the smallest intx type that will hold the data. This will reduce overall database size and will make better use of ram.
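As a rough illustration of picking the type, the sketch below maps the smallest and largest value of a column to a type name. It assumes the intx types are int1, int2, int4 and int (8 bytes) with ordinary signed ranges and no reserved values. Check the Jd type documentation before relying on the exact bounds. The function name is hypothetical.

```shell
# Print the smallest integer type that holds every value from $1 (min) to $2 (max).
# Assumed ranges: int1 = 1 byte, int2 = 2 bytes, int4 = 4 bytes, int = 8 bytes, all signed.
smallest_intx() {
    min=$1
    max=$2
    if   [ "$min" -ge -128 ]        && [ "$max" -le 127 ];        then echo int1
    elif [ "$min" -ge -32768 ]      && [ "$max" -le 32767 ];      then echo int2
    elif [ "$min" -ge -2147483648 ] && [ "$max" -le 2147483647 ]; then echo int4
    else echo int
    fi
}
```

For example, `smallest_intx 0 40000` prints int4. Under these assumptions such a column needs 4 bytes per row rather than 8.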

partitioned table

A partitioned table has column data in multiple files. For the user, a partitioned table is the same as any other table, but it can make a significant performance difference. High performance queries/inserts/updates/modifies can be achieved on tables with billions of rows on modest hardware if they are partitioned.

allocation across drives

Column data files can be located, under your control, on different drives. For example, columns critical for query performance could be on an ssd drive and the rest of the columns could be stored on normal drives.

This control over drive allocation also works for partitioned tables. For example, column data files for recent dates could be stored on ssd and files for the rest of the data could be stored on normal drives.

Folder symbolic links (Windows folder junctions) are used to place db cols on different drives.

See tutorial link.
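On Linux/Mac the symbolic link step might look like the sketch below. This is an assumption about how to do it by hand from the host shell, not the procedure from the tutorial. The function name and the paths are hypothetical, and the database is assumed to be closed while the folder is moved.

```shell
# Move a col folder to another drive and leave a symbolic link in its place.
# $1 = col folder (db/table/col), $2 = target parent folder on the other drive
relocate_col() {
    col=${1%/}
    if [ ! -d "$col" ] || [ -L "$col" ]; then
        echo "not a plain folder: $col" >&2
        return 1
    fi
    mkdir -p "$2" || return 1
    # Use an absolute target so the link does not depend on the current folder.
    dest=$(cd "$2" && pwd)/$(basename "$col")
    if [ -e "$dest" ]; then
        echo "target already exists: $dest" >&2
        return 1
    fi
    mv "$col" "$dest" && ln -s "$dest" "$col"
}
```

A call such as `relocate_col path/to/db/f/x /mnt/ssd/db/f` keeps one target folder per table, so cols with the same name in different tables do not collide.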

Drop/Delete

A database structure maps directly to a folder structure. A database is a folder, a table is a folder in a database, and a column is a folder in a table.

A db/table/col drop cannot be undone. It would be unfortunate to inadvertently drop something that was hard to recover.

Restrictions while building a database can be a nuisance, but when things are stable it can be nice to disallow drops. This can be done with jdaccess but that is perhaps more mechanism than warranted.

jddropstop provides an easy way to prevent bad drops.

A db/table/col drop uses the utility jddeletefolder. This utility is also used in other admin activities, for example, deleting folders of csv files that have been processed.

A jddeletefolder cannot be undone. It would be unfortunate to inadvertently delete something that was hard to recover.

jddeletefolder allows delete only if certain criteria are met and this can prevent an unintended delete.

locales and db file structure

Parts of a database (tables, cols, data) correspond directly with the file structure. That is, a table is a folder in the database, each col is a folder in its table folder, and data is a file in its col folder.
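Because of this mapping, the structure can be inspected from the host shell without opening the database. The sketch below simply lists subfolders. It assumes every subfolder of a database is a table and every subfolder of a table is a col, which may also pick up internal folders. The function names are hypothetical.

```shell
# Print the name of each subfolder of $1, one per line.
# Symbolic links to folders are included, so relocated cols still show up.
list_subfolders() {
    for p in "$1"/*/; do
        if [ -d "$p" ]; then
            basename "$p"
        fi
    done
}

# $1 = database folder
list_tables() { list_subfolders "$1"; }

# $1 = database folder, $2 = table name
list_cols() { list_subfolders "$1/$2"; }
```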

When a database is opened, J locales are created that correspond to the database structure. Each table has a locale with metadata and each col has a locale with metadata and mapped file(s) with the data.

Sometimes it can be useful to dig into the internals.

   jdadminx'test'
   jd'gen test f 3'
   jd'reads from f'
   t=. jdgl_jd_'f'   NB. get locale for table f
   NAME__t           NB. table name
   NAMES__t          NB. col names in table
   c=. jdgl_jd_'f x' NB. get locale for col x in table f
   typ__c            NB. column type
   PATH__c           NB. path to col dat file
   dat__c            NB. mapped file data

backup

Complete backup or restore is just a copy of the db file folder. Host shell scripts can provide full backup/restore. With large databases and suitable hardware it might be worthwhile to use multiple tasks and use compression.
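A host shell script for this can be very small. The following is one possible sketch using tar with gzip compression. The function names are hypothetical, and it assumes the database is not being written to while the copy is made.

```shell
# Archive a database folder into a gzip-compressed tar file.
# $1 = database folder, $2 = archive file to create
backup_db() {
    db=${1%/}
    tar -czf "$2" -C "$(dirname "$db")" "$(basename "$db")"
}

# Unpack an archive made by backup_db.
# $1 = archive file, $2 = parent folder that will receive the database folder
restore_db() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```

If cols have been placed on other drives with symbolic links, note that tar stores the links rather than the data by default. The -h option makes tar follow them instead.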

CSV dump/restore also provides complete backup.

file handles

Jd requires lots of file handles. Using thousands of columns requires thousands of handles.

Jd fails badly if it runs out of handles. When a file cannot be accessed an error is signaled, perhaps in the middle of an operation, and this can leave the database damaged.

A Windows user does not have a limit on file handles.

A Linux/Mac user often has low soft and hard limits on handles, and these must be increased for serious use of Jd. There is no reason not to raise the limit to 100000.

See the soft and hard limits with:

...$ ulimit -Sn
...$ ulimit -Hn

If hard limit is high enough, it might be easiest, before starting J, to do:

...$ ulimit -n 100000
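A slightly more careful version caps the request at the hard limit, since a request above the hard limit is refused. This is a sketch with hypothetical function names. It assumes a shell whose ulimit accepts -S, -H and -n (bash does).

```shell
# Print the soft limit to request: the wanted value, capped by the hard limit.
# $1 = wanted limit, $2 = hard limit as printed by `ulimit -Hn` (a number or "unlimited")
nofile_target() {
    if [ "$2" = unlimited ] || [ "$1" -le "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Raise the soft limit for the current shell. J started from this shell inherits it.
# $1 = wanted limit
raise_nofile() {
    ulimit -Sn "$(nofile_target "$1" "$(ulimit -Hn)")"
}
```

Run `raise_nofile 100000` in the shell before starting J. If the hard limit itself is too low, it has to be raised first.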

To increase the file handle limit for Linux Jd user fred:

...$ ulimit -n # show current file handle limit

run superuser text editor and open /etc/security/limits.conf
add following 2 lines at the end

fred soft nofile 200000
fred hard nofile 200000

save the file, restart system, and verify new ulimit

To increase file handle limit for Mac the steps are similar, but of course different, and details are left to the reader. Yosemite has a low soft limit and a high hard limit.

developer

Jd is distributed with JAL (Package Manager) and the Jd library is at ~addons/data/jd and is accessed with the following equivalent lines:

   load'jd'
   load'~addons/data/jd/jd.ijs'

A developer works with a local repo. Use the development library with something like:

   load'~/dev/addons/data/jd/jd.ijs'

Loading jd.ijs sets JDP_z_ as the path to the Jd library and this is used for all library references.

An automated process copies the developer repo to the addon svn repo to build a new Jd release.

libjd.so

The Jd Linux shared library libjd.so will run on most modern Linux systems.

If Jd gets an error loading the linux shared library, please report the following to the J database forum:

...$ cat /proc/version
...$ cat /etc/issue
...$ ldd .../libjd.so

Windows search service

Windows Search Service (content indexing, ...) can cause lots of disk activity and can interfere with Jd file operations. If possible, it should be disabled when using Jd.

Disable Windows Search Service as follows:

  1. command prompt ...>services.msc
  2. scroll down and right click Windows Search
  3. click Properties
  4. click Stop button to stop service if it is running
  5. change Startup type: to Disabled
  6. click Apply

convert Jd3->Jd4

Jd4 is incompatible with databases created under previous versions. Jd4 code will not open a Jd3 database and vice versa.

To continue working with databases created with Jd3 you must:

  • rename ~addons/data/jd to ~addons/data/jd3

And then, when you want to work with a Jd3 database:

  • load'~addons/data/jd3/jd.ijs'

An update will overwrite addons/data/jd with the Jd4 codebase. After that point, update can't give you a jd3 folder. You can create it by downloading the appropriate file from jd3 zips and unpacking in a temp folder, renaming jd to jd3 and then moving it to your addons folder.

See Release for change details.

Jd4 has conversion tools to migrate a Jd3 database to Jd4. The conversion is done in place and works in one direction only.

Conversion:

Start J, then:

   load 'jd'
   jd 'list version'   NB. verify 4.1
   load '~addons/data/jd/tools/convert.ijs'
   dryrun '~temp/jd/test'   NB. dry run this db and report details

DO NOT conrun UNTIL dryrun IS CLEAN!
BACKUP!

   conrun '~temp/jd/test'   NB. convert this db in place and report details

Conversion summary:

  • jdactive col dropped and deleted rows compressed out
  • jd... cols (ref/reference/hash/...) dropped
  • cols with type time or enum dropped
  • jd'ref ...' done for jdref and jdreference cols
  • dryrun failure leaves db untouched
  • conrun failure probably leaves db damaged - neither jd3 nor jd4 compatible