JwithStrings/Chopped Dates

From J Wiki

J with Strings: Chopped Dates

If J is running on a Unix-based system, like the Macintosh, the following will return the weekday, date and time as a string (the }: drops the linefeed that ends the output of date):

   ] Today=: }: 2!:0 'date'
Thu Sep 22 21:35:12 BST 2011

If your copy of J isn't running under Unix, then to follow the examples just assign this string to noun Today:

   Today=: 'Thu Sep 22 21:35:12 BST 2011'

Suppose you want to see this string in the following format:

Thu Sep 22 2011 21:35:12

Now JAL has a scheme for returning a timestamp in just about any format you like. We will come to that. Programmers are encouraged to use built-in facilities where available, because these generally adjust themselves to the country concerned, and to new platform releases, without the programmer having to worry. So we are not seriously suggesting that what follows should be used in code meant for distribution (except in beta). Nevertheless timestamps furnish a handy rusk to chew on.

The first thing to learn when faced with a new string is how to butcher it into manageable steaks, which you can shuffle around and fry at your leisure. Try this:

   ] z=: cut Today
┌───┬───┬──┬────────┬───┬─────┐
│Thu│Sep│22│21:35:12│BST│2011 │
└───┴───┴──┴────────┴───┴─────┘
   ' ' cut Today
┌───┬───┬──┬────────┬───┬─────┐
│Thu│Sep│22│21:35:12│BST│2011 │
└───┴───┴──┴────────┴───┴─────┘
   ':' cut Today
┌─────────────┬──┬────────────┐
│Thu Sep 22 21│35│12 BST 2011 │
└─────────────┴──┴────────────┘

You can see what's possible. Noun z is a vector whose atoms are boxed. It can be relied on to contain 6 such atoms, provided we can rely on the output format of the Unix date command. So we can extract the atoms we want, in the order we want, using From ({).

   1{z
┌───┐
│Sep│
└───┘
   2{z
┌──┐
│22│
└──┘
   5{z
┌─────┐
│2011 │
└─────┘
   1 2 5 {z	NB. common within the USA
┌───┬──┬─────┐
│Sep│22│2011 │
└───┴──┴─────┘
   2 1 5{z	NB. common within the UK
┌──┬───┬─────┐
│22│Sep│2011 │
└──┴───┴─────┘
   ; 1 2 5 {z	NB. Link the atoms up again
Sep222011
   ; 1 6 2 6 5 { z,<' '	NB. --with spaces, this time
Sep 22 2011
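
The same indexing also yields the format asked for at the start. A minimal sketch, assuming the date string is held in a noun called stamp and its boxed words in w (both names are chosen here for illustration, so that the example does not depend on the current value of Today or z):

   stamp=: 'Thu Sep 22 21:35:12 BST 2011'   NB. hypothetical sample string
   w=: cut stamp                            NB. six boxed words, items 0 to 5
   ; 0 6 1 6 2 6 5 6 3 { w,<' '             NB. item 6 is the appended boxed space
Thu Sep 22 2011 21:35:12

Reordering the index list is all it takes to change the layout: the words themselves are never edited.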

The word "chop" is explained further down, under: What do we mean by "chop"?

For now, think of an atom (the J term for an element of a vector, namely a letter if it's a string) as either a number, a character or a box. What's inside a box doesn't figure until you come to open it, which you do with Open (>).

Aside: beginners often forget to open a boxed atom.

   3{z
┌────────┐
│21:35:12│
└────────┘
   ':' cut 3{z		NB. Won't work because 3{z is a boxed atom not a string
|domain error: cut
|   ':'    cut 3{z
   >3{z				NB. Open the box first
21:35:12
   ':' cut >3{z
┌──┬──┬──┐
│21│35│12│
└──┴──┴──┘
   ;".each ':' cut >3{z		NB. --or as numbers, not numerals
21 35 12

Verb cut is a handy verb to know. As we see above, its default x-argument is Space (' ') so it will chop the string at the spaces. But if you give it an x-argument like Colon (':') it will instead chop the string at the colons.

Let's keep to spaces. What if there might be multiple spaces? You can use deb ("delete-extra-blanks") to make sure there are no two consecutive spaces. Like so:

   cut deb Today
┌───┬───┬──┬────────┬───┬─────┐
│Thu│Sep│22│21:35:12│BST│2011 │
└───┴───┴──┴────────┴───┴─────┘

But in this case it doesn't matter. Not only because we choose to rely on Unix's date format being rock-solid and not slipping in an extra space, but because cut ignores repeated spaces. See this:

   cut 'alpha       bravo'
┌─────┬─────┐
│alpha│bravo│
└─────┴─────┘

WARNING: This has consequences when you're chopping up comma-separated data, e.g. CSV format. See this:

   ',' cut 'alpha,bravo,,delta'
┌─────┬─────┬─────┐
│alpha│bravo│delta│
└─────┴─────┴─────┘

In CSV, as well as C-language syntax, repeated commas are placeholders for missing values. But cut manages to lose the evidence. Fortunately there are other ways to chop up a string which preserve placeholders of this nature, such as Words (;:), which we shall examine shortly. Note that Words returns each comma as a word in its own right:

   ;: 'alpha,bravo,,delta'
┌─────┬─┬─────┬─┬─┬─────┐
│alpha│,│bravo│,│,│delta│
└─────┴─┴─────┴─┴─┴─────┘
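
Another way to keep the placeholders, sketched here with the primitive Cut conjunction (;.) rather than the library verb cut: append the delimiter to the string, so that every field ends with a comma, and let <;._2 box each interval up to (but not including) its closing delimiter. Empty fields then survive as empty boxes. This is only an illustration of the idea; it takes no account of quoted fields that themselves contain commas.

   <;._2 'alpha,bravo,,delta' , ','     NB. the appended comma closes the last field
┌─────┬─────┬┬─────┐
│alpha│bravo││delta│
└─────┴─────┴┴─────┘
   # <;._2 'alpha,bravo,,delta' , ','   NB. four fields, one of them empty
4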

Back to our timestamp, here's a verb definition which delivers the Unix date in a more congenial form (you will now be able to see how to customise it). The name unixdate is our own choice; the verb ignores its argument, and z is assigned locally so that the noun z used earlier is left alone:

unixdate=: 3 : 0
	NB. get date from Unix shell (y is ignored)
z=. (<' '), ;: 2!:0 'date'
	NB. z has the fixed form:
	NB. ┌─┬───┬───┬──┬───┬───┬──┬───┬────┬─┐
	NB. │ │Mon│Sep│19│23:│05:│51│BST│2011│ │
	NB. └─┴───┴───┴──┴───┴───┴──┴───┴────┴─┘
	NB.  0 1   2   3  4   5   6  7   8    9
	NB. (item 9 is presumably the linefeed that ends the output of date)
NB. d=. ; 1 0 2 0 3 0 8 {z	NB. USA format
d=. ; 1 0 3 0 2 0 8 {z		NB. UK format
t=. ; 4 5 6 {z
d,' ',t
)

NOTE: We stick a boxed space on the front of z, not the back as earlier. And by using Words (;:) we chop up the time as well as the date.


What do we mean by "chop"?

We've been using the word chop to describe converting a string to a list of boxed atoms, each atom consisting of a boxed substring, especially a word.

Algebraists would call the list got from chopping a string a partition of the string.

A partition consists of parts, each part being in this case an atom of the list, namely a boxed substring.

J-ers say "cut", as in the name of the library verb cut. But this is J-argon: when I think of "cutting" a string, I think of losing a piece off the beginning or end, not of turning a list of characters into a list of words. For losing a piece, J-ers say "take" or "drop", and there are primitives with these names.

Take ({.) and Drop (}.) accept a numeric x-argument: the number of atoms to take (or drop) from the y-argument, the list (e.g. a string) in question. In most applications you need to compute that number. Two useful library verbs, taketo and dropto, compute it for you.

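
A short sketch of the difference, assuming taketo and dropto behave as defined in the standard library (the noun stamp is a name chosen here for illustration):

   stamp=: 'Thu Sep 22 21:35:12 BST 2011'   NB. hypothetical sample string
   (stamp i. ' ') {. stamp                  NB. Take, with the count computed by hand
Thu
   ' ' taketo stamp                         NB. the same, with the count computed for us
Thu
   ' ' dropto stamp                         NB. the result still starts with the space
 Sep 22 21:35:12 BST 2011
   }. ' ' dropto stamp                      NB. drop that leading space too
Sep 22 21:35:12 BST 2011

Repeating the last two steps peels off one word at a time: taketo gives the next word, and dropto gives what remains.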

Although easy to understand, this is a bit laborious. J, being J, has a shorter way of doing that:

   'day mon dayno hour min sec zone year'=: ;:Today   NB. the assigned nouns are UNBOXED

but then of course you have to process each (string) numeral individually to convert it to a number, which does at least give you the opportunity to remove trailing colons from hour and min.
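
For instance, a minimal sketch of that conversion for one of the nouns. The value of hour is typed in here as a sample, so that the lines can be run on their own:

   hour=: '21:'          NB. sample value, as Words delivers it
   ". }: hour            NB. drop the trailing colon, then convert the numeral
21
   ". hour -. ':'        NB. alternative: remove colons wherever they occur
21
   1 + ". '2011'         NB. the result is a number, so arithmetic works
2012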


Using Words (;:) to chop a string

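
As a starting point, here is a sketch of what Words does to our date string (held in a noun called stamp, a name chosen here for illustration). Words applies J's own word-formation rules rather than simply splitting at spaces, so a colon stays attached to the numeral before it, and adjacent numerals separated only by spaces form a single word:

   stamp=: 'Thu Sep 22 21:35:12 BST 2011'   NB. hypothetical sample string
   ;: stamp
┌───┬───┬──┬───┬───┬──┬───┬────┐
│Thu│Sep│22│21:│35:│12│BST│2011│
└───┴───┴──┴───┴───┴──┴───┴────┘
   ;: 'alpha 1 2 3 bravo'                   NB. adjacent numerals form a single word
┌─────┬─────┬─────┐
│alpha│1 2 3│bravo│
└─────┴─────┴─────┘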


What to do with the parts of the string

However you chop (partition) a string, you then have to consider what sort of part you end up with:

  • Is it boxed or unboxed?
  • Does it include the separator?
  • In particular, does it have leading or trailing spaces?
  • If numeric data, is it a numeral, a (scalar) number, or a vector?

J is systematic about this, but as a beginner you may find the "system" hard to grasp and certainly hard to remember. It's probably the hardest part of string-processing in J, and all the techniques we examine come up against these questions in one form or other. With experience you won't see what's difficult about it. But at this stage simply remember to test what you've got and convert accordingly.
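
One way to do that testing, sketched with the library verb datatype and the primitive Shape Of ($). The noun w is a name chosen here for illustration:

   w=: cut 'Thu Sep 22 21:35:12 BST 2011'   NB. hypothetical sample
   datatype 3{w          NB. straight from the list: still boxed
boxed
   datatype >3{w         NB. opened: a string
literal
   $ >3{w                NB. a vector of 8 characters
8
   datatype ". >2{w      NB. opened and converted: a number
integer
   # $ ". >2{w           NB. rank 0, i.e. a scalar
0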


-- Ian Clark, 2012-08-25 09:25:12 UTC