Scripts/OLEExcel

From J Wiki
Jump to navigation Jump to search

Abstract

Overview of a script that uses Microsoft OLE to read and write Excel spreadsheets as well as run macros. Also gives examples of adding a sheet to a workbook. These routines may rely on "xlutils.ijs" but this has been dropped from the more recent versions (>6?) of J, so can be found here.

Brief Introduction

This code for reading an Excel spreadsheet, e.g.

   ssdata=. extractXLData_excel_ 'C:\amisc\spsheet.xls'

returns the contents of all worksheets as an N x 3 matrix where "N" is the number of worksheets. The first column contains names of worksheets; the second column gives the range of the data in Excel notation; the third column contains the data.

Code to write to an Excel spreadsheet, e.g.

   writeLines2WB_excel_ ('11/15/2006';'IBM';93.11);18 1;'C:\amisc\spsheet.xls';1

writes the three items, starting at the 18th row (origin 1) and 1st column of the spreadsheet named, in worksheet number one. As the name suggests, this also allows us to write multiple lines at a time - you might even think of it as writing a matrix if you're inclined to that sort of perversion. So, you might write these three tables to the specified spreadsheet this way:

   mat1=. i. 3 4
   mat2=. 'Now is the time','for all good people','to come to the aid',:'of their world.'
   mat3=. (":i. 2 3);<3 2$'ehtoat'
   ssht=. 'C:\amisc\J\Virgin.xls'

   writeLines2WB_excel_ mat1;1 1;ssht;1
   writeLines2WB_excel_ mat2;4 5;ssht;1
   writeLines2WB_excel_ mat3;4 2;ssht;'Sheet1'

This second write gives a message like this:

WriteWarning.png

- not sure why - but clicking "OK" allows it to proceed. Note the first two writes use the index of the sheet but the third uses the name. Once you've done the three inserts shown above, here's

EgSpreadsheetWriteOutcome.png

what you might see in the spreadsheet on the first tab.

It's likely that we might write a table that mixes character strings and numbers, E.G.:

   ]pfol=: <;._1 &> TAB,&.>a:-.~<;._1 LF,0 : 0
Symbol	Price	Volume	Shrs	Price Paid	Gain
ALOG	61.5	51,773	-300	59.51	-$597.00
AMPH	17.25	2,200	-1	15.9	-$1.35
ARBX	6.48	42,371	1,200	5.13	$1,620.00
^OEX	656.34	0	1	663.38	-$7.04
VIXEM.X	2.9	120	10	3.8	-$900.00
VIXEC.X	1.15	1,155	10	3.6	-$2,450.00
)
+-------+------+------+-----+----------+----------+
|Symbol |Price |Volume|Shrs |Price Paid|Gain      |
+-------+------+------+-----+----------+----------+
|ALOG   |61.5  |51,773|-300 |59.51     |-$597.00  |
+-------+------+------+-----+----------+----------+
|AMPH   |17.25 |2,200 |-1   |15.9      |-$1.35    |
+-------+------+------+-----+----------+----------+
|ARBX   |6.48  |42,371|1,200|5.13      |$1,620.00 |
+-------+------+------+-----+----------+----------+
|^OEX   |656.34|0     |1    |663.38    |-$7.04    |
+-------+------+------+-----+----------+----------+
|VIXEM.X|2.9   |120   |10   |3.8       |-$900.00  |
+-------+------+------+-----+----------+----------+
|VIXEC.X|1.15  |1,155 |10   |3.6       |-$2,450.00|
+-------+------+------+-----+----------+----------+
NB. Write portfolio, starting cell at row 1, column 2, in sheet 3
   writeLine2WB_excel_ pfol;1 2;'C:\amisc\spsheet.xls';3

Note that this treats all the numbers as character strings, not as numbers. Read the following warnings (in "Caveats..." below about some potential pitfalls in dealing with numeric values.

These functions, and supporting ones, are in File:OLEExcel.ijs.

Caveats About Ugliness and Other Things

This code to read and write Excel spreadsheets using the OLE facility in J is kind of warty because Excel is warty. That is, there are a number of ugly, arbitrary things I have to do, like translate to and from Excel notation, account for how much data the clipboard can hold, ensure spreadsheet paths include a disk specification, and account for a difference in size between what was requested to be read and what was actually read from a worksheet.

There is a big potential problem with reading numbers from Excel: the OLE method returns numbers as formatted in the spreadsheet. This has a couple of unfortunate consequences: it limits precision and can miss negative numbers. This latter case could happen because Excel allows you to specify that negative numbers are indicated solely by color, e.g. all negative numbers are red. This means you will read them as a positive number using OLE - there is no way around this as far as I know.

Another, solvable problem has to do with how the representation of numbers differs between J and Excel, particularly in that J uses the underscore character "_" as a negative sign whereas Excel has a number of different conventions depending on the formatting of the cell. This means generalized code has to convert between "_" (J minus) and "-", or "(number)" for negatives.

Furthermore, Excel has some other conventions for representing numbers that have to be translated to J. For instance, Excel allows the plus sign "+" in an exponent whereas J does not. Also, miscellaneous decorators like "$" and embedded commas have to be removed from Excel numbers for J to be able to convert them to numbers.

One other caveat: these functions work only with implementations of J which support "wd". So, they will not work from J invoked by "jconsole.exe", only as "j.exe". A sign that your version of J might be something like the following:

   xlopen ''
|domain error: wd
|       wd'cc xl oleautomation:excel.application'

Pre-requisites

First, you need to run this code from within "j.exe" as "jconsole.exe" does not support "wd".

To run these routines, you should have a "jmacros.xls" file defined to hold the basic macros used by this code. If you don't, you may see a message like this: EgMissingJMacrosXLS.png

These macros are VBA routines to perform basic I/O and such. They can be found here. In Excel 2010, you may need to invoke File/Options/Customize Ribbon and check the "Developer" box to give you easy access to the VB script editor in order to insert these macros.

Brief Explanation of Code

I'll assume the reader has already looked at the usual OLE/Excel intro.

I'll present the code with little explanation other than the comments embedded in it until I spend more time updating this page. Of course, anyone who finds this useful is also welcome to update the page with anything they've found.

One small exclusion: in order to avoid loading my own set of date/time functions, I've commented out the line assigning "XLDF" for now.

Code Introduction and Table-of-contents: global variables and functions in order

As a standard practice, I list the functions and global variables defined in my script file with a one-line description of the purpose of each at the top of the file.

NB.* OLEExcel.ijs: functions to work with MS Excel via OLE.

NB.* XLDF: Excel date fudge to adjust Excel day number to Julian.
NB.* CLIPMAX: Approx. max. FP nums clipboard holds
NB.* getWkshtNms: get names of all worksheets in MS Excel file y..
NB.* writeLine2WB: open workbook and write row where specified.
NB.* xl2num: attempt to convert Excel value to numeric value.
NB.* addWS: add named worksheet to spreadsheet file.
NB.* writeNewWkbook: write mat to worksheet wsnam in new workbook.
NB.* createNewWkbk: create new workbook named wbname in dir wkbkdir.
NB.* extractXLData: extract all data from Excel spreadsheet.
NB.* fixXLFileNm: ensure we have properly qualified and formatted Excel file name.
NB.* getWSName: get worksheet name given its index (item) number; assume book open.
NB.* setupXLSheet2Load: get spreadsheet data, get only detail record lines,
NB.* getExecutionData: select only full data records from mat of spreadsheet.
NB.* get1Worksheet: get worksheet data; assume workbook already open.
NB.* guessDriveOfFile: guess drive on which named file resides; return
NB.* translateXLRange: translate between Excel range notation and numeric
NB.* translateXLCellNotation: translate between origin-1 row-col coordinates
NB.* quoteIfSp: surround name with '"'s if embedded spaces (stupid MS!).
NB.* macroUsedRange: this is the text of a macro to be inserted into

Code Initialization - Set up namespace, load external modules

We establish a locale "excel" in which all this code resides. We include the base locale in order to have any of those functions available.

coclass 'excel'
coinsert 'base'

NB. load defaultPath 'DSLoad.ijs'
load 'winlib xlutil'
NB. load 'dt'

Code Body

Here is the main body of the code. The global XLDF is commented-out in this version because it requires a date function from another package but the global is not used in any of the following code, so I set it aside in order to make this script stand alone. The global would be useful for any work with dates in Excel.

The global CLIPMAX, determined empirically at one point for a particular machine and OS, is an estimate of the maximum number of floating point numbers we can hold in the clipboard at once. This is necessary because we use the clipboard to transfer data between Excel and the J session so we need to know if we have to do it in pieces for large arrays.

NB.* XLDF: Excel date fudge to adjust Excel day number to Julian.
NB. XLDF=. toJulian 18991230
NB.* CLIPMAX: Approx. max. FP nums clipboard holds
CLIPMAX=: 68000

NB.* runMacro: run macro (1{y) in spreadsheet 0{y.
runMacro=: 3 : 'xlquit '''' [ xlcmd >1{y [ xlopen '''' [ JMACROS=. >0{y'

NB.* getWkshtNms: get names of all worksheets in MS Excel file y .
getWkshtNms=: 3 : 0
   xlopen ''
   flnm=. fixXLFileNm y
   rc=. 0 [ nms=. 'File does not exist: ',flnm,'.'
   if. fexist flnm-.'"' do.                  NB. "fexist" doesn't want '"'s.
       xlcmd 'wb open ',flnm                 NB. Open workbook
       xlid 'ws' [ xlget 'temp worksheets'   NB. Set and name worksheet
       wscount=. ".xlget 'ws count'          NB. Count how many sheets
       nms=. getWSName&.>>:i.wscount
       rc=. 1
   end.
   xlexit ''
   rc;nms-.&.>'"'
NB.EG wsnms=. getWkshtNms 'C:\amisc\SP\LevelsOM\GermanRMBSDefaultModel.xls'
)

NB.* writeLines2WB: open workbook and write rows where specified.
writeLines2WB=: 3 : 0
   'ln2wr whr updfl wsn'=. y            NB. Lines to write, row&col, file name,
   xlid 'wb' [ xlget 'base workbooks' [ xlopen '' NB. worksheet name or num.
   xlcmd 'wb open ',fixXLFileNm updfl
   wbnm=. '"',(xlget 'temp name'),'"'   NB. Workbook name
   xlid 'ws' [ xlget 'temp worksheets'  NB. Set and worksheet name
   if. ' '={.0$wsn do.                  NB. Get number if name given
       wscount=. ".xlget 'ws count'     NB. # of sheets in workbook...
       wsnms=. getWSName&.>>:i.wscount  NB. No error checks here...
       wsnm=. qt,wsn,qt=.'"'#~(-.'"'e.wsn)*.'"'e.;":&.>wsnms
       wsn=. >:wsnms i. <wsnm           NB. Quote if necessary and look up.
   end.
   xlget 'ws item ',":wsn               NB. Sheet by number.
   wsnm=. '"',(xlget 'temp name'),'"'   NB. Set where (Book, sheet,
   towhr=. wbnm,' ',wsnm,' ',":whr      NB.  row, col) to write line.
   ln2wr xlwriter towhr
   xlexit ''                            NB. Exit and save workbook.
NB.EG writeLines2WB_excel_ ('11/15/2006';'IBM';93.11);18 2;'C:\amisc\spsheet.xls';1
)


NB.* xl2num: attempt to convert Excel value to numeric value.
xl2num=: 3 : 0
   if. +./'()' e. y do. y=. '-',y-.'()' end.
   (_&".@:(-.&'+')) y
)

NB. Test the above function.
coinsert_test_ 'base'
coinsert_test_ 'excel'

t_xl2num_test_=: 3 : 0
   assert. _12.3 _123 12000-:xl2num &>'(12.3)';'-123';'12e3'
   assert. 0.012 12000 1 _2 10-:xl2num &>'12E-3';'12e+3';'01';'-02';'1e01'
)

NB.* addWS: add named worksheet to spreadsheet file.
addWS=: 3 : 0
   'flnm wsnm'=. y
   xlopen ''
   xlcmd 'wb open ',flnm
   xlid 'ws' [ xlget 'temp worksheets'
   xlcmd 'ws add'
   xlset 'temp name ',wsnm
   xlexit ''
NB.EG addWS 'C:\temp\test.xls';'newsheet'     NB. Assumes test.xls exists.
)

NB.* writeNewWkbook: write mat to worksheet wsnam in new workbook.
writeNewWkbook=: 4 : 0
   'fldir flnm wsname'=. y
   xlopen ''
   'newnm wbname'=. createNewWkbk fldir;flnm
   xlid 'ws' [ xlget 'temp worksheets'
   xlcmd 'ws add' [ rc=. 1
   if. 0~:#wsname do. xlset 'temp name ',wsname end.
   try. x xlwriter wbname,' ',wsname,' 1 1' catch. rc=. 0 end.
   xlexit ''
)

NB.* createNewWkbk: create new workbook named wbname in dir wkbkdir.
createNewWkbk=: 3 : 0
   'wkbkdir wbname'=. y
   xlget 'base workbooks'                    NB. Assume sheet already open.
   xlid 'wb'
   newitem=. ":>:".xlget 'wb count'
   xlcmd 'wb add'
   xlget 'wb item ',newitem
   orignm=. xlget 'temp name'
   newnm=. wkbkdir,wbname
   if. fexist newnm do. ferase newnm end.    NB. Overwrite any existing file.
   rc=. xlcmd 'temp SaveAs ',newnm
   newnm;wbname
NB.EG 'newnm wbname'=. createNewWkbk STARTLOC;NEWTEAMSFILE
)

NB.* extractXLData: extract all data from Excel spreadsheet.
extractXLData=: 3 : 0
   xlopen ''
   flnm=. quoteIfSp fixXLFileNm y
   if. fexist flnm-.'"' do.                  NB. "fexist" doesn't want '"'s.
       xlcmd 'wb open ',flnm                 NB. Open workbook
       xlid 'ws' [ xlget 'temp worksheets'   NB. Set and name worksheet
       wscount=. ".xlget 'ws count'          NB. Count how many sheets
       dat=. >get1Worksheet&.>>:i. wscount   NB. Get all sheets, ranges, data
       rc=. wdclipwrite ''                   NB. Clear clipboard to avoid message.
   else. dat=. 0;<'File not found: ',flnm,'.'
   end.
   xlquit ''                                 NB. Quit to avoid "Save?" message.
NB.   rc=. xlcmd 'wb close'
NB.   xlexit ''
NB.   wd 'reset'
   dat
NB.EG dat=. extractXLData 'C:\EAFE\200211\nyl1112PassiveExecution.xls'
)

NB.* fixXLFileNm: ensure we have properly qualified and formatted Excel file name.
fixXLFileNm=: 3 : 0
   flnm=. dltb y                       NB. Remove excess spaces.
   flnm=. flnm,>(-.(toupper _4{.flnm)-:suf)#suf=. '.XLS'
   if. -.':'e.flnm do.                  NB. OLE requires disk name: is one
       drv=. guessDriveOfFile flnm=. y NB.  already specified?
       flnm=. drv,flnm                  NB. Drive:\path
   end.
   flnm=. quoteIfSp flnm                NB. Quote name if there are spaces in it.
NB.EG    flnm=. fixXLFileNm y
)

NB.* getWSName: get worksheet name given its index (item) number; assume book open.
getWSName=: 3 : 0
   xlget 'ws item ',":y                NB. y is number of worksheet.
   wsname=. 1|.'""',xlget 'temp name'   NB. Quote so spaces in name don't confuse
)

NB.* setupXLSheet2Load: get spreadsheet data, get only detail record lines,
NB. remove excess characters that will confuse data loading.
setupXLSheet2Load=: 3 : 0
   dat=. extractXLData y                    NB. Assume 1st datasheet is only one
   'shps dat'=. getExecutionData >2{0{dat    NB.  we need to get data from.
   'hdr dat'=. split dat
   dat=. |:dat
   hdr=. toupper&.>hdr-.&.>' '
   numChars=. '0123456789 .-+eE'        NB. Only characters found in numbers.
   xsChars=. <',$'                      NB. Excess characters to remove from
   whx=. >+./&.>xsChars e.&.>dat        NB.  numeric columns.
   wh=. b2i +./|:whx*.>*./&.>dat e.&.><numChars,>xsChars
   dat=. ((wh{dat)-.&.>xsChars) wh}dat
   hdr;<dat
NB.EG setupXLSheet2Load >1{flnms
)

NB.* getExecutionData: select only full data records from mat of spreadsheet.
getExecutionData=: 3 : 0
   shps=. >#&.>y
   fullrow=. (+/"1)0~:shps
   fullest=. >./fullrow
   dat=. (fullrow>:fullest-2)#y        NB. Assume is data (or data title) row
   shps=. (fullrow>:fullest-2)#shps     NB.  if no more than 2 items short.
   shps;<dat
)

NB.* get1Worksheet: get worksheet data; assume workbook already open.
get1Worksheet=: 3 : 0
   bkname=. quoteIfSp xlget 'temp name' [ xlget 'base activeworkbook'
   wsname=. getWSName y                NB. y is number of worksheet.
   bw=. bkname,' ',wsname,' '           NB. initial arguments we pass to jreadr
   range=. xlusedrange bw               NB. Which cells are populated?
   if. -.+./'ABCDEFGHIJKLMNOPQRSTUVWXYZ'e. toupper range do.     NB. No letters means
       range=. ;('$',&.>'A';'IV'),&.>(range='$')<;.1 range       NB.  all columns.
   end.
   if. -.+./'0123456789'e. range do.    NB. If no numerals in range, e.g. $A:$U,
       whc=. range i. ':'               NB.  it means all rows; however, we can't
       range=. (whc {. range),'1',(whc}.range),'10000' NB. read all rows (because
   end.        NB.  of limits of clipboard) so just pray someone will shoot me
NB. if I ever work on a spreadsheet with more than 10000 rows.
   rc=. wdclipwrite ''                  NB. Clear clipboard to avoid message.
   if. ':'e. range do.                  NB. Is complete range?
       nsr=. translateXLRange range     NB. Non-Stupid range notation, i.e. numeric.
       tsz=. */nrc=. >:|-/nsr           NB. Total size of data; num rows, cols.
       rpi=. <.(0{nrc)*<./1,CLIPMAX%tsz NB. Rows per iter'n clipb'd can handle
       sro=. 0                          NB. Starting row offset, column placeholder
       dat=. nrc$a:                     NB. Allocate to final size: faster than appends
       while. sro<0{nrc do.             NB. Get groups of rows at a time.
           rpi=. rpi<.sro-~0{nrc        NB. Don't go past end row
           rc=. (sro,0)+0{nsr           NB. Row, col indexes for next chunk of data
           rc=. ":rc,rpi,(<1 1){nsr     NB. Next batch of rows for all cols
           tmp=. xlreadr bw,rc          NB. Read rows
           if. rpi~:#tmp do.            NB. Don't know why this happens but
               'corx rpi xv'=. fixSizeMismatch rpi;#tmp     NB. it does.
               if. corx do.             NB. Expand
                   xv=. (sro$1),xv,1$~((#dat)++/0=xv)-sro+#xv
                   dat=. xv #^:_1 dat
               else.                    NB. Compress
                   xv=. (sro$1),xv,1$~(#dat)-sro+#xv
                   dat=. xv#dat
               end.
           end.
           tmp=. ((#tmp),1{$dat){.tmp   NB. Ensure widths match
           dat=. tmp (sro+i.rpi)}dat
           sro=. sro+rpi                NB. Advance to next set of rows.
       end.
       dat=. dat#~+./\&.|.0 +./ . ~:|:>#&.>dat    NB. Trim trailing empty rows.
   else.                                NB. Incomplete range means no data.
       dat=. ''
   end.
   wsname;range;<dat
)

NB.* fixSizeMismatch: if 2 sizes differ, give expansion or compression vec
NB. to fit 1st to 2nd.
fixSizeMismatch=: 3 : 0
   'rpi nrt'=. y             NB. rows/iteration & number of rows in tmp
   corx=. xv=. 1              NB. Expansion vec if nothing to do.
   if. rpi~:nrt do.
       if. rpi<nrt do.        NB. More rows of tmp than rpi...
           xv=. nrt{.rpi$1
       else.                  NB. Fewer rows of tmp than rpi...
           xv=. rpi{.nrt$1    NB. Actually is compression vec, not expansion
           corx=. 0           NB. Signal compression, not expansion
       end.
       rpi=. nrt
   end.
   corx;rpi;xv
NB.EG 'corx rpi xv'=. fixSizeMismatch rpi;#tmp
)

NB.* guessDriveOfFile: guess drive on which named file resides; return
NB. first valid match.
guessDriveOfFile=: 3 : 0
   'EC' guessDriveOfFile y        NB. E: and C: are my 2 most likely drives.
:
   fl=. y
   drives=. x ,&.>/ ':'
   if. ':' e. fl do.               NB. If a drive was specified, look at
       wh=. >:fl i. ':'            NB. it first but don't assume we'll
       drives=. ~.(<wh{.fl),drives NB. find file there.
       fl=. wh}.fl
   else.                                     NB. If no drive specified, look
       drives=. ~.(<currDrive ''),drives     NB. on current drive first.
   end.
   good=. ''
NB.   good=. ;fexist&.> drives ,&.>/ boxopen fl
NB.   good#drives
NB. Above 2 lines find all occurrences but probably only want 1st - assuming
NB. we search the drives in a rational order.
   for_i. i. #drives do.
       if. fexist (>i{drives),fl do.
           good=. >i{drives
       break. end.
   end.
   good
)

NB.* translateXLRange: translate between Excel range notation and numeric
NB. equivalent, e.g. '$A$1:$J$10' -> 1 1 10 10, or 27 2 43 26 ->
translateXLRange=: 3 : 0
   if. ' '={.0$y do.         NB. Excel -> numeric
      >translateXLCellNotation&.><;._1 ':',y
   else.                      NB. numeric (row,col,row,col) -> Excel
      xlnot=. translateXLCellNotation&.>(<"1) 2 2$,y
      (>0{xlnot),':',>1{xlnot
   end.
NB.EG    (2 2$9 27 99 54) -: translateXLRange '$AA$9:$BB$99'
NB.EG    '$B$1:$Z$10' -: translateXLRange 1 2 10 26
)

NB.* translateXLCellNotation: translate between origin-1 row-col coordinates
NB. and MS Excel spreadsheet cell designation, e.g. $A$1->1 1, or
NB. $AA$3 -> 3 27; or 1 256 -> '$IV$1'.  Does NOT account for Excel
NB. limit of 256 columns and 65536 rows: will translate an unuseable
NB. address like '$XL$99999' but fails after column 728 ('$ZZ')
NB. because limited to 2 alpha digits.
translateXLCellNotation=: 3 : 0
   alph=. ' ABCDEFGHIJKLMNOPQRSTUVWXYZ' [ nums=. '0123456789'
   adrs=. y                  NB. NO validity checking on address.
   if. ' '={.0$adrs do.       NB. Excel -> numeric
       adrs=. toupper adrs-.' '         NB. Only want ' ' if leading
       whlets=. adrs e. alph            NB.  intro'd by _2{. below.
       whnums=. adrs e. nums
       col=. ((<:#alph),1)+/ . * _2{.alph i. whlets#adrs
       (". whnums#adrs), col  NB. Row, column order is more natural for us.
   else.                      NB. numeric (row, col) -> Excel
       col=. 0 1+(0,<:#alph)#:<:(1<.<:#,adrs){adrs
       col=. ' '-.~'$',col{alph              NB. Excel stores in column-major
       if. 1~:#adrs do. col, '$',":0{adrs    NB.  order.
       else. col end.                        NB. Assume single # just column.
   end.
NB.EG    1 27 -: translateXLCellNotation '$AA$1'
NB.EG    '$IV$1' -: translateXLCellNotation 1 256
NB. This function is own inverse, e.g.
NB.EG    cells=. '$A$1';'$Z$1';'$AA$2';'$AZ$2';<'$BA$3'
NB.EG    cells -: translateXLCellNotation&.>translateXLCellNotation&.>cells
)

NB.* quoteIfSp: surround name with '"'s if embedded spaces (stupid MS!).
quoteIfSp=: 3 : 0
   ifsp=. '"'#~' ' e. y
   flnm=. ifsp,(y-.'"'),ifsp
)

NB. This should be in xlutil.ijs.
NB. xlusedrange  'book sheet' DM 2002 11 13
xlusedrange=: 3 : 0
3 xljmacro 'jusedrange ',y
)

NB.* macroUsedRange: this is the text of a macro to be inserted into
NB. jmacro.xls.
macroUsedRange=: 0 : 0
' jusedrange book sheet DM 2002 11 13
Function jusedrange(b, s) As String
jusedrange = Workbooks(b).Worksheets(s).UsedRange.Address
End Function
)

coclass 'base'