User:Andrew Nikitin/Jtangle

From J Wiki

This page presents two ways to extract runnable source from 'literate' MoinMoin pages.

The first is implemented in J and is intended to be built into the J system to provide seamless integration of literate programs.

The second is implemented in Perl and extracts parts from a file through a command-line interface.

J script

[{{#file: "jtangle.ijs"}} Download script: jtangle.ijs ]


The recursive unwrapping algorithm uses global variables to hold the raw data and to track which sections are being processed. This is why all verbs reside in their own jtangle locale. The j prefix seems appropriate since this is intended to be part of the development environment. [{{#file: "j"}} Download script: j ]

cocurrent 'jtangle'
require 'regex'
NB. readlit_z_=:readlit_jtangle_


The extract_pieces verb takes the entire file contents as a string and cuts it into named pieces according to the 'name' attribute of each piece.

It assigns the list of boxed section names to the SECTIONS global; the corresponding boxed text goes to the TEXT global. It assumes the input is UTF-8 encoded text with each line terminated by an LF character (CRLF endings qualify, since they also end in LF). [{{#file: "extract_pieces"}} Download script: extract_pieces ]

extract_pieces=:3 : 0
  assert. LF-:{:y

  hdri=.reB rxmatches y
  sn=.y rxfrom~ {:"2 hdri
  strt=.>:@+/@{."2 hdri
  i=./: l=.strt , {.@{."2 reE rxmatches y
  TEXT=:sn <@;/. y rxfrom~ _2 ({. , -~/)\ l{~i#~ (+. _1&|.) i<#strt
  i.0 0

cleanup deletes all temporary globals that were used. It is not called right now (mostly for debugging purposes) but should be included in readlit. [{{#file: "extract_pieces"}} Download script: extract_pieces ]

cleanup=:3 : 0
  i.0 0


[{{#file: "unwrap"}} Download script: unwrap ]

unwrap=: 3 : 0

unwrap returns the contents of the section named y. If this section contains references to other sections, they are substituted recursively. It uses the globals generated by extract_pieces. [{{#file: "unwrap"}} Download script: unwrap ]

  y=.boxopen y
  if. y e. STACK do. return. end. NB. prevent circular referencing
  assert. (#SECTIONS)>k=.SECTIONS i. y

Only one section reference per line is allowed, and nothing but whitespace may appear on that line. TODO: (?) Whitespace before a section reference should be inserted before each line of the section text. [{{#file: "unwrap"}} Download script: unwrap ]

  r=.'^(\s*)«(.*)»\s*$' rxmatches result
  n=.(2{"2 r) rxfrom result
  rplcdata=.unwrap &.>n
  result=.rplcdata  (0&{"2 r) rxmerge result

We used to update section text with unwrapped data, but then decided against it.

  TEXT=:(<result) k} TEXT

[{{#file: "unwrap"}} Download script: unwrap ]



[{{#file: "read"}} Download script: read ]

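readlit=:3 : 0
  NB. monadic case: output the first .ijs section, or the first section if there is none
  extract_pieces 1!:1 < jpath > y
  i=.1 i.~ (1&,^:([: -. +./)) ([: +./ '.ijs'&E.)&> SECTIONS
  unwrap i{SECTIONS
:
  NB. dyadic case: x names the section to output
  extract_pieces 1!:1 < jpath > y
  unwrap x
)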

An optional left argument specifies which section to output. If it is not specified, the first section whose name contains .ijs is used. If there is no such section, the first section in the file is used.
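The selection rule does not depend on J, so here is a minimal sketch of it in Perl, the language of the second script on this page. The helper name pick_section and the sample section names are hypothetical and are not part of jtangle.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of jtangle): choose the section to output.
# $wanted is the optional explicit name; @names are section names in file order.
sub pick_section {
    my ($wanted, @names) = @_;
    return $wanted if defined $wanted && length $wanted;
    for my $n (@names) {
        return $n if index($n, '.ijs') >= 0;   # first name containing ".ijs"
    }
    return $names[0];                          # otherwise the first section
}

print pick_section(undef, 'intro', 'jtangle.ijs', 'unwrap'), "\n";  # jtangle.ijs
print pick_section(undef, 'intro', 'unwrap'), "\n";                  # intro
print pick_section('unwrap', 'intro', 'jtangle.ijs'), "\n";          # unwrap
```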

test commands

T=:1!:1 <jpath '~nsg\literate\jtangle.lit'
readlit '~nsg\literate\jtangle.lit'

Perl script download

For convenience the generated Perl code is provided as an attachment. It is not necessarily the latest version (which may also be a good thing).

Perl script

Note on comments



The current literate parser attaches a comment (pointing to the literate source from which the script was generated) at the top of the file. It usually guesses the comment form correctly (J-style NB. comments or Perl # comments), but may occasionally get it wrong. Please check the downloaded source for consistency if you encounter problems. [{{#file: "perl"}} Download script: perl ]

# 2007-03-19
use strict;
use bytes;

Perl script options

[{{#file: "perloptions"}} Download script: perloptions ]

use Getopt::Std;

[{{#file: "perloptions"}} Download script: perloptions ]

  $opt_f, # use section names as filenames, otherwise dump everything to STDOUT;

The default behaviour is to ignore filenames and dump everything to STDOUT. The problem with honouring filenames is that a name may contain a path and so attempt to overwrite system files, for example via

{ { {#!literate name='c:\autoexec.bat'
... something sinister

This way harm can be done during the source extraction stage, where it is less expected. Some kind of checking mechanism needs to be implemented. [{{#file: "perloptions"}} Download script: perloptions ]

  $opt_s, # extract only section with given name

If this option is not specified, the script extracts sections that have '.' in their names, assuming those are source files. If only one file or only a specific section is needed, the name of that section may be specified. [{{#file: "perloptions"}} Download script: perloptions ]

  $opt_l, # list section names and their relationships

[{{#file: "perloptions"}} Download script: perloptions ]

  $opt_q, # quiet mode, do not show any warnings

[{{#file: "perloptions"}} Download script: perloptions ]

die "-f and -l are mutually exclusive" if $opt_f && $opt_l;

Instead of die we could have quietly turned -f off when -l is on, but it seems better not to try to guess the user's intentions.
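The checking mechanism called for in the description of -f could take roughly the following shape. This is only a sketch under assumptions: the function name is_safe_name and its exact rules (reject absolute paths, drive letters and parent-directory steps) are hypothetical and are not part of the script.

```perl
use strict;
use warnings;

# Hypothetical check, not part of the script: accept only relative names
# that cannot climb out of the current directory.
sub is_safe_name {
    my ($name) = @_;
    return 0 if !defined $name || $name eq '';
    return 0 if $name =~ m{^[/\\]};        # absolute path, Unix or Windows style
    return 0 if $name =~ m{^[A-Za-z]:};    # drive letter, e.g. c:\autoexec.bat
    return 0 if grep { $_ eq '..' } split m{[/\\]}, $name;   # parent-directory step
    return 1;
}

for my $n ('jtangle.ijs', 'src/jtangle.ijs', 'c:\autoexec.bat', '../etc/passwd') {
    print "$n: ", (is_safe_name($n) ? "ok" : "rejected"), "\n";
}
```

With such a check in place, -f could skip (with a warning) any section whose name is rejected, rather than opening it for writing.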

Scan entire file

[{{#file: "perlvar"}} Download script: perlvar ]

our %piece;
our $section;

For each line in the file, collect lines into the global hash %piece, which holds the named sections as arrays of strings. The name of the current section is kept in the global $section. [{{#file: "perl"}} Download script: perl ]

my $CLOSE='}' x 3; # kludge to work around current literate parsing
while(<>) {
  my $n;
  if( $n=/^\s*{{{#!literate.*name='([^']*)'/ .. /^\s*$CLOSE\s*$/ ) {
    if( 1==$n ) {
      $piece{$section}=[] unless exists $piece{$section};

In scalar context Perl's .. (range operator) returns a sequence number for each line inside the range, and appends the string 'E0' to that number on the line that matches the final expression. This does not change the numeric value but gives something to look for when testing for the final line. [{{#file: "perl"}} Download script: perl ]

    elsif( 'E0' ne substr($n,-2,2) ) {
      push @{$piece{$section}},$_;
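The behaviour of the range operator described above can be seen in isolation with a short stand-alone sketch. The BEGIN and END markers and the sample lines are invented for this example; the real script matches the literate block delimiters instead.

```perl
use strict;
use warnings;

# Collect the body lines between an opening and a closing marker.
# BEGIN/END are stand-in markers chosen for this example only.
my @lines = ("before\n", "BEGIN\n", "a\n", "b\n", "END\n", "after\n");
my (@seq, @body);
for (@lines) {
    my $n = /^BEGIN/ .. /^END/;      # empty string outside the range, 1, 2, ... inside
    next unless $n;
    push @seq, $n;                   # the last value carries the E0 suffix
    next if $n == 1;                 # opening marker line
    next if substr($n, -2) eq 'E0';  # closing marker line
    push @body, $_;
}
print "@seq\n";   # prints: 1 2 3 4E0
print @body;      # prints a and b on separate lines
```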

Select and unwrap top-level section

Scan through named sections and recursively unwrap those that contain '.' in their name. [{{#file: "perlvar"}} Download script: perlvar ]

our $PREFIX;

The global $PREFIX contains the string to prepend to indented sections (for now it can only be whitespace; TODO(?): comments). [{{#file: "perlvar"}} Download script: perlvar ]

our @STACK;

The global @STACK contains the list of pending sections, used to detect self-references. [{{#file: "perl"}} Download script: perl ]

close STDOUT if $opt_f;
for my$s(keys %piece) {
  if( $s eq $opt_s || ('' eq $opt_s && 0<=index($s,'.'))  ) {
    if( $opt_f ) {
      warn "Write section to $s\n" unless $opt_q;
      open STDOUT, ">$s" if $opt_f;
    close STDOUT if $opt_f;

Procedure that recursively unwraps sections [{{#file: "perl"}} Download script: perl ]

sub unwrap($)
  my $s=shift;
  if( !exists $piece{$s} ) {
    warn "Section $s is referenced but not defined. Nothing is substituted.\n" unless $opt_q;

If the name of the current section is already in @STACK, the substitution would never finish. Give a warning and ignore this occurrence of the section. [{{#file: "perl"}} Download script: perl ]

  for my$e(@STACK) {
    if( $s eq $e ) {
      warn "Recursion detected: $s" unless $opt_q;

For each line of the section, either output it (with $PREFIX prepended) or, if it is a section reference, recursively unwrap it. For now there can be only one section reference per line, and nothing but whitespace is allowed around it. [{{#file: "perlvar"}} Download script: perlvar ]

our %unwrapped;

The hash %unwrapped keeps track of which sections were used and how many times. Currently a section may be used more than once. Maybe this needs to be signalled as a mistake. [{{#file: "perl"}} Download script: perl ]

  if( $opt_l ) {
    print "",("  " x @STACK),("@" x (1<$unwrapped{$s})),$s,"\n";
    return if $unwrapped{$s}>1;
  push @STACK,$s;
  for my$l(@{$piece{$s}}) {
    if( $l=~/^(\s*)«(.*)»\s*$/ ) {
      my $p=$PREFIX;
    } else {
      print "",$PREFIX,$l unless $opt_l;
  pop @STACK;
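The recursive substitution with a prefix and a stack can be sketched compactly as follows. This is a simplified variation, not the script's own code: the name expand is hypothetical, the sample %piece data is invented, the prefix is passed as an argument instead of living in the global $PREFIX, and the result is returned as a string rather than printed.

```perl
use strict;
use warnings;

# Sample data, invented for this example.
my %piece = (
    'main.ijs' => ["start\n", "  «body»\n", "end\n"],
    'body'     => ["line1\n", "«main.ijs»\n", "line2\n"],
);
my @stack;   # sections currently being expanded

sub expand {
    my ($s, $prefix) = @_;
    return '' unless exists $piece{$s};       # undefined section: substitute nothing
    return '' if grep { $_ eq $s } @stack;    # circular reference: skip this occurrence
    push @stack, $s;
    my $out = '';
    for my $l (@{ $piece{$s} }) {
        if ($l =~ /^(\s*)«(.*)»\s*$/) {
            my ($indent, $name) = ($1, $2);   # copy before the recursive call
            $out .= expand($name, $prefix . $indent);
        } else {
            $out .= $prefix . $l;
        }
    }
    pop @stack;
    return $out;
}

print expand('main.ijs', '');
```

Here the reference to body is indented by two spaces, so both of its lines come out indented, and the reference back to main.ijs inside body is dropped because main.ijs is still on the stack.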

Warn about unused sections

At the end, check whether any of the named sections were not used by unwrap and give a warning. [{{#file: "perl"}} Download script: perl ]

for my$s(keys %piece) {
  if( $opt_l ) {
    print "-$s\n" if !exists $unwrapped{$s};
  } else {
    if( !exists $unwrapped{$s} ) {
      warn "Section $s is defined but never used\n" unless $opt_q;
    } elsif( 1<$unwrapped{$s} ) {
      warn "Section $s is used more than once\n" unless $opt_q;

Obtaining literate source

For some reason

wget -U "Mozilla" -O- | perl -s jtangle.ijs >z.ijs

garbles line endings. Other than that, this command is a reasonably valid way to obtain the latest source.

Alternatively, the raw source may be downloaded into a local file and converted separately

wget -U "Mozilla" -Oz
perl -s jtangle.ijs z >z.ijs

I keep copies of my own LPs on a local hard drive. When I want to make a change, I download the raw source from the Wiki (using a variant of the above command), compare, incorporate the changes and save the LP back to the Wiki. This way the Wiki acts as a kind of version control system.

jadeful hack

Performing a manual step of extracting code portion from literate source before execution may be just enough bother to discourage its use altogether.

Suggested hack replaces the standard load utility in system\extras\util\jadefull.ijs and/or system\extras\util\jadecon.ijs to perform this extraction step automatically if needed.

The hack recognizes files with the .lit extension as needing special treatment. It extracts the first .ijs section (or just the first section) from such a file and runs it from a noun using the 0!:100 or 0!:101 foreigns. Note that script and scriptd do more than just 0!:0 and 0!:1, but this simplistic approach should work for now. [{{#file: "jadeful hack"}} Download script: jadeful hack ]

load_z_=: 3 : 0
0 load y
:
fls=. getscripts_j_ y
fn=. ('script',x#'d')~
for_fl. fls do.
  if. DISPLAYLOAD_j_ do. smoutput > fl end.
  if. '.lit' -: _4 {. > fl do.
    NB. special treatment for .lit files
    NB. modify location of jtangle.ijs if different
    require '~nsg/literate/jtangle.ijs'
    0!:(100+x) readlit_jtangle_ fl
  else.
    fn fl
  end.
  LOADED_j_=: ~. LOADED_j_,fl
end.
)

Contributed by Andrew Nikitin


(Notwithstanding that it's work in progress.) The Literate/Wiki Tool is implemented as a MoinMoin plugin. It has, naturally, its own "tangle", which is also naturally implemented in Python. The question is not whose choice of implementation language is better, but a practical one: wouldn't an alternative Perl implementation duplicate the effort? The Literate Wiki Tool will be evolving, and it only makes sense to have the same code base for tangle used in both places: the stand-alone script and the Wiki plugin. -- Oleg Kobchenko <<DateTime(2007-03-20T19:01:55Z)>>

I need the Perl script for my internal process anyway. Besides, I do not have Python installed on any of my machines and will not have in the foreseeable future, and "duplicating effort" on a one-page script does not seem like such a waste to me. BTW, if you post your Python parser, preferably in literate form, I will try to ensure that the Perl and J implementations match it as closely as possible. -- Andrew Nikitin <<DateTime(2007-03-20T19:10:13Z)>>

It is published where it should be: in the parser market of MoinMoin. Making it Literate is a good idea. I don't know how complicated the parser installation process is at the MoinMoin web site, but having some experience and a few rounds of improvements here at J Wiki will help convince them. -- Oleg Kobchenko <<DateTime(2007-03-20T19:47:40Z)>>

CategoryLiterate CategoryWorkInProgress